On Thursday, October 31st, Hanover Insurance and OPIM Innovate co-sponsored a Predictive Analytics workshop. Run by alumni and Travelers Data Integrity Specialist Aleya Hafez (Mathematics & Statistics‘18), students were introduced to different data modeling approaches.
Hafez’s lecture followed the usual data modeling framework–cleaning data with hypotheses in mind, splitting data, and finally creating predictive models. This, of course, came with the need to explain what each segment entailed, information she generously opened with. In explaining the data cleaning process, she informed students of potential outliers, duplicate records, data input errors, or any other abnormal or extraneous data that may skew test results. With regard to splitting the data, she taught students the difference between a training dataset and a test (or validation) dataset. In brief, training datasets and test datasets are samples taken from the whole; training datasets are used to build initial models, whereas test datasets are there to verify results. As for creating models, Hafez specified the process to further increase understanding.
To provide students with a conceptual guide of data modeling, Hafez broke down the process into the following steps:
From these steps, she then explained separate modeling types, such as linear regression, logistic regression, and decision trees. Whereas linear regression predicts specific numeric values, logistic regression and decision trees predict the probability of something being true or not. For example, a model using logistic regression will be able to predict whether an email is spam or not. A decision tree, on the other hand, brings these probabilities into finer and finer levels depending on the number of nodes, or categories, added (see below):
Decision trees may also be used to assess model outcomes against actual data. This helps data scientists select the best models to use.
To end the lecture, Hafez used the concepts she described to create predictive models in R. This gave students the chance to see data modeling in action, and provided them with enough of a preview to ask clarifying questions.
When Hafez was asked why she volunteered her time to share her knowledge on predictive analytics with UConn students, she said the following: “I’ve always enjoyed helping other people, and it excites me to know that someone other than me is interested in statistics. It’s great to see that the UConn School of Business is putting more of an emphasis on predictive analytics, so I’m happy to give my take!”
We thank Hafez for her time and everyone who attended the Predictive Analytics workshop!