CS 584-04: Machine Learning Assignment 4

Question 1 (50 points)
In 2014, Allstate provided data on Kaggle.com for the Allstate Purchase Prediction Challenge, an open competition. The data contain the transaction history of customers who ended up purchasing a policy. For each Customer ID, you are given the quote history and the coverage options purchased.
The data are available on Blackboard as Purchase_Likelihood.csv. They contain 665,249 observations on 97,009 unique Customer IDs. You will build a multinomial logistic model with the following specifications.
1. The nominal target variable is A, which has the categories 0, 1, and 2
2. The nominal features are (categories are inside the parentheses):
a. group_size. How many people will be covered under the policy (1, 2, 3 or 4)?
b. homeowner. Whether the customer owns a home or not (0 = No, 1 = Yes)?
c. married_couple. Does the customer group contain a married couple (0 = No, 1 = Yes)?
3. Include the Intercept term in the model
4. Enter the five model effects in this order: group_size, homeowner, married_couple, group_size * homeowner, and homeowner * married_couple (no forward or backward selection)
5. The optimization method is Newton
6. The maximum number of iterations is 100
7. The tolerance level is 1e-8.
8. Use the sympy.Matrix().rref() method to identify the non-aliased parameters
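
One way step 8 might be carried out is sketched below on a toy design matrix. The matrix is hypothetical (an intercept plus two dummy columns for homeowner), not the assignment's full design matrix, and the column names are made up for illustration. `rref()` returns the reduced matrix together with a tuple of pivot column indices; columns that are not pivots are the aliased ones.

```python
import sympy

# Hypothetical toy design matrix: intercept, homeowner=0 dummy, homeowner=1 dummy.
# The two dummies sum to the intercept column, so one column is redundant.
columns = ["Intercept", "homeowner_0", "homeowner_1"]
X = sympy.Matrix([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])

# rref() returns (reduced row echelon form, tuple of pivot column indices).
_, pivots = X.rref()

non_aliased = [columns[j] for j in pivots]
aliased = [columns[j] for j in range(X.cols) if j not in pivots]

print("non-aliased:", non_aliased)
print("aliased:", aliased)
```

Only the non-aliased columns would then be passed to the model fit, so that the Newton iterations work with a full-rank design matrix.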

Please answer the following questions based on your model.
a) (5 points) List the aliased parameters that you found in your model.

b) (5 points) How many degrees of freedom do you have in your model?

c) (10 points) After entering each model effect, calculate the Deviance test statistic, its degrees of freedom, and its significance value between the current model and the previous model. List your Deviance test results by model effect in a table.
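
The Deviance test in part c) can be sketched as follows. The statistic is twice the gain in log-likelihood, and its significance value is the upper tail of a chi-square distribution. The log-likelihood values and parameter counts in the usage lines are made up for illustration. To stay within the standard library, this sketch uses a closed form that is valid only for even degrees of freedom. That is an assumption; it should hold here because a three-category target gives two logit equations per non-aliased column, but it is worth verifying. A general routine such as `scipy.stats.chi2.sf` handles any degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def deviance_test(llk_previous, llk_current, df_previous, df_current):
    """Return (statistic, degrees of freedom, significance value)."""
    statistic = 2.0 * (llk_current - llk_previous)
    df = df_current - df_previous
    return statistic, df, chi2_sf_even_df(statistic, df)

# Hypothetical log-likelihoods and free-parameter counts, for illustration only.
statistic, df, p_value = deviance_test(-100.0, -97.0, 2, 4)
print(statistic, df, p_value)
```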

d) (5 points) Calculate the Feature Importance Index as the negative base-10 logarithm of the significance value. List your indices by the model effects.
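
The index in part d) is a one-line transformation of the significance value. The sketch below is illustrative only; the function name is hypothetical, and returning infinity when the significance value underflows to zero is a design choice of this sketch, not something the assignment specifies.

```python
import math

def feature_importance_index(p_value):
    """Negative base-10 logarithm of a significance value."""
    if p_value <= 0.0:
        # A significance value that underflows to zero has no finite logarithm.
        return math.inf
    return -math.log10(p_value)

# Hypothetical significance values, for illustration only.
for p in (0.05, 1e-5, 0.0):
    print(p, feature_importance_index(p))
```

Larger indices correspond to smaller significance values, so the effects can be ranked directly by this index.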

e) (10 points) For each of the sixteen possible value combinations of the three features, calculate the predicted probabilities for A = 0, 1, 2 based on the multinomial logistic model. List your answers in a table with proper labelling.
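
For part e), the step from linear predictors to predicted probabilities can be sketched as below. This assumes A = 0 is the reference category (the assignment does not say which category is the reference), so its linear predictor is fixed at zero. The function name and the way the predictors are passed in are hypothetical; the fitted coefficients themselves would come from your model.

```python
import itertools
import math

def multinomial_probabilities(eta_1, eta_2):
    """Return (P(A=0), P(A=1), P(A=2)), assuming A=0 is the reference category."""
    # Subtract the largest predictor before exponentiating, for numerical stability.
    shift = max(0.0, eta_1, eta_2)
    weights = [math.exp(0.0 - shift), math.exp(eta_1 - shift), math.exp(eta_2 - shift)]
    total = sum(weights)
    return tuple(w / total for w in weights)

# The sixteen combinations: 4 group sizes x 2 homeowner values x 2 married_couple values.
combinations = list(itertools.product([1, 2, 3, 4], [0, 1], [0, 1]))
print(len(combinations))

# With both linear predictors at zero, the three categories are equally likely.
print(multinomial_probabilities(0.0, 0.0))
```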

f) (5 points) Based on your model, what values of group_size, homeowner, and married_couple will maximize the odds value Prob(A=1) / Prob(A=0)? What is that maximum odds value?

g) (5 points) Based on your model, what is the odds ratio for group_size = 3 versus group_size = 1, and A = 2 versus A = 0? Mathematically, the odds ratio is (Prob(A=2)/Prob(A=0) | group_size = 3) / (Prob(A=2)/Prob(A=0) | group_size = 1).

h) (5 points) Based on your model, what is the odds ratio for homeowner = 1 versus homeowner = 0, and A = 0 versus A = 1? Mathematically, the odds ratio is (Prob(A=0)/Prob(A=1) | homeowner = 1) / (Prob(A=0)/Prob(A=1) | homeowner = 0).
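
The odds ratios in parts g) and h) share the same form: a ratio of two odds, each computed from a vector of predicted probabilities. A minimal sketch follows; the probability vectors are made up for illustration and the function names are hypothetical.

```python
def odds(probabilities, numerator, denominator):
    """Odds of one target category against another."""
    return probabilities[numerator] / probabilities[denominator]

def odds_ratio(probs_a, probs_b, numerator, denominator):
    """Ratio of the odds under condition a to the odds under condition b."""
    return odds(probs_a, numerator, denominator) / odds(probs_b, numerator, denominator)

# Hypothetical (P(A=0), P(A=1), P(A=2)) vectors, for illustration only.
probs_group_3 = (0.2, 0.5, 0.3)
probs_group_1 = (0.4, 0.4, 0.2)

# Odds ratio for A = 2 versus A = 0, group_size = 3 versus group_size = 1.
ratio = odds_ratio(probs_group_3, probs_group_1, numerator=2, denominator=0)
print(ratio)
```

Because the model contains interaction effects, the odds ratio for one feature may depend on the levels at which the other features are held, so it is worth stating those levels alongside the answer.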

Question 2 (50 points)
You are asked to build a Naïve Bayes model using the same Purchase_Likelihood.csv. The model specifications are:
1. No smoothing is needed. Therefore, the Laplace/Lidstone alpha is zero
2. The nominal target variable is A, which has the categories 0, 1, and 2
3. The nominal features are (categories are inside the parentheses):
a. group_size. How many people will be covered under the policy (1, 2, 3 or 4)?
b. homeowner. Whether the customer owns a home or not (0 = No, 1 = Yes)?
c. married_couple. Does the customer group contain a married couple (0 = No, 1 = Yes)?

Please answer the following questions based on your model.
a) (5 points) Show in a table the frequency counts and the Class Probabilities of the target variable.

b) (5 points) Show the crosstabulation table of the target variable by the feature group_size. The table contains the frequency counts.

c) (5 points) Show the crosstabulation table of the target variable by the feature homeowner. The table contains the frequency counts.

d) (5 points) Show the crosstabulation table of the target variable by the feature married_couple. The table contains the frequency counts.

e) (10 points) Calculate the Cramér's V statistic for each of the three crosstabulation tables above. Based on these Cramér's V statistics, which feature has the largest association with the target A?
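
Cramér's V for a crosstabulation can be sketched as below. This version uses the Pearson chi-square statistic, which is an assumption since the assignment does not say which chi-square to use, and it assumes no row or column of the table is entirely zero. The tables in the usage lines are toy examples, not the assignment data.

```python
import math

def cramers_v(table):
    """Cramér's V from a list-of-rows table of counts (Pearson chi-square)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    min_dim = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi_square / (n * min_dim))

# Toy tables: perfect association and no association.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```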

f) (5 points) Based on the assumptions of the Naïve Bayes model, express the joint probability Prob(A = a, group_size = g, homeowner = h, married_couple = m) as a product of the appropriate probabilities.

g) (10 points) For each of the sixteen possible value combinations of the three features, calculate the predicted probabilities for A = 0, 1, 2 based on the Naïve Bayes model. List your answers in a table with proper labelling.
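
The calculation in part g) can be sketched with plain counting, as below. The four records are a made-up toy data set, not Purchase_Likelihood.csv, and the function name is hypothetical. No smoothing is applied, matching the alpha of zero in the specification; as a consequence, a combination that never occurs with any class would give a zero total and the normalization would fail.

```python
from collections import Counter

def naive_bayes_posterior(records, features, target, query):
    """Predicted class probabilities for one feature combination, without smoothing."""
    class_counts = Counter(r[target] for r in records)
    n = len(records)
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / n  # class probability
        for f in features:
            match = sum(1 for r in records if r[target] == cls and r[f] == query[f])
            score *= match / cls_count  # conditional probability of the feature value
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical toy records, for illustration only.
records = [
    {"A": 0, "group_size": 1, "homeowner": 0, "married_couple": 0},
    {"A": 0, "group_size": 1, "homeowner": 1, "married_couple": 0},
    {"A": 1, "group_size": 1, "homeowner": 1, "married_couple": 0},
    {"A": 1, "group_size": 2, "homeowner": 1, "married_couple": 1},
]
features = ["group_size", "homeowner", "married_couple"]
query = {"group_size": 1, "homeowner": 1, "married_couple": 0}
posterior = naive_bayes_posterior(records, features, "A", query)
print(posterior)
```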

h) (5 points) Based on your model, what values of group_size, homeowner, and married_couple will maximize the odds value Prob(A=1) / Prob(A=0)? What is that maximum odds value?
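
The search for the maximum odds value amounts to scanning the table of predicted probabilities. A minimal sketch follows, assuming the predictions are stored in a dictionary keyed by (group_size, homeowner, married_couple); that layout, the function name, and the probabilities shown are hypothetical.

```python
def max_odds(predictions, numerator=1, denominator=0):
    """Return (combination, odds) with the largest P(numerator) / P(denominator)."""
    def odds(combo):
        probs = predictions[combo]
        return probs[numerator] / probs[denominator]

    best = max(predictions, key=odds)
    return best, odds(best)

# Hypothetical (P(A=0), P(A=1), P(A=2)) per combination, for illustration only.
predictions = {
    (1, 0, 0): (0.50, 0.25, 0.25),
    (2, 1, 0): (0.20, 0.60, 0.20),
    (3, 1, 1): (0.40, 0.40, 0.20),
}
best_combo, best_odds = max_odds(predictions)
print(best_combo, best_odds)
```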