$40.00

Category: DSCC 465

Description

5/5 - (8 votes)

1) Suppose you’re on a game show, and you’re given the choice of three doors: Behind one

door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who

knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then

says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your

choice? (Note: Please show your solution step-by-step by using what you know about

marginal probability, conditional probability, joint probability, and the Bayes’ theorem)

2) Suppose we have two NBA teams – for simplicity team A and team B – who have made

it to NBA Playoffs. In each game between these two teams, team A has a winning

probability of 0.55, and team B has a winning probability of 0.45. What is the probability

that these two teams will play the 7

th game in NBA Playoffs? (Notes: There cannot be a

tie in any game (i). Please check this link for more information about NBA Playoffs:

https://en.wikipedia.org/wiki/NBA_playoffs and to think about possible combinations

(ii). Also, please show your solution step-by-step by using what you know about marginal

probability, conditional probability, joint probability, and the Bayes’ theorem (iii)).

3) From scratch (not using any pre-packaged tools for direct calculation), implement the

gradient descent algorithm for linear regression and test your results on the California

Housing Dataset:

Spring 2022: Int. to Statistical Machine Learning University of Rochester

2

https://scikitlearn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html#skl

earn.datasets.fetch_california_housing

Here is what you need to do step by step:

a. Implement the gradient descent algorithm from scratch

b. Choose the following features from the dataset as your X matrix: MedInc,

HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude

c. Choose the following feature from the dataset as your Y matrix: MedHouseVal

d. Randomly split your data into training (70% of total) and test sets (30% of total)

by using sklearn’s train_test_split function. Set random_state =

265:

https://scikitlearn.org/stable/modules/generated/sklearn.model_selection.train_test_split.ht

ml.

e. Set the number_of_steps = 1000 and learning_rate = 0.01.

f. By running your code, determine the best set of parameters (=weights) for the

constant and your features listed in b). Your cost function will be MSE (=you should

pick the set of parameters that give you the lowest MSE).

g. Report and interpret the results. What are the factors that explain the house

prices the most?

4) Now, try using a pre-packaged tool and comparing the results. Do the following:

a. Use SGDRegressor provided by scikit:

https://scikitlearn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

b. Step b), c), and d) are the same as in Question 3.

c. Set the max_iter = 1000, alpha = 0.01, random_state = 265, and

loss = ‘squared_error’. Other parameters should be set to ‘default’.

d. By running your code, determine the best set of parameters (=weights) for the

constant and your features listed in b).

e. Report and interpret the results. What are the factors that explain the house

prices the most? Are the results different from the previous question? If

different, explain why the results might be different.

5) Finally, write a function from scratch that computes a variance-covariance matrix by

transforming the following formula into code:

Variance-covariance matrix: 𝑐𝑜𝑣(𝑿) = E[(𝐗 − E[𝐗])(𝑿 − E[𝑿])

𝑇

]

Your function/code should work for matrices of any size. Test that your function is running

(=successfully computing the variances and covariances of the variables and variable pairs

Spring 2022: Int. to Statistical Machine Learning University of Rochester

3

in the dataset) by using the California Housing Dataset that you have used in previous

questions.

WhatsApp us