1 Overfitting
Overfitting is a common problem in data science work.
(a) How can you tell if a model you have trained is overfitting?
(b) Why do we want to avoid overfitting?
(c) Explain how L1 and L2 regularization methods can mitigate the problem. How do these two techniques
affect the model weights? When would you choose one over the other?
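As a study aid for part (c), here is a minimal sketch, assuming scikit-learn is available, that fits L1 (Lasso) and L2 (Ridge) regularized linear models on the same synthetic data so you can compare how each treats the weights. The data and penalty strength are illustrative assumptions, not part of the problem.

```python
# Sketch: compare L1 (Lasso) vs L2 (Ridge) regularization on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the remaining three are pure noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty

print("L1 coefficients:", np.round(lasso.coef_, 3))  # noise weights driven to exactly 0
print("L2 coefficients:", np.round(ridge.coef_, 3))  # all weights shrunk, but nonzero
```

The L1 penalty produces sparse weights (irrelevant features zeroed out, useful for feature selection), while the L2 penalty shrinks all weights smoothly toward zero without eliminating any.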
2 K-Nearest Neighbors for Classification
Consider the two classes of data points shown in the figure: blue points with coordinates (x, y): (0, 2), (1, 4), (3, 6), and red points with coordinates (x, y): (3, 2), (4, 4), (2, 0).
(a) Using k=1, classify points (1, 2), (2, 3), (10, 10). If you can’t classify some of the points, explain why
and propose a way to solve the problem.
(b) Repeat the same classification with k=3.
(c) If instead our dataset consisted of blue points: (0, 200), (2, 500), (3, 600) and red points: (3, 200), (4, 400), (1, 0), briefly explain what problem we might have with this kind of sampling and how we can address it.
(d) Suppose you have a dataset consisting of 1000 samples. Describe a method to select the optimal k to use for KNN.
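For parts (a) and (b), a brute-force sketch of KNN on the points given above, using squared Euclidean distance and a majority vote. Note that it breaks distance ties by input order, which glosses over exactly the ambiguity part (a) asks about: the query (2, 3), for example, is equidistant to a blue and a red point.

```python
# Brute-force KNN on the problem's six points.
from collections import Counter

blue = [(0, 2), (1, 4), (3, 6)]
red = [(3, 2), (4, 4), (2, 0)]
data = [(p, "blue") for p in blue] + [(p, "red") for p in red]

def knn(q, k):
    # Sort all points by squared distance to the query (stable sort -> ties
    # are silently broken by input order, hiding genuinely ambiguous cases).
    nearest = sorted(data, key=lambda t: (t[0][0] - q[0])**2 + (t[0][1] - q[1])**2)
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

for q in [(1, 2), (2, 3), (10, 10)]:
    print(q, "k=1 ->", knn(q, 1), " k=3 ->", knn(q, 3))
```

For part (c), the same distance computation shows the scaling issue: when the y-coordinates are hundreds of times larger than the x-coordinates, the distance is dominated by y, which is why feature normalization matters for KNN.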
3 Logistic Regression Interpretation
Suppose we fit a multiple logistic regression: log[P(Y = 1) / (1 − P(Y = 1))] = β0 + β1X1 + · · · + βpXp.
(a) Suppose we have p = 2, and β0 = 1, β1 = −1, β2 = 2. When X1 = X2 = 0, what are the odds and
probability of the event that Y = 1?
(b) How does one unit increase in X1 or X2 change the odds and probability of the event that Y = 1?
(c) Explain how increasing or decreasing β0, β1 or β2 affect our predictions.
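A short numerical check for part (a), using only the standard library. When X1 = X2 = 0, the log-odds reduce to β0, so the odds are e^β0 and the probability follows from odds/(1 + odds):

```python
# Part (a): X1 = X2 = 0, so the log-odds equal beta0 = 1.
import math

beta0 = 1.0
odds = math.exp(beta0)    # e^1 ≈ 2.718
p = odds / (1 + odds)     # ≈ 0.731
print(round(odds, 3), round(p, 3))

# For part (b): a one-unit increase in X1 multiplies the odds by
# e^beta1 = e^-1 ≈ 0.368; a one-unit increase in X2 multiplies them by
# e^beta2 = e^2 ≈ 7.389. The change in probability is not constant --
# it depends on the starting value of the linear predictor.
```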
4 Confusion Table
Suppose we have the following confusion table output by the logistic regression using the probability threshold
P(Y = 1) ≥ π.
                Predicted Ŷ = 0    Predicted Ŷ = 1
Actual Y = 0          735                  2
Actual Y = 1           50                 45
(a) What are the false positives, false negatives, true positives, and true negatives?
(b) Compute precision, recall and F1 score.
(c) How would you expect precision, recall, and F1 score to change if the threshold were lowered? Provide a brief explanation.
(d) Can we compute AUC score with the given information? If not, what else would you need to know?
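The metrics in part (b) follow directly from the four cells of the table, taking Y = 1 as the positive class:

```python
# Counts read off the confusion table (positive class: Y = 1).
tp, fp, fn, tn = 45, 2, 50, 735

precision = tp / (tp + fp)   # 45/47  ≈ 0.957
recall = tp / (tp + fn)      # 45/95  ≈ 0.474
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.634
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Note that these numbers describe the classifier at one fixed threshold π, which is the crux of part (d): an ROC curve, and hence AUC, needs the predicted probabilities (or counts at many thresholds), not a single confusion table.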
5 Support Vector Machine
(a) Suppose you have a dataset with two classes ('+' and '−'), as shown in the figure. If you remove one of the points that is not circled, will that alter your decision boundary?
(b) What is meant by a hard margin or a soft margin? In this case, would it matter which one your decision boundary uses?
(c) How might your decision boundary differ when using a linear SVM compared to a Radial Basis Function (RBF) kernel SVM?
(d) Explain what the parameters gamma and C of the RBF kernel SVM do.
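For part (d), a hedged sketch, assuming scikit-learn is available, that fits RBF SVMs with different gamma and C values on synthetic data with a circular class boundary (an assumption chosen because it is not linearly separable). The number of support vectors gives a rough feel for how each parameter changes the fit: large gamma makes each point's influence more local, and large C penalizes margin violations more heavily.

```python
# Vary gamma and C of an RBF-kernel SVM and inspect the fit.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)   # circular boundary

for gamma, C in [(0.1, 1.0), (10.0, 1.0), (0.1, 100.0)]:
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    print(f"gamma={gamma}, C={C}: "
          f"{clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.3f}")
```

A linear SVM cannot separate this data at all, which also illustrates part (c): the RBF kernel lets the decision boundary curve around the inner class.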