Description

Questions: Theory
1. Non-Uniform Weights in Linear Regression: (6 marks) You are given a dataset in
which the data points are denoted by $(x_n, t_n)$, $n = 1, \dots, N$. Each data point is associated
with a positive weighting factor $g_n > 0$. The error function is thus modified to:
$$E_D(w) = \frac{1}{2} \sum_{n=1}^{N} g_n \left( t_n - w^T \Phi(x_n) \right)^2$$
where Φ(·) is any representation of the data.
(a) (3 marks) Find an expression for the solution $w^*$ that minimizes the above error
function.
(b) (3 marks) Give two alternative interpretations of the above weighted sum-of-squares
error function in terms of: (i) data-dependent noise variance and (ii) replicated data
points.
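To make the replicated-data reading in (b)(ii) concrete, here is a minimal numerical sketch. It assumes the simple basis Φ(x) = [1, x], integer weights, and a made-up three-point dataset, none of which come from the assignment. It solves the standard weighted normal equations (Φ^T G Φ) w = Φ^T G t with G = diag(g_1, ..., g_N), then compares the result with an unweighted fit on data in which each point is repeated g_n times.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def weighted_fit(xs, ts, gs):
    """Minimise 0.5 * sum_n g_n (t_n - w^T phi(x_n))^2 with phi(x) = [1, x].

    Accumulates Phi^T G Phi and Phi^T G t, then solves the normal equations.
    """
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, t, g in zip(xs, ts, gs):
        phi = [1.0, x]
        for i in range(2):
            b[i] += g * phi[i] * t
            for j in range(2):
                a[i][j] += g * phi[i] * phi[j]
    return solve_2x2(a, b)


# Hypothetical toy data, chosen only for illustration.
xs = [0.0, 1.0, 2.0]
ts = [1.0, 3.0, 4.0]
gs = [1, 2, 3]

w_weighted = weighted_fit(xs, ts, gs)

# Replicated-data view: repeat each point g_n times and fit with unit weights.
xs_rep = [x for x, g in zip(xs, gs) for _ in range(g)]
ts_rep = [t for t, g in zip(ts, gs) for _ in range(g)]
w_replicated = weighted_fit(xs_rep, ts_rep, [1] * len(xs_rep))

print(w_weighted, w_replicated)
```

Both fits accumulate the same sums, so the two weight vectors agree up to floating-point rounding. This only illustrates the integer-weight case; the written answer still needs the general argument.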
2. Bayes Optimal Classifier: (2 marks) Let there be 5 hypotheses h1 through h5 that
could guide a robot to move either Forward (F), Left (L), or Right (R):
P(hi|D)   P(F|hi)   P(L|hi)   P(R|hi)
0.4       1         0         0
0.2       0         1         0
0.1       0         0         1
0.1       0         1         0
0.2       0         1         0
Compute the MAP estimate and Bayes optimal estimate using the data provided in the
table. Are they the same? Justify your answer.
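The two estimates can be checked mechanically. The sketch below assumes the table rows are listed in the order h1 to h5 (the extracted table carries no row labels). It takes the MAP estimate to be the action preferred by the single most probable hypothesis, and the Bayes optimal estimate to be the action maximising P(a|D) = Σ_i P(a|hi) P(hi|D).

```python
# P(h_i | D), assumed to be in table order h1..h5.
posteriors = [0.4, 0.2, 0.1, 0.1, 0.2]

# P(action | h_i) for each hypothesis, copied from the table.
action_probs = [
    {"F": 1, "L": 0, "R": 0},
    {"F": 0, "L": 1, "R": 0},
    {"F": 0, "L": 0, "R": 1},
    {"F": 0, "L": 1, "R": 0},
    {"F": 0, "L": 1, "R": 0},
]

# MAP: pick the most probable hypothesis, then the action it prefers.
map_index = max(range(len(posteriors)), key=lambda i: posteriors[i])
map_action = max(action_probs[map_index], key=action_probs[map_index].get)

# Bayes optimal: weight every hypothesis's vote by its posterior.
action_posterior = {
    a: sum(p * probs[a] for p, probs in zip(posteriors, action_probs))
    for a in "FLR"
}
bayes_action = max(action_posterior, key=action_posterior.get)

print("MAP hypothesis index:", map_index + 1, "action:", map_action)
print("Action posterior:", action_posterior, "Bayes optimal action:", bayes_action)
```

The script only does the arithmetic; the justification the question asks for still has to be written out.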
3. VC-Dimension: (2 marks) Consider a setup with one-dimensional data $x \in \mathbb{R}^1$, where
the hypothesis space H is parametrized by {p, q} and x is classified as 1 iff p < x < q.
Find the VC-dimension of H.
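A brute-force shattering check can help build intuition before writing the proof. The sketch below assumes the points are distinct. It relies on the observation that an open interval (p, q) can realise a labelling exactly when the points labelled 1 form one contiguous run in sorted order, or when no point is labelled 1. The function names are hypothetical.

```python
from itertools import product


def interval_realizable(points, labels):
    """True if some open interval (p, q) labels exactly the 1-points as 1."""
    ordered = [lab for _, lab in sorted(zip(points, labels))]
    positives = [i for i, lab in enumerate(ordered) if lab == 1]
    if not positives:
        return True  # choose an interval that misses every point
    # The positive points must be contiguous in sorted order.
    return positives[-1] - positives[0] + 1 == len(positives)


def shattered(points):
    """True if every 0/1 labelling of the points is realisable."""
    return all(
        interval_realizable(points, labels)
        for labels in product([0, 1], repeat=len(points))
    )


print(shattered([1.0, 2.0]))
print(shattered([1.0, 2.0, 3.0]))
```

Checking particular point sets is evidence rather than proof: an upper bound on the VC-dimension requires an argument that no set of the larger size can be shattered.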
4. Regularizer: (4 marks) Given D-dimensional data $x = [x_1, x_2, \dots, x_D]$, consider a linear
model of the form:
$$y(x, w) = w_0 + \sum_{k=1}^{D} w_k x_k$$
Now, for N such data samples with their corresponding labels $(x_i, t_i)$, $i = 1, 2, \dots, N$, the
sum-of-squares error (or mean-squared-error) function is given by:
$$E(w) = \frac{1}{2} \sum_{i=1}^{N} \left( y(x_i, w) - t_i \right)^2$$
Now, suppose that Gaussian noise $\epsilon_k \sim \mathcal{N}(0, \sigma^2)$ (i.e. zero mean and variance $\sigma^2$) is added
independently to each of the input variables $x_k$. Find a relation between minimizing the
above sum-of-squares error averaged over the noisy data and minimizing the standard sum-of-squares error (averaged over noise-free input data) with an L2 weight-decay regularization
term, in which the bias parameter $w_0$ is omitted from the regularizer.
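A quick Monte Carlo experiment can serve as a sanity check on whatever relation is derived. The sketch below uses a made-up dataset and weight vector (assumptions, not part of the assignment). It averages the sum-of-squares error over many noisy copies of the inputs and compares the result with the noise-free error plus a candidate penalty (N σ²/2) Σ_{k=1}^{D} w_k², which is the form obtained by expanding the square and using the zero mean and independence of the noise.

```python
import random


def predict(x, w):
    """Linear model y(x, w) = w0 + sum_k w_k x_k, with w = [w0, w1, ..., wD]."""
    return w[0] + sum(wk * xk for wk, xk in zip(w[1:], x))


def sse(xs, ts, w):
    """Sum-of-squares error 0.5 * sum_i (y(x_i, w) - t_i)^2."""
    return 0.5 * sum((predict(x, w) - t) ** 2 for x, t in zip(xs, ts))


def noisy_sse_average(xs, ts, w, sigma, trials, rng):
    """Average the error over `trials` independent noisy copies of the inputs."""
    total = 0.0
    for _ in range(trials):
        noisy = [[xk + rng.gauss(0.0, sigma) for xk in x] for x in xs]
        total += sse(noisy, ts, w)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical toy data and weights, chosen only for illustration.
xs = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [-1.0, 2.0]]
ts = [-1.0, 0.0, 4.0, -4.0]
w = [0.5, 1.0, -2.0]
sigma = 0.3

clean = sse(xs, ts, w)
noisy_avg = noisy_sse_average(xs, ts, w, sigma, 20000, rng)
# Candidate penalty: the bias w0 is excluded because no noise multiplies it.
penalty = 0.5 * len(xs) * sigma ** 2 * sum(wk ** 2 for wk in w[1:])

print(clean + penalty, noisy_avg)
```

The two printed numbers should agree up to sampling error. The simulation does not replace the derivation, which should show where the cross term vanishes.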
Questions: Programming
5. Logistic Regression: (7 marks)
(a) (3 marks) Implement your own code for a logistic regression classifier, which is trained
using gradient descent and cross-entropy error as the error function.
Index   x1      x2      y
1       0.346   0.780   0
2       0.303   0.439   0
3       0.358   0.729   0
4       0.602   0.863   1
5       0.790   0.753   1
6       0.611   0.965   1
Table 1: Train Set

Index   x1      x2      y
1       0.959   0.382   0
2       0.750   0.306   0
3       0.395   0.760   0
4       0.823   0.764   1
5       0.761   0.874   1
6       0.844   0.435   1
Table 2: Test Set
(b) Consider the training set and test set given in Tables 1 and 2. We use the linear
model $f_\theta(x_1, x_2) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ and the logistic regression function
$$\sigma(f_\theta(x_1, x_2)) = \frac{1}{1 + \exp(-f_\theta(x_1, x_2))}$$
Consider the initial weights as $\theta_0 = -1$, $\theta_1 = 1.5$, $\theta_2 = 0.5$, and learning
rate as 0.1 (for gradient descent).
i. (1 mark) What is the logistic model $P(\hat{y} = 1 \mid x_1, x_2)$ and its cross-entropy error
function?
ii. (1 mark) Use gradient descent to update $\theta_0, \theta_1, \theta_2$ for one iteration. Write down
the updated logistic regression model.
iii. (2 marks) At convergence of gradient descent, use the model to make predictions
for all the samples in the test dataset. Calculate and report the accuracy, precision
and recall to evaluate this model.
Deliverables:
• Code
• Brief report with answers to above questions.
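One possible shape for the Question 5 code is sketched below, using only the standard library and the data from Tables 1 and 2. Several choices are assumptions rather than requirements of the assignment: the cross-entropy is averaged over the training samples (a summed version would scale the gradient by N), a fixed budget of 10000 iterations stands in for a convergence test, and predictions use a 0.5 probability threshold. The function names are hypothetical.

```python
import math

# (x1, x2, y) rows copied from Table 1 (train) and Table 2 (test).
train = [
    (0.346, 0.780, 0), (0.303, 0.439, 0), (0.358, 0.729, 0),
    (0.602, 0.863, 1), (0.790, 0.753, 1), (0.611, 0.965, 1),
]
test = [
    (0.959, 0.382, 0), (0.750, 0.306, 0), (0.395, 0.760, 0),
    (0.823, 0.764, 1), (0.761, 0.874, 1), (0.844, 0.435, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob(theta, x1, x2):
    """P(y = 1 | x1, x2) under the logistic model."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)


def cross_entropy(theta, data):
    """Mean cross-entropy error (averaging is an assumption)."""
    return -sum(
        y * math.log(prob(theta, x1, x2))
        + (1 - y) * math.log(1.0 - prob(theta, x1, x2))
        for x1, x2, y in data
    ) / len(data)


def gradient_step(theta, data, lr):
    """One batch gradient-descent update on the mean cross-entropy."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in data:
        err = prob(theta, x1, x2) - y
        for j, feature in enumerate((1.0, x1, x2)):
            grad[j] += err * feature / len(data)
    return [t - lr * g for t, g in zip(theta, grad)]


def evaluate(theta, data):
    """Return (accuracy, precision, recall) at a 0.5 threshold."""
    preds = [1 if prob(theta, x1, x2) >= 0.5 else 0 for x1, x2, _ in data]
    labels = [y for _, _, y in data]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


theta0 = [-1.0, 1.5, 0.5]
theta1 = gradient_step(theta0, train, lr=0.1)  # part (b)(ii): one iteration

theta = theta0
for _ in range(10000):  # fixed budget standing in for a convergence check
    theta = gradient_step(theta, train, lr=0.1)

print("after one step:", theta1)
print("accuracy, precision, recall on test:", evaluate(theta, test))
```

The training set in Table 1 is linearly separable on x1, so the weights keep growing slowly instead of settling at a finite optimum. A stopping rule based on the change in the loss, or on a maximum iteration count, is therefore worth stating explicitly in the report.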
6. Kaggle – Taxi Fare Prediction: (9 marks) The next task of this assignment is to
work on a (completed) Kaggle challenge on taxi fare prediction. As part of this task, please
visit https://www.kaggle.com/c/new-york-city-taxi-fare-prediction to learn more about this
problem, and download the data. (You now know how to download data from Kaggle.)
You are allowed to use any machine learning library of your choice, such as scikit-learn, pandas, or
Weka (we recommend scikit-learn), and any regression method. Use train.csv to
train your model. Predict the fares on the data in test.csv, and report your best 2
scores in your report. (We will also randomly select submissions and run their code to confirm the scores.)
Deliverables:
• Code
• Brief report with top-2 scores of your methods, and a brief description of the methods
that resulted in the top 2 scores.
• Your report should also include your analysis of why your best 2 methods performed
better than others you tried.
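As a starting point for feature engineering, the sketch below shows two helpers written with the standard library only. It assumes that trip distance is a useful feature and that each row provides pickup and dropoff latitude and longitude in degrees; the assignment does not prescribe either, so check the column names on the data page. `haversine_km` and `rmse` are hypothetical helper names, and root-mean-squared error is used here as a local validation score; confirm the official metric on the competition page.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    radius_km defaults to the mean Earth radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# One degree of longitude along the equator, as a sanity check.
print(haversine_km(0.0, 0.0, 0.0, 1.0))
print(rmse([1.0, 2.0], [1.0, 4.0]))
```

A distance column computed this way can be passed to any regressor alongside time-of-day or passenger-count features. Holding out part of train.csv and scoring it with `rmse` gives a local estimate before uploading predictions.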

CS5590 Assignment 4 Foundations of Machine Learning
