2 Problem 2 (10 points)
Regularized risk minimization: Modify the Matlab code in “polyreg.m” such that it learns a multivariate regression function $f : \mathbb{R}^{100} \to \mathbb{R}$, where the basis functions are of the form
$$f(x; \theta) = \sum_{i=1}^{k} \theta_i x_i.$$
The dataset is available in “problem2.mat”. As before, the x variable contains $\{x_1, \dots, x_N\}$ and the y variable contains their scalar labels $\{y_1, \dots, y_N\}$.
Use an $\ell_2$ penalty on the parameters to control the complexity of the model, i.e. minimize the regularized risk
$$R_{\mathrm{reg}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2}\bigl(y_i - f(x_i; \theta)\bigr)^2 + \frac{\lambda}{2N} \|\theta\|^2.$$
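Since $f$ is linear in $\theta$, this risk has a closed-form minimizer via the regularized normal equations. A minimal Matlab sketch, assuming x is an N-by-d design matrix and y an N-by-1 label vector (the function name ridgefit is illustrative, not part of “polyreg.m”):

    function theta = ridgefit(x, y, lambda)
    % Closed-form minimizer of Rreg: setting its gradient to zero gives
    % (x'*x + lambda*eye(d)) * theta = x'*y.
    d = size(x, 2);
    theta = (x' * x + lambda * eye(d)) \ (x' * y);
    end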
Use two-fold cross validation (as in Problem 1) to find the best value for λ. Include a plot showing
training and testing risk across various choices of λ. A reasonable range for this data set would be
from λ = 0 to λ = 1000. Also, mark the λ which minimizes the testing error on the data set.
What do you notice about the training and testing error?
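One possible shape for the two-fold experiment, as a sketch only (it assumes the ridgefit helper above; the variable names and the grid over λ are illustrative choices):

    load problem2;                         % provides x and y
    N = size(x, 1);
    half = floor(N / 2);
    lambdas = 0:10:1000;                   % illustrative grid over lambda
    trainRisk = zeros(size(lambdas));
    testRisk  = zeros(size(lambdas));
    for j = 1:numel(lambdas)
        lam = lambdas(j);
        th1 = ridgefit(x(1:half, :), y(1:half), lam);          % fit on fold 1
        th2 = ridgefit(x(half+1:end, :), y(half+1:end), lam);  % fit on fold 2
        r = @(th, xs, ys) mean(0.5 * (ys - xs * th).^2);       % unregularized risk
        trainRisk(j) = (r(th1, x(1:half,:), y(1:half)) + ...
                        r(th2, x(half+1:end,:), y(half+1:end))) / 2;
        testRisk(j)  = (r(th1, x(half+1:end,:), y(half+1:end)) + ...
                        r(th2, x(1:half,:), y(1:half))) / 2;
    end
    [~, best] = min(testRisk);
    plot(lambdas, trainRisk, lambdas, testRisk); hold on;
    plot(lambdas(best), testRisk(best), 'ko');  % mark lambda with lowest test risk
    xlabel('\lambda'); ylabel('average risk'); legend('training', 'testing');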
3 Problem 3 (10 points)
Logistic Squashing Function. The logistic squashing function is given by g(z) = 1/(1 + exp(−z)).
Show that it satisfies the property $g(-z) = 1 - g(z)$. Also show that its inverse is given by $g^{-1}(y) = \ln(y/(1 - y))$.
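For orientation, both identities follow in a few lines from the definition; a sketch of the algebra, to be fleshed out in your own write-up:
$$g(-z) = \frac{1}{1 + e^{z}} = \frac{e^{-z}}{e^{-z} + 1} = \frac{(1 + e^{-z}) - 1}{1 + e^{-z}} = 1 - g(z),$$
and setting $y = g(z) = 1/(1 + e^{-z})$ gives $e^{-z} = (1 - y)/y$, so $z = g^{-1}(y) = \ln(y/(1 - y))$.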
4 Problem 4 (20 points)
Logistic Regression: Implement a linear logistic regression algorithm for binary classification in
Matlab using gradient descent. Your code should accept a dataset $\{(x_1, y_1), \dots, (x_N, y_N)\}$ where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$ and find a parameter vector $\theta \in \mathbb{R}^d$ for the classification function
$$f(x; \theta) = \bigl(1 + \exp(-\theta^\top x)\bigr)^{-1}$$
which minimizes the empirical risk with logistic loss
$$R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \Bigl[(y_i - 1)\log\bigl(1 - f(x_i; \theta)\bigr) - y_i \log f(x_i; \theta)\Bigr].$$
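As a pointer toward the derivation step requested below: differentiating $R_{\mathrm{emp}}$ using $g'(z) = g(z)(1 - g(z))$ collapses to the standard logistic-regression gradient (a sketch worth re-deriving yourself),
$$\nabla_\theta R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl(f(x_i; \theta) - y_i\bigr)\, x_i,$$
so each gradient-descent step is $\theta \leftarrow \theta - \eta \, \nabla_\theta R_{\mathrm{emp}}(\theta)$.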
Since you are using gradient descent, you will have to specify the step size $\eta$ and the tolerance $\epsilon$.
Pick reasonable values for $\eta$ and $\epsilon$, then use your code to learn a classification function for the
dataset in “dataset4.mat”. Type “load dataset4” and you will have the variables X (input vectors)
and Y (binary labels) in your Matlab environment which contain the dataset.
Show any derivations you need to make for this algorithm.
Use the whole data set as training. Show with figures the resulting linear decision boundary on the
2D X data. Show the binary classification error and the empirical risk you obtained throughout
the run from random initialization until convergence. Note the number of iterations needed for your
choice of $\eta$ and $\epsilon$.
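Putting the pieces together, one possible gradient-descent sketch; the values of eta, epsilon, and the iteration cap are illustrative choices, and it assumes no separate bias term, matching the form of $f$ above:

    load dataset4;                        % provides X (N-by-d) and Y (N-by-1, in {0,1})
    [N, d] = size(X);
    theta = randn(d, 1);                  % random initialization
    eta = 0.1;                            % illustrative step size
    epsilon = 1e-6;                       % illustrative tolerance
    maxIter = 100000;
    Remp = zeros(maxIter, 1); err = zeros(maxIter, 1);
    for t = 1:maxIter
        f = 1 ./ (1 + exp(-X * theta));   % f(x_i; theta) for every example
        grad = (X' * (f - Y)) / N;        % gradient of the empirical risk
        Remp(t) = mean((Y - 1) .* log(1 - f) - Y .* log(f));
        err(t) = mean((f >= 0.5) ~= Y);   % 0/1 classification error
        theta = theta - eta * grad;
        if norm(eta * grad) < epsilon     % stop when the update is tiny
            break;
        end
    end
    fprintf('stopped after %d iterations\n', t);
    figure; plot(1:t, Remp(1:t), 1:t, err(1:t));
    xlabel('iteration'); legend('empirical risk', 'classification error');
    figure; plot(X(Y==0,1), X(Y==0,2), 'ro', X(Y==1,1), X(Y==1,2), 'bx'); hold on;
    xs = linspace(min(X(:,1)), max(X(:,1)), 100);
    plot(xs, -(theta(1)/theta(2)) * xs);  % decision boundary theta'*x = 0 in 2D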