CS260: Machine Learning Algorithms Homework 2


Problem 1. True or False
Decide whether the following statements are true or false. Justify your answers.
(a) (10 pt) If classifier A has smaller training error than classifier B, then classifier A will have smaller generalization (test) error than classifier B.
(b) (10 pt) The VC dimension is always equal to the number of parameters in the model.
(c) (10 pt) For non-convex problems, gradient descent is guaranteed to converge to the global minimum.
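For statement (c), a numerical counterexample can guide your justification. Below is a minimal sketch (the function, starting point, and step size are my own illustrative choices, not part of the assignment) showing plain gradient descent settling into a local minimum of a non-convex function rather than the global one:

```python
# Gradient descent on a non-convex function: f(x) = x^4 - 3x^2 + x.
# It has a local minimum near x = 1.13 and a (lower) global minimum near x = -1.30.

def f(x):
    return x**4 - 3*x**2 + x

def grad(x):
    return 4*x**3 - 6*x + 1

x = 2.0                     # start in the basin of the local (non-global) minimum
for _ in range(1000):
    x -= 0.01 * grad(x)     # plain gradient descent with a fixed step size

# x ends up near 1.13, a local minimum; the global minimum near -1.30 is never found.
```

Since the iterates never leave the basin of attraction they start in, convergence to the global minimum cannot be guaranteed for non-convex objectives.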
Problem 2. Multiple choice questions
Choose the correct answer and justify your answer.
(a) (20 pt) Which of the following is not a possible growth function $m_H(N)$ for some hypothesis set?
(1) $2^N$ (2) $2^{\sqrt{N}}$ (3) $1$ (4) $N^2 - N + 2$ (5) none of the other choices
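A useful fact here is that a growth function either equals $2^N$ for all $N$ (no break point) or, once a break point exists, is bounded by a polynomial in $N$ (a consequence of Sauer's lemma). Assuming the candidate in option (2) is $2^{\sqrt{N}}$ (the exponent is garbled in the extraction), it fits neither regime: it stays below $2^N$ yet eventually outgrows any fixed polynomial. An illustrative numerical check, comparing against the polynomial candidate $N^2 - N + 2$:

```python
import math

# 2^sqrt(N) sits strictly between the two allowed regimes:
# below 2^N, but eventually above any fixed polynomial such as N^2 - N + 2.
for N in [100, 10_000, 1_000_000]:
    below_2N = 2 ** math.isqrt(N) < 2 ** N
    above_poly = 2 ** math.isqrt(N) > N**2 - N + 2
    print(N, below_2N, above_poly)
# The polynomial comparison is False at N=100 but True from N=10_000 on.
```

This is only a numeric illustration; the justification should argue from the break-point dichotomy itself.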
Problem 3. Proximal Gradient Descent
Consider solving the following problem:
$$\min_{w} \; \|Xw - y\|_2^2 + \lambda \|w\|_1,$$
where $X \in \mathbb{R}^{n \times d}$ is the feature matrix (each row is a feature vector), $y \in \mathbb{R}^n$ is the label vector, $\|w\|_1 := \sum_i |w_i|$, and $\lambda > 0$ is a constant that balances the loss and the regularization. This is known as the Lasso regression problem, and our goal is to derive the "proximal gradient method" for solving it.
• (10 pt) The gradient descent algorithm cannot be applied directly, since the objective function is non-differentiable. Explain why the objective function is non-differentiable.
• (30 pt) In class we showed that gradient descent is based on the idea of function approximation. To form an approximation of a non-differentiable objective, we split it into a differentiable part and a non-differentiable part. Let $g(w) = \|Xw - y\|_2^2$; as discussed in the gradient descent lecture, we approximate $g(w)$ by
$$g(w) \approx \hat{g}(w) := g(w_t) + \nabla g(w_t)^T (w - w_t) + \frac{\eta}{2} \|w - w_t\|^2.$$
In each iteration of proximal gradient descent, we obtain the next iterate $w_{t+1}$ by minimizing the following approximation:
$$w_{t+1} = \arg\min_{w} \; \hat{g}(w) + \lambda \|w\|_1.$$
Derive the closed-form solution for $w_{t+1}$ given $w_t$, $\nabla g(w_t)$, $\eta$, and $\lambda$. What is the time complexity of one proximal gradient descent iteration?
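For reference, the minimization in each iteration separates per coordinate, and one standard derivation (completing the square in $\hat{g}$) leads to an elementwise soft-thresholding update. The sketch below is not the required derivation itself; the function names are my own, and it assumes the quadratic approximation above with coefficient $\eta$:

```python
import numpy as np

def soft_threshold(z, tau):
    # Elementwise soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient_lasso(X, y, lam, eta, iters=500):
    """Minimize ||Xw - y||_2^2 + lam * ||w||_1 by proximal gradient descent.

    eta is the quadratic coefficient from the approximation g_hat, so the
    effective step size is 1/eta; eta should exceed the Lipschitz constant
    of the gradient, 2 * lambda_max(X^T X).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        g = 2 * X.T @ (X @ w - y)                    # gradient of smooth part: O(nd)
        w = soft_threshold(w - g / eta, lam / eta)   # prox step: O(d)
    return w
```

Each iteration is dominated by the two matrix-vector products in the gradient, so the per-iteration cost is $O(nd)$; the soft-thresholding step adds only $O(d)$.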