## Description

1. A breast cancer gene database has approximately 8000 genes from 100 subjects. The label $y_i$ is the disease state of the $i$th subject ($+1$ if no cancer, $-1$ if breast cancer). Suppose we build a linear classifier that combines the 8000 genes, say $g_i$, $i = 1, 2, \ldots, 100$, to predict whether a subject has cancer: $\hat{y}_i = \operatorname{sign}\{g_i^T w\}$. Note that here $g_i$ and $w$ are 8000-by-1 vectors. You recall from the previous period that the least-squares problem for finding classifier weights has no unique solution.

   Your hypothesis is that a relatively small number of the 8000 genes are predictive of the cancer state. Identify a regularization strategy consistent with this hypothesis and justify your choice.

2. Consider the least-squares problem $\min_w \|y - Xw\|_2^2$ where $y = 4$ and $X = \begin{bmatrix} 2 & 1 \end{bmatrix}$.

   a) Does this problem have a unique solution? Why or why not?

   b) Sketch the contours of the cost function $f(w) = \|y - Xw\|_2^2$ in the $w_1$–$w_2$ plane.

   c) Now consider the LASSO problem $\min_w \|w\|_1$ subject to $\|y - Xw\|_2^2 \leq 1$. Find the solution using the following steps:

      i. Repeat your sketch from part b).

      ii. Add a sketch of the level set $\|w\|_1 = c$.

      iii. Find the $w$ that satisfies $\|y - Xw\|_2^2 = 1$ with the minimum possible value of $\|w\|_1$.

   d) Use your insight from the previous part to sketch the set of solutions to the problem $\min_w \|y - Xw\|_2^2 + \lambda\|w\|_1$ for $0 < \lambda < \infty$.

3. The script provided has a function that will compute a specified number of iterations of the proximal gradient descent algorithm for solving the $\ell_1$-regularized least-squares problem

   $$\min_w \|y - Xw\|_2^2 + \lambda\|w\|_1.$$

   The script will get you started displaying the path taken by the weights in the proximal gradient descent iteration, superimposed on a contour plot of the squared-error surface for the cost function defined in problem 2, part b), starting from $w^{(0)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. The script assumes $\lambda = 4$ and $\tau = 1/4$.

   Include the plots you generate below with your submission.

   a) How many iterations does it take for the algorithm to converge to the solution? What is the converged value for $w$?

   b) Change to $\lambda = 2$. How many iterations does it take for the algorithm to converge to the solution? What is the converged value for $w$?

   c) Explain what happens to the weights in the regularization step.
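For reference, the iteration such a script might implement can be sketched as follows. This is a minimal stand-in, not the provided script; it assumes the gradient convention $\nabla_w \|y - Xw\|_2^2 = 2X^T(Xw - y)$, so the regularization step soft-thresholds at level $\lambda\tau$:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||.||_1: shrink each entry of z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad(X, y, lam, tau, n_iter, w0):
    """Proximal gradient descent for min_w ||y - Xw||_2^2 + lam*||w||_1.
    Returns the list of iterates w^(0), w^(1), ..., w^(n_iter)."""
    w = np.asarray(w0, dtype=float)
    path = [w.copy()]
    for _ in range(n_iter):
        z = w - tau * 2.0 * X.T @ (X @ w - y)  # gradient step on the squared error
        w = soft_threshold(z, lam * tau)        # regularization (shrinkage) step
        path.append(w.copy())
    return path

# Parameters from the problem statement.
X = np.array([[2.0, 1.0]])
y = np.array([4.0])
path = prox_grad(X, y, lam=4.0, tau=0.25, n_iter=10, w0=np.zeros(2))
print(path[-1])
```

Plotting the entries of `path` over the contour plot from problem 2, part b) reproduces the weight trajectory the script displays.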

4. Use the proximal gradient algorithm to solve $\min_w \|y - Xw\|_2^2 + 4\|w\|_1$ for the parameters defined in problem 2.

   a) What is the maximum value for the step size in the negative gradient direction, $\tau$?

   b) Suppose $\tau = 0.1$ and you start at $w^{(0)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Calculate the first two complete iterations of the proximal gradient algorithm and depict $w^{(0)}$, $z^{(1)}$, $w^{(1)}$, $z^{(2)}$, and $w^{(2)}$ on a sketch of the cost function identical to the one you created in problem 2, part b).
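The two iterations in part b) can be reproduced numerically as a check on your hand calculation. This sketch assumes the gradient convention $2X^T(Xw - y)$ and a soft-threshold level of $\lambda\tau$; treat it as a way to verify your own work, not as the official solution:

```python
import numpy as np

X = np.array([[2.0, 1.0]])
y = np.array([4.0])
lam, tau = 4.0, 0.1  # parameters from problem 4

w = np.zeros(2)      # w^(0)
for k in (1, 2):
    # Gradient step on the squared error: z^(k)
    z = w - tau * 2.0 * X.T @ (X @ w - y)
    # Shrinkage (prox) step at level lam*tau: w^(k)
    w = np.sign(z) * np.maximum(np.abs(z) - lam * tau, 0.0)
    print(k, z, w)
```

The printed pairs $(z^{(k)}, w^{(k)})$ are the points to mark, along with $w^{(0)}$, on your contour sketch.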