CS/ECE/ME532 Activity 17

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (1 vote)

1. Alternative regularization formulas. This problem is about two alternative ways
of solving the L2-regularized least squares problem.

a) Prove that for any λ > 0, the following matrix identity holds:
(ATA + λI)
−1AT = AT
(AAT + λI)
−1
Hint: Start by considering the expression ATAAT + λAT and factor it in two
different ways (from the right or from the left).

b) The identity proved in part a) shows that there are actually two equivalent formulas for the solution to the L2-regularized least squares problem. Suppose
A ∈ R
8000×100 and y ∈ R
8000, and use this identity to find w that minimizes
||Aw − y||2
2 + λ||w||2
2
in two different ways. Which formula will compute more
rapidly? Why? Note: The number of operations required for matrix inversion is
proportional to the cube of the matrix dimension.

c) A breast cancer gene database has approximately 8000 genes from 100 subjects.
The label yi
is the disease state of the ith subject (+1 if no cancer, -1 if breast
cancer). Suppose we build a linear classifier that combines the 8000 genes, say
gi
, i = 1, 2, . . . , 100 to predict whether a subject has cancer yˆi = sign{g
T
i w}. Note
that here gi and w are 8000-by-1 vectors.

i. Write down the least squares problem for finding classifier weights w given
100 labels. Does this problem have a unique solution?

ii. Write down a Tikhonov(ridge)-regression problem for finding the classifier
weights given 100 labels. Does this problem have a unique solution? Which
form of the identity in part a) leads to the most computationally efficient
solution for the classifier weights?

2. The key idea behind proximal gradient descent is to reformulate the general regularized
least-squares problem into a set of simpler scalar optimization problems. Consider the
regularized least-squares problem
min
w
||z − w||2
2 + λr(w)

An upper bound and completing the square was used to simplify the generalized leastsquares problem into this form. Let the i
th elements of z and w be zi and wi
, respectively.

a) Assume r(w) = ||w||2
2
. Write the regularized least-squares problem as a series of
separable problems involving only wi and zi
.

b) Assume r(w) = ||w||1. Write the regularized least-squares problem as a series of
separable problems involving only wi and zi
.

3. A script is available to compute a specified number of iterations of the proximal gradient
descent algorithm for solving a Tikhonov-regularized least squares problem
min
w
||y − Xw||2
2 + λ||w||2
2

The provided script will get you started displaying the path taken by the weights in
the proximal gradient descent iteration superimposed on a contour plot of the squared
error surface. Assume y =






2
0
1
0





, the 4-by-2 X = UΣV
T has singular value
decomposition U =





1 0
0 1
0 0
0 0





, Σ =
[
1 0
0 1/2
]
, and V = √
1
2
[
1 1
1 −1
]
. Complete
20 iterations of gradient descent in each case specified below.
Include the plots you generate below with your submission.

a) What is the maximum value for the step size τ that will guarantee convergence?
b) Start proximal gradient descent from the point w =
[
−1
1
]
using a step size
of τ = 0.5 and tuning parameter λ = 0.5. How do you explain the trajectory
the weights take toward the optimum, e.g., why is it shaped this way? What
direction does each iteration move in the regularization step?

c) Repeat the previous case with λ = 0.1 What happens? How does λ affect each
iteration and why?