## Description

1. Data Fitting vs. Sparsity Tradeoff. This assignment uses the dataset BreastCancer.mat

to explore sparse regularization of a least squares problem. The journal article “A geneexpression signature as a predictor of survival in breast cancer” provides background

on the role of genes in breast cancer.

The goal is to solve the Lasso problem

w∗ = arg min

w∈Rn

kAw − dk

2

2 + λkwk1

Here w is the weight vector applied to the expression levels of 8141 genes and there

are 295 patients (feature sets and labels). In this problem we will vary λ to explore

the tradeoff between data-fitting and sparsity.

Scripts that implement iterative soft thresholding via proximal gradient descent to

solve the LASSO problem are available. The scripts use a hot start procedure for

finding the solution with different values for λ. The initial guess for the next value of

λ is the converged solution for the preceding value. This accelerates convergence when

subsequent values of λ lead to similar solutions.

a) Write code to find the optimal weights using only the first 100 patients (first 100

rows). Create a plot with the residual kAw∗ −dk2 on the vertical-axis and kw∗k1

on the horizontal-axis, parameterized by λ.

In other words, create the curve by

finding w∗

for different λ, and plotting kw∗k1 vs. kAw∗ − dk2. Experiment with

λ to find a range that captures the variation from the least-squares solution (small

λ) to the all zeros solution (large λ). Appropriate values of λ may range from

10−6

to 20, spaced logarithmically. Explain the result.

b) Next use your solutions from part a) to plot the error rate on the vertical-axis

versus the sparsity on the horizontal-axis as λ varies. Define the error rate as the

number of incorrect predictions divided by the total number of predictions and

the sparsity as the number of nonzero entries in w∗

.

For this purpose, we’ll say an

entry wi

is nonzero if |wi

| > 10−6

. Calculate the error rate using the training data,

the data used to find the optimal weights. Explain the result.

c) Repeat parts a) and b) to display the residual and error rate, respectively using

validation or test data, rows 101-295 of the data matrix, that is, the data not

used to design the optimal classifier. Again, explain what you see in each plot.

2. Now compare the performance of the LASSO and ridge regression for the breast cancer

dataset using the following steps:

• Randomly split the set of 295 patients into ten subsets of size 29-30.

• Use the data in eight of the subsets to find a solution to the Lasso optimization

above and to the ridge regression problem

min

w

kAw − dk

2

2 + λkwk

2

2

.

Repeat this for a range of λ values to obtain a set of solutions wλ.

• Compute the prediction error using each wλ on one of the remaining two of the

ten subsets. Use the solution that has the smallest prediction error to find the

best λ. Note that LASSO and ridge regression will produce different best values

for λ.

• Compute the test error on the final subset of the data for the choice of λ that

minimizes the prediction error. Compute both the squared error and the error

rate.

Repeat this process for different subsets of eight training, one tuning (λ) and one

testing subsets, and compute the average squared error and average number of misclassifications across all different subsets.

Note that you should use the identity derived in Problem 1 of the Activity 5.2 in order

to speed the computation of ridge regression.