CS/ECE/ME532 Assignment 8 Data Fitting vs. Sparsity Tradeoff

1. Data Fitting vs. Sparsity Tradeoff. This assignment uses the dataset BreastCancer.mat to explore sparse regularization of a least squares problem. The journal article "A gene-expression signature as a predictor of survival in breast cancer" provides background on the role of genes in breast cancer.

The goal is to solve the LASSO problem

$$w^* = \arg\min_{w \in \mathbb{R}^n} \|Aw - d\|_2^2 + \lambda \|w\|_1.$$

Here w is the weight vector applied to the expression levels of 8141 genes, and there are 295 patients (feature vectors and labels). In this problem we will vary λ to explore the tradeoff between data fitting and sparsity.

Scripts that implement iterative soft thresholding via proximal gradient descent to solve the LASSO problem are available. The scripts use a hot-start procedure when solving for different values of λ: the initial guess for each new value of λ is the converged solution for the preceding value. This accelerates convergence when consecutive values of λ lead to similar solutions.
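A minimal Python sketch of this procedure is given below. It assumes the data can be loaded with scipy.io and that the .mat file stores the feature matrix and label vector under the names A and d; both names are assumptions, so adjust them to match the provided file and scripts.

```python
import numpy as np
from scipy.io import loadmat

def soft_threshold(z, tau):
    # Elementwise soft threshold: the proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, d, lam, w0, n_iter=500):
    # Proximal gradient descent (ISTA) for min_w ||Aw - d||_2^2 + lam * ||w||_1,
    # run for a fixed iteration budget for simplicity.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1/L, L = Lipschitz const. of the gradient
    w = w0.copy()
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ w - d)              # gradient of the smooth term
        w = soft_threshold(w - step * grad, step * lam)
    return w

data = loadmat('BreastCancer.mat')                  # field names 'A', 'd' are assumptions
A, d = data['A'], data['d'].ravel()

# Hot start: sweep lambda, seeding each solve with the preceding solution.
lams = np.logspace(-6, np.log10(20.0), 50)
w = np.zeros(A.shape[1])
for lam in lams:
    w = ista(A, d, lam, w)
```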

a) Write code to find the optimal weights using only the first 100 patients (first 100 rows). Create a plot with the residual $\|Aw^* - d\|_2$ on the vertical axis and $\|w^*\|_1$ on the horizontal axis, parameterized by λ. In other words, create the curve by finding $w^*$ for different λ and plotting $\|w^*\|_1$ vs. $\|Aw^* - d\|_2$. Experiment with λ to find a range that captures the variation from the least-squares solution (small λ) to the all-zeros solution (large λ). Appropriate values of λ may range from $10^{-6}$ to $20$, spaced logarithmically. Explain the result.
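Continuing the sketch above, one way to organize part a) is shown below (variable names carried over from the earlier snippet; matplotlib is used for the plot):

```python
import matplotlib.pyplot as plt

A_tr, d_tr = A[:100], d[:100]            # train on the first 100 patients only

solutions = {}                           # maps lambda -> converged weights
w = np.zeros(A_tr.shape[1])
for lam in lams:
    w = ista(A_tr, d_tr, lam, w)         # hot start from the previous lambda
    solutions[lam] = w.copy()

l1 = [np.abs(w).sum() for w in solutions.values()]
res = [np.linalg.norm(A_tr @ w - d_tr) for w in solutions.values()]

plt.plot(l1, res, 'o-')
plt.xlabel(r'$\|w^*\|_1$')
plt.ylabel(r'$\|Aw^* - d\|_2$')
plt.show()
```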

b) Next use your solutions from part a) to plot the error rate on the vertical axis versus the sparsity on the horizontal axis as λ varies. Define the error rate as the number of incorrect predictions divided by the total number of predictions, and the sparsity as the number of nonzero entries in $w^*$. For this purpose, we'll say an entry $w_i$ is nonzero if $|w_i| > 10^{-6}$. Calculate the error rate using the training data, i.e., the data used to find the optimal weights. Explain the result.
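A possible implementation of part b), assuming the labels in d are coded as ±1 so that a prediction is the sign of $Aw^*$ (the label coding is an assumption; adjust the prediction rule to match the dataset):

```python
def error_rate(A, d, w):
    # Fraction of incorrect predictions, assuming +/-1 labels.
    return np.mean(np.sign(A @ w) != d)

sparsity = [np.sum(np.abs(w) > 1e-6) for w in solutions.values()]
train_err = [error_rate(A_tr, d_tr, w) for w in solutions.values()]

plt.plot(sparsity, train_err, 'o-')
plt.xlabel('nonzero entries in $w^*$')
plt.ylabel('training error rate')
plt.show()
```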

c) Repeat parts a) and b) to display the residual and error rate, respectively, using validation or test data: rows 101-295 of the data matrix, that is, the data not used to design the optimal classifier. Again, explain what you see in each plot.
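For part c), the same solutions are simply evaluated on the held-out rows; a short sketch continuing the names above:

```python
A_te, d_te = A[100:295], d[100:295]      # rows 101-295: not used for training

test_res = [np.linalg.norm(A_te @ w - d_te) for w in solutions.values()]
test_err = [error_rate(A_te, d_te, w) for w in solutions.values()]

plt.plot(sparsity, test_err, 'o-')
plt.xlabel('nonzero entries in $w^*$')
plt.ylabel('test error rate')
plt.show()
```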

2. Now compare the performance of LASSO and ridge regression for the breast cancer dataset using the following steps:
• Randomly split the set of 295 patients into ten subsets of size 29-30.

• Use the data in eight of the subsets to find a solution to the LASSO optimization above and to the ridge regression problem

$$\min_{w} \|Aw - d\|_2^2 + \lambda \|w\|_2^2.$$

Repeat this for a range of λ values to obtain a set of solutions $w_\lambda$.

• Compute the prediction error of each $w_\lambda$ on one of the two remaining subsets. The best λ is the one whose solution yields the smallest prediction error. Note that LASSO and ridge regression will generally produce different best values of λ.

• Compute the test error on the final subset of the data for the choice of λ that
minimizes the prediction error. Compute both the squared error and the error
rate.

Repeat this process for different assignments of the ten subsets to the roles of eight training subsets, one tuning subset (for λ), and one testing subset, and compute the average squared error and the average number of misclassifications across all of these splits. A sketch of one way to organize the full procedure is given below.
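The following is a minimal sketch of the cross-validation loop under the same assumptions as the earlier snippets (variables A, d, lams, and ista carried over; ±1 labels). It shows ridge regression explicitly; the LASSO branch is analogous with ista in place of the ridge solve. The random seed and the fold-rotation scheme are illustrative choices, not requirements.

```python
def ridge(A, d, lam):
    # Ridge solution w = A^T (A A^T + lam I)^{-1} d -- see the identity below;
    # this solves an m x m system (m <= 295) instead of 8141 x 8141.
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(A.shape[0]), d)

rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(A.shape[0]), 10)   # ten subsets of size 29-30

sq_err, misclass = [], []
for k in range(10):
    test, tune = folds[k], folds[(k + 1) % 10]
    train = np.concatenate([f for j, f in enumerate(folds)
                            if j not in (k, (k + 1) % 10)])
    A_tr, d_tr = A[train], d[train]

    # Tune: keep the lambda whose solution best predicts the tuning subset.
    best_lam = min(lams, key=lambda t: np.linalg.norm(
        A[tune] @ ridge(A_tr, d_tr, t) - d[tune]))
    w = ridge(A_tr, d_tr, best_lam)   # for LASSO, use ista(A_tr, d_tr, best_lam, w0)

    sq_err.append(np.linalg.norm(A[test] @ w - d[test]) ** 2)
    misclass.append(np.sum(np.sign(A[test] @ w) != d[test]))

print('avg squared error:', np.mean(sq_err))
print('avg misclassifications:', np.mean(misclass))
```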

Note that you should use the identity derived in Problem 1 of Activity 5.2 to speed up the computation of ridge regression.
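Activity 5.2 is not reproduced here, but the identity in question is presumably the push-through identity: for $A \in \mathbb{R}^{m \times n}$ and $\lambda > 0$,

$$(A^\top A + \lambda I_n)^{-1} A^\top = A^\top (A A^\top + \lambda I_m)^{-1},$$

so the ridge solution $w = (A^\top A + \lambda I_n)^{-1} A^\top d$ can be computed as $w = A^\top (A A^\top + \lambda I_m)^{-1} d$, which requires solving only an $m \times m$ system (at most $295 \times 295$) rather than an $8141 \times 8141$ one. The ridge helper in the sketch above uses this form.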