Asmt 5: Regression


1 Singular Value Decomposition (20 points)
First we will compute the SVD of the matrix A we have loaded
[U,S,V] = svd(A)
Then take the top k components of A for values of k = 1 through k = 10 using
Uk = U(:,1:k)
Sk = S(1:k,1:k)
Vk = V(:,1:k)
Ak = Uk*Sk*Vk'
A (10 points): Compute and report the L2 norm of the difference between A and Ak for each value of k
using
norm(A-Ak,2)
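One way to organize this computation is the minimal sketch below (it assumes A is already loaded in the workspace):
% sketch: report the rank-k L2 error for k = 1 through 10
[U,S,V] = svd(A);
for k = 1:10
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
    fprintf('k = %d: norm(A - Ak, 2) = %f\n', k, norm(A - Ak, 2));
end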
B (5 points): Find the smallest value of k so that the L2 norm of A-Ak is less than 10% of that of A; k might or might not be larger than 10.
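A sketch of one way to search for this k, under the same assumptions as above:
% sketch: smallest k with norm(A - Ak, 2) < 0.1 * norm(A, 2)
thresh = 0.1 * norm(A, 2);
k = 0;
err = Inf;
while err >= thresh
    k = k + 1;
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
    err = norm(A - Ak, 2);
end
fprintf('smallest k = %d (error %f, threshold %f)\n', k, err, thresh);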
C (5 points): Treat the matrix as 1125 points in 30 dimensions. Plot the points in 2 dimensions in the way that minimizes the sum of squared residuals.
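Projecting the rows of A onto the top two right singular vectors gives the 2-dimensional subspace (through the origin) minimizing the sum of squared residuals; a sketch (whether the data should first be centered depends on the convention in the notes):
% sketch: plot the rows of A projected onto the top-2 right singular vectors
P2 = A * V(:,1:2);                 % 1125 x 2 projected coordinates
scatter(P2(:,1), P2(:,2), 10, 'filled');
xlabel('first singular direction'); ylabel('second singular direction');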
2 Frequent Directions (40 points)
Use the stub file FD.m to create a function for the Frequent Directions algorithm (Algorithm 16.2.1). We
will consider running this code on matrix A.
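A minimal sketch of one standard Frequent Directions variant follows; confirm the details (in particular the shrinkage step and the interface expected by FD.m) against Algorithm 16.2.1 in the notes:
function B = FD(A, l)
% Frequent Directions sketch: B has l rows and B'*B approximates A'*A
[n, d] = size(A);
B = zeros(l, d);
for i = 1:n
    zr = find(all(B == 0, 2), 1);          % first all-zero row of B
    if isempty(zr)                          % B is full: shrink it
        [~, S, V] = svd(B, 'econ');
        delta = S(l, l)^2;                  % squared smallest singular value
        B = diag(sqrt(max(diag(S).^2 - delta, 0))) * V';
        zr = l;                             % the last row of B is now zero
    end
    B(zr, :) = A(i, :);                     % insert the next row of A
end
end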
A (15 points): We can measure the error $\max_{\|x\|=1} \left| \|Ax\|^2 - \|Bx\|^2 \right|$ as norm(A'*A - B'*B, 2).
How large does l need to be for the above error to be at most $\|A\|_F^2 / 10$? How does this compare to the
theoretical bound (e.g. for k = 0)?
Note: you can calculate $\|A\|_F^2$ as norm(A,'fro')^2.
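An empirical check might look like the following sketch (it assumes A is loaded and the FD function from above is on the path):
% sketch: find the smallest l with norm(A'*A - B'*B, 2) <= norm(A,'fro')^2 / 10
target = norm(A, 'fro')^2 / 10;
for l = 1:size(A, 2)
    B = FD(A, l);
    err = norm(A'*A - B'*B, 2);
    if err <= target
        fprintf('l = %d suffices (error %f, target %f)\n', l, err, target);
        break;
    end
end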
B (25 points): Frequent Directions should also satisfy another bound based on its Frobenius norm.
We can compute $A \Pi_{B_k}$ using Bk = B(1:k,:) and then calculating A * pinv(Bk) * Bk. How large does l need to be to achieve
$\|A - A \Pi_{B_k}\|_F^2 \leq 1.1 \cdot \|A - A_k\|_F^2$
for each value k ∈ {1, 2, 3, 4, 5, 6, 7}? Answer both by running your algorithm and by reporting the theoretical
bound provided in the notes (i.e., you should report 7 pairs of values, an empirical and a theoretical bound
for each value of k).
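For a fixed k and sketch size l, the empirical side of this bound can be checked with a sketch like the one below (k and l must be set beforehand; A and FD are assumed as above):
% sketch: empirical Frobenius-norm check for given k and l
[U, S, V] = svd(A);
B = FD(A, l);
Bk = B(1:k, :);
ApiBk = A * pinv(Bk) * Bk;                       % projection of A onto the row space of Bk
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';          % best rank-k approximation of A
lhs = norm(A - ApiBk, 'fro')^2;
rhs = 1.1 * norm(A - Ak, 'fro')^2;
fprintf('k = %d, l = %d: %f <= %f (satisfied: %d)\n', k, l, lhs, rhs, lhs <= rhs);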
3 Linear Regression (40 points)
We will find coefficients C (these were a1, ..., ad in the notes, but are renamed here to avoid confusion with the matrix A in Q1)
to estimate X*C ≈ Y, using the provided datasets X and Y. We will compare two approaches: least squares
and ridge regression.
Least Squares: Set C = inv(X'*X)*X'*Y
Ridge Regression: Set Cs = inv(X'*X + s^2*eye(12))*X'*Y
A (20 points): Solve for the coefficients C (or Cs) using Least Squares and Ridge Regression with
s ∈ {0.1, 0.3, 0.5, 1.0, 2.0} (i.e. s will take on one of those 5 values each time you try, say obtaining C2 for
s = 2). For each set of coefficients, report the error in the estimate Ŷ of Y as norm(Y - X*C,2).
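A sketch of this computation (it assumes X and Y are loaded; eye(12) matches the 12 columns of the provided X):
% sketch: least squares and ridge regression errors for each s
C = inv(X'*X) * X'*Y;
fprintf('least squares error: %f\n', norm(Y - X*C, 2));
for s = [0.1, 0.3, 0.5, 1.0, 2.0]
    Cs = inv(X'*X + s^2*eye(12)) * X'*Y;
    fprintf('ridge (s = %.1f) error: %f\n', s, norm(Y - X*Cs, 2));
end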
B (20 points): Create three row-subsets of X and Y
• X1 = X(1:66,:) and Y1 = Y(1:66)
• X2 = X(34:100,:) and Y2 = Y(34:100)
• X3 = [X(1:33,:); X(67:100,:)] and Y3 = [Y(1:33); Y(67:100)]
Repeat the above procedure on these subsets and cross-validate the solution on the remainder of X and
Y. Specifically, learn the coefficients C using, say, X1 and Y1, and then measure norm(Y(67:100) - X(67:100,:)*C,2).
Which approach works best when averaging the results from the three subsets: Least Squares, or Ridge Regression (and, if the latter, for which value of s)?
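A sketch for the first subset (the other two follow the same pattern; it again assumes X has 12 columns):
% sketch: train on (X1, Y1), cross-validate on the held-out rows 67:100
X1 = X(1:66,:);    Y1 = Y(1:66);
Xho = X(67:100,:); Yho = Y(67:100);             % held-out rows
C = inv(X1'*X1) * X1'*Y1;
fprintf('least squares CV error: %f\n', norm(Yho - Xho*C, 2));
for s = [0.1, 0.3, 0.5, 1.0, 2.0]
    Cs = inv(X1'*X1 + s^2*eye(12)) * X1'*Y1;
    fprintf('ridge (s = %.1f) CV error: %f\n', s, norm(Yho - Xho*Cs, 2));
end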
4 BONUS (3 points)
Consider a linear equation W = M*S where M is a measurement matrix filled with random values {−1, 0, +1}
(although now that they are there, they are no longer random), and W is the output of the sparse signal S
when measured by M.
Use Orthogonal Matching Pursuit (as described in the notes) to recover the non-zero entries from S.
Record the order in which you find each entry and the residual vector after each step.
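A minimal sketch of standard Orthogonal Matching Pursuit is below; check it against the version in the notes, and note that numIter is a hypothetical stopping parameter (you could instead stop once the residual norm is near zero):
% sketch: Orthogonal Matching Pursuit for W = M*S
r = W;                                  % residual starts as the measurement
support = [];                           % recovered indices, in the order found
for t = 1:numIter
    [~, j] = max(abs(M' * r));          % column most correlated with the residual
                                        % (normalize by column norms if the notes do)
    support = [support, j];
    coef = M(:, support) \ W;           % least squares fit on the chosen columns
    r = W - M(:, support) * coef;       % updated residual
    fprintf('step %d: index %d, residual norm %f\n', t, j, norm(r));
end
Shat = zeros(size(M, 2), 1);
Shat(support) = coef;                   % recovered sparse signal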
CS 6140 Data Mining; Spring 2015 Instructor: Jeff M. Phillips, University of Utah