# Problem Set 5 CS 4301


## Problem 1: Linear Regression (100 pts)

Suppose that we are given data points $x^{(1)}, \dots, x^{(M)} \in \mathbb{R}^d$ with corresponding observed outcomes $y^{(1)}, \dots, y^{(M)} \in \mathbb{R}$. In least squares regression, the aim is to find a function $f : \mathbb{R}^d \to \mathbb{R}$ of the form $f(x) = w^T x + b$, for some $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$, such that the difference between $f(x^{(m)})$ and $y^{(m)}$ is as small as possible on average.

Formally, we solve the following convex optimization problem.

$$
\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \frac{1}{M} \sum_{m=1}^{M} \left( y^{(m)} - w^T x^{(m)} - b \right)^2
$$

Consider a penalized regression problem where we choose some $\lambda > 0$ and a regularizer $h$ and solve the following optimization problem.

$$
\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \left[ \frac{1}{M} \sum_{m=1}^{M} \left( y^{(m)} - w^T x^{(m)} - b \right)^2 + \lambda h(w) \right]
$$

In this problem, we will consider the case $h(w) = \sum_{i=1}^{d} |w_i|$.
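Two standard facts, not stated in the assignment but useful for the parts below: the proximal operator of a function $h$ with parameter $t > 0$ is defined as a regularized projection, and for the $\ell_1$ regularizer above it reduces to componentwise soft-thresholding.

$$
\operatorname{prox}_{t h}(v) = \arg\min_{u} \; \frac{1}{2}\|u - v\|_2^2 + t\, h(u),
\qquad
\bigl(\operatorname{prox}_{t\lambda \|\cdot\|_1}(v)\bigr)_i = \operatorname{sign}(v_i)\,\max\bigl(|v_i| - t\lambda,\, 0\bigr).
$$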
1. In Python, implement subgradient descent for the above optimization problem. That is, write a function that takes as input a matrix $X \in \mathbb{R}^{d \times M}$ whose columns are the data points, a vector $y$ of observed values, a $\lambda > 0$, an initial guess for $w$ and $b$, and a number of iterations `its`, and returns the result of performing subgradient descent for `its` iterations starting from the specified initial point.
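A minimal sketch of what such a function could look like. The fixed step size `step` and the exact NumPy conventions (data points as columns of `X`) are assumptions, not part of the assignment; the objective is smooth except for the $\lambda\|w\|_1$ term, for which $\lambda\,\operatorname{sign}(w)$ is a valid subgradient.

```python
import numpy as np

def subgradient_descent(X, y, lam, w0, b0, its, step=1e-3):
    """Subgradient descent on (1/M) * sum_m (y_m - w^T x_m - b)^2 + lam * ||w||_1.

    X is a d x M array whose columns are the data points x^(m).
    Returns the final iterate (w, b) after `its` iterations.
    """
    w, b = w0.astype(float), float(b0)
    M = X.shape[1]
    for _ in range(its):
        r = X.T @ w + b - y               # residuals f(x^(m)) - y^(m)
        grad_w = (2.0 / M) * (X @ r)      # gradient of the smooth loss w.r.t. w
        grad_b = (2.0 / M) * r.sum()      # gradient of the smooth loss w.r.t. b
        g = grad_w + lam * np.sign(w)     # np.sign(0) = 0 is a valid subgradient choice
        w = w - step * g
        b = b - step * grad_b
    return w, b
```

With a constant step size the subgradient method only reaches a neighborhood of the optimum, so in practice a diminishing step (e.g. `step / sqrt(k)`) is a common refinement.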

2. In Python, implement proximal gradient descent (with $h$ chosen as above) for the above optimization problem. That is, write a function that takes as input a matrix $X \in \mathbb{R}^{d \times M}$ whose columns are the data points, a vector $y$ of observed values, a $\lambda > 0$, an initial guess for $w$ and $b$, and a number of iterations `its`, and returns the result of performing proximal gradient descent for `its` iterations starting from the specified initial point.
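A minimal sketch under the same assumptions as above (fixed step size, data points as columns of `X`). Each iteration takes a gradient step on the smooth least-squares part only, then applies the proximal operator of $\lambda\|\cdot\|_1$, which is componentwise soft-thresholding; the intercept $b$ is unpenalized, so it gets a plain gradient step.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_descent(X, y, lam, w0, b0, its, step=1e-3):
    """Proximal gradient descent (ISTA-style) for least squares + lam * ||w||_1.

    X is a d x M array whose columns are the data points x^(m).
    """
    w, b = w0.astype(float), float(b0)
    M = X.shape[1]
    for _ in range(its):
        r = X.T @ w + b - y                          # residuals f(x^(m)) - y^(m)
        w = soft_threshold(w - step * (2.0 / M) * (X @ r), step * lam)
        b = b - step * (2.0 / M) * r.sum()           # b is not regularized
    return w, b
```

Unlike the subgradient method, this iteration converges to the exact minimizer for a fixed step size `step < 1/L`, where `L` is the Lipschitz constant of the smooth part's gradient, and it produces exactly sparse iterates.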

3. Use the data set attached to this homework and different choices of $\lambda$ to explain which method you think performs best. Note that in the attached data, each row is of the form $(x^{(m)T}, y^{(m)})$.

4. What is the proximal update if $h(w) = \frac{1}{2} w^T A w + b^T w$ for some positive semidefinite matrix $A$?