Description
In this programming assignment you will implement and test some of the regression methods that were introduced in the lecture and in the problem sets. Let $f(x, \theta)$ be a function with input $x \in \mathbb{R}^d$ and parameters $\theta \in \mathbb{R}^D$, where

$$f(x, \theta) = \phi(x)^T \theta, \tag{1}$$

and $\phi(x) : \mathbb{R}^d \to \mathbb{R}^D$ is a feature transformation of $x$. For example, the $K$-th order polynomial with input $x \in \mathbb{R}$ can be expressed as
$$f(x, \theta) = \sum_{k=0}^{K} x^k \theta_k = \phi(x)^T \theta, \tag{2}$$

where the feature transformation and parameters are

$$\phi(x) = \left[1, x, x^2, \cdots, x^K\right]^T \in \mathbb{R}^{K+1}, \qquad \theta = \left[\theta_0, \cdots, \theta_K\right]^T \in \mathbb{R}^{K+1}. \tag{3}$$
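As a concrete illustration of this feature map, here is a minimal NumPy sketch; the function and variable names (poly_features, theta) are my own, not part of the assignment:

    import numpy as np

    def poly_features(x, K):
        """Map a scalar x to phi(x) = [1, x, x^2, ..., x^K]^T."""
        return np.array([x**k for k in range(K + 1)])

    # Evaluate f(x, theta) = phi(x)^T theta for a quadratic (K = 2)
    theta = np.array([1.0, -2.0, 0.5])    # theta_0, theta_1, theta_2
    print(poly_features(3.0, 2) @ theta)  # 1 - 2*3 + 0.5*9 = -0.5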
Our goal is to obtain the best estimate of the function given iid samples $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where the $y_i$ are noisy observations of $f(x_i, \theta)$. We have seen several ways of performing this regression, using different assumed noise models and formulations. For convenience, define the following quantities:

$$y = \left[y_1, \cdots, y_n\right]^T, \qquad \Phi = \left[\phi(x_1), \cdots, \phi(x_n)\right], \qquad X = \left[x_1, \cdots, x_n\right]. \tag{4}$$
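Note that $\Phi$ as defined here is $D \times n$, with $\phi(x_i)$ as its $i$-th column, which is why the residual in the table below is written $y - \Phi^T \theta$. Continuing the sketch above, these quantities might be assembled as follows (the data values are placeholders):

    # Hypothetical 1-D training data
    X = np.array([0.0, 0.5, 1.0, 1.5])
    y = np.array([1.1, 0.4, 0.2, 0.9])
    K = 2

    # Phi is (K+1) x n: column i is phi(x_i)
    Phi = np.vstack([X**k for k in range(K + 1)])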
Here is a summary of the various regression algorithms we have seen so far:
| method | objective function | parameter estimate $\hat{\theta}$ | prediction $f_*$ for input $x_*$ |
|---|---|---|---|
| least-squares (LS) | $\|y - \Phi^T \theta\|^2$ | $\hat{\theta}_{LS} = (\Phi\Phi^T)^{-1} \Phi y$ | $f_* = \phi(x_*)^T \hat{\theta}$ |
| regularized LS (RLS) | $\|y - \Phi^T \theta\|^2 + \lambda \|\theta\|^2$ | $\hat{\theta}_{RLS} = (\Phi\Phi^T + \lambda I)^{-1} \Phi y$ | $f_* = \phi(x_*)^T \hat{\theta}$ |
| L1-regularized LS (LASSO) | $\|y - \Phi^T \theta\|^2 + \lambda \|\theta\|_1$ | QP solver (see Prob. 3.12) | $f_* = \phi(x_*)^T \hat{\theta}$ |
| robust regression (RR) | $\|y - \Phi^T \theta\|_1$ | LP solver (see Prob. 2.10) | $f_* = \phi(x_*)^T \hat{\theta}$ |
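The LS and RLS estimates are closed-form and need only a linear solve, while RR can be recast as an LP with auxiliary variables $t_i \geq |y_i - \phi(x_i)^T \theta|$ (the approach of Prob. 2.10). A sketch of both, continuing the example above; the variable names and the $\lambda$ value are placeholders:

    from scipy.optimize import linprog

    lam = 0.1            # regularization weight lambda (placeholder)
    D = Phi.shape[0]
    n = Phi.shape[1]

    # LS and RLS: solve the normal equations rather than forming an inverse
    theta_ls  = np.linalg.solve(Phi @ Phi.T, Phi @ y)
    theta_rls = np.linalg.solve(Phi @ Phi.T + lam * np.eye(D), Phi @ y)

    # RR as an LP over z = [theta; t]: minimize sum(t) subject to
    # Phi^T theta - t <= y and -Phi^T theta - t <= -y
    c = np.concatenate([np.zeros(D), np.ones(n)])
    A_ub = np.block([[ Phi.T, -np.eye(n)],
                     [-Phi.T, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * D + [(0, None)] * n  # theta free, t >= 0
    theta_rr = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x[:D]

    # Prediction is the same for every method: f_* = phi(x_*)^T theta_hat
    f_star = poly_features(0.75, K) @ theta_ls

LASSO likewise has no closed form; Prob. 3.12 rewrites it as a QP (for example by splitting $\theta$ into nonnegative positive and negative parts), which any off-the-shelf QP solver can handle.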
Bayesian regression (BR) works with distributions rather than point estimates. The assumed distributions are the prior $\theta \sim \mathcal{N}(0, \alpha I)$ and the likelihood $y | x, \theta \sim \mathcal{N}(f(x, \theta), \sigma^2)$. The posterior is

$$\theta | X, y \sim \mathcal{N}(\hat{\mu}_\theta, \hat{\Sigma}_\theta), \qquad \hat{\mu}_\theta = \frac{1}{\sigma^2} \hat{\Sigma}_\theta \Phi y, \qquad \hat{\Sigma}_\theta = \left(\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi \Phi^T\right)^{-1},$$

and the posterior predictive for an input $x_*$ is

$$f_* | X, y, x_* \sim \mathcal{N}(\hat{\mu}_*, \hat{\sigma}^2_*), \qquad \hat{\mu}_* = \phi(x_*)^T \hat{\mu}_\theta, \qquad \hat{\sigma}^2_* = \phi(x_*)^T \hat{\Sigma}_\theta \phi(x_*).$$
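These formulas translate directly to NumPy; a sketch continuing the example above, where alpha and sigma2 are placeholder hyperparameter values:

    alpha, sigma2 = 1.0, 0.1  # prior scale and noise variance (placeholders)

    # Posterior over theta
    Sigma_theta = np.linalg.inv(np.eye(D) / alpha + (Phi @ Phi.T) / sigma2)
    mu_theta = Sigma_theta @ (Phi @ y) / sigma2

    # Posterior predictive at a test input x_*
    phi_star = poly_features(0.75, K)
    mu_star  = phi_star @ mu_theta                 # predictive mean
    var_star = phi_star @ Sigma_theta @ phi_star   # predictive variance of f_*

Unlike the point-estimate methods, BR returns an error bar (var_star) along with the prediction, which can be useful when plotting the fits.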