ROB313: Introduction to Learning from Data Assignment 2


Q1) 2pts Derive a closed-form expression for the weights of the generalized linear model, $\hat{f}(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$, using a least-squares loss and general Tikhonov regularization. The optimization problem to be solved for the weights can be written as
$$
\operatorname*{arg\,min}_{\mathbf{w} \in \mathbb{R}^M} \left[ \sum_{i=1}^{N} \left( y^{(i)} - w_0 - \sum_{j=1}^{M-1} w_j \phi_j\big(\mathbf{x}^{(i)}\big) \right)^{2} + \sum_{i=1}^{M} \sum_{j=1}^{M} \Gamma_{ij}\, w_{i-1} w_{j-1} \right],
$$
where $\Gamma \in \mathbb{R}^{M \times M}$ is a symmetric positive semi-definite matrix whose $ij$th entry is given by $\Gamma_{ij}$.
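For concreteness, a minimal numpy sketch of the closed-form solve this derivation leads to, assuming the basis matrix Phi stacks the basis functions column-wise with a leading column of ones for $w_0$ (the function name and setup are illustrative, not supplied code):

```python
import numpy as np

def fit_glm_tikhonov(Phi, y, Gamma):
    """Closed-form Tikhonov-regularized least-squares weights.

    Phi   : (N, M) basis matrix; first column is all ones, so w[0] = w0
    y     : (N,) training targets
    Gamma : (M, M) symmetric positive semi-definite regularization matrix

    Setting the gradient of the objective to zero gives the normal
    equations (Phi^T Phi + Gamma) w = Phi^T y, solved directly here.
    """
    return np.linalg.solve(Phi.T @ Phi + Gamma, Phi.T @ y)
```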

Q2) 2pts Considering the GLM $\hat{f}(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i\, k\big(\mathbf{x}, \mathbf{x}^{(i)}\big)$, derive a computational strategy to estimate $\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_N\}^T \in \mathbb{R}^N$ by minimizing the objective function
$$
\sum_{i=1}^{N} \left( y^{(i)} - \hat{f}\big(\mathbf{x}^{(i)}, \boldsymbol{\alpha}\big) \right)^2 + \lambda \sum_{i=1}^{N} \alpha_i^2.
$$
Compare this expression for the weights to those we derived in class using the dual representation. Are they different or the same? Explain why.
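As a numerical sanity check on your strategy, one can solve the stated objective exactly as written; this is a sketch of the algebraic solve only, and the comparison with the dual-representation weights is left to your answer:

```python
import numpy as np

def fit_alpha(K, y, lam):
    """Minimize ||y - K alpha||^2 + lam * ||alpha||^2 in closed form.

    K   : (N, N) kernel matrix, K[i, j] = k(x^(i), x^(j))
    y   : (N,) training targets
    lam : regularization parameter lambda

    The stationarity condition gives (K^T K + lam * I) alpha = K^T y.
    """
    N = K.shape[0]
    return np.linalg.solve(K.T @ K + lam * np.eye(N), K.T @ y)
```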

Q4) 4pts Construct a radial basis function (RBF) model that minimizes the least-squares loss function. Use a Gaussian kernel, consider the grid of shape parameter values $\theta \in \{0.05, 0.1, 0.5, 1, 2\}$ and the grid of regularization parameters $\lambda \in \{0.001, 0.01, 0.1, 1\}$, and construct the model using a Cholesky factorization. Select the hyperparameters across the grid of possible values by evaluating on the validation set. Construct the model on the datasets iris, rosenbrock (with n_train=1000, d=2), and mauna_loa.

Use both the training and validation sets to predict on the test set, and format your results in a table (present test RMSE for regression datasets and test accuracy for classification datasets).
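A minimal sketch of the Cholesky-based construction and validation-set grid search, assuming the regularized system $(K + \lambda I)\boldsymbol{\alpha} = \mathbf{y}$ and the kernel convention $k(\mathbf{x}, \mathbf{z}) = \exp(-\lVert\mathbf{x} - \mathbf{z}\rVert^2/\theta)$; adjust both to match your own derivation and the course notes. It is shown for a regression dataset, where validation RMSE drives the selection:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_kernel(X, Z, theta):
    """K[i, j] = exp(-||X[i] - Z[j]||^2 / theta)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / theta)

def fit_rbf(X_tr, y_tr, theta, lam):
    """Solve (K + lam * I) alpha = y via a Cholesky factorization."""
    K = gaussian_kernel(X_tr, X_tr, theta)
    c_and_lower = cho_factor(K + lam * np.eye(len(X_tr)))
    return cho_solve(c_and_lower, y_tr)

def predict_rbf(X_te, X_tr, alpha, theta):
    return gaussian_kernel(X_te, X_tr, theta) @ alpha

def select_hyperparameters(X_tr, y_tr, X_val, y_val,
                           thetas=(0.05, 0.1, 0.5, 1, 2),
                           lams=(0.001, 0.01, 0.1, 1)):
    """Pick (theta, lam) minimizing validation RMSE."""
    best, best_rmse = None, np.inf
    for theta in thetas:
        for lam in lams:
            alpha = fit_rbf(X_tr, y_tr, theta, lam)
            pred = predict_rbf(X_val, X_tr, alpha, theta)
            rmse = np.sqrt(np.mean((pred - y_val) ** 2))
            if rmse < best_rmse:
                best, best_rmse = (theta, lam), rmse
    return best
```

For iris, replace the validation RMSE with validation accuracy when selecting across the grid.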

Q5) 5pts Implement a greedy regression algorithm using a dictionary of basis functions. Design the dictionary of basis functions by observing the structure of the one-dimensional mauna_loa training dataset. The dictionary should contain at least 200 basis functions. Justify your design choices.¹

Use the orthogonal matching pursuit metric to select a new basis function at each iteration. Use the minimum description length (MDL), defined below, as a stopping criterion for your greedy algorithm:
$$
\frac{N}{2} \log(\ell_2\text{-loss}) + \frac{k}{2} \log N,
$$
where the $\ell_2$-loss is simply the least-squares training error and $k$ is the iteration number (or the number of terms in the greedy model).

The MDL metric can be considered a surrogate for the generalization error; in other words, this metric will decrease as the model complexity ($k$) grows and then increase as overfitting starts to occur.

Apply your algorithm to the mauna_loa dataset. Use both the training and validation sets to predict on the test set, plot the prediction relative to the test data, and present the test RMSE. Comment on the performance of your model. Also, report and comment on the sparsity of your model.

¹ Note that you shouldn't need to consider each basis function individually. It is likely that the basis functions you design will have free parameters, in which case you could include multiple basis functions with different values of these free parameters in your dictionary.
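A minimal sketch of the greedy loop, assuming the dictionary has already been evaluated into a matrix Phi (one column per candidate basis function) and taking the $\ell_2$-loss as the sum of squared training errors, per the definition above. The normalized residual correlation used here as the OMP selection metric is an assumption; adapt it to the course's exact definition:

```python
import numpy as np

def omp_with_mdl(Phi, y, max_terms=200):
    """Orthogonal matching pursuit with an MDL stopping criterion.

    Phi : (N, D) dictionary matrix, one column per basis function
    y   : (N,) training targets
    Returns the selected column indices and their least-squares weights.
    """
    N = Phi.shape[0]
    selected, residual = [], y.copy()
    best_mdl, best_sel, best_w = np.inf, [], None
    for k in range(1, max_terms + 1):
        # OMP metric: column most correlated with the current residual
        scores = np.abs(Phi.T @ residual) / np.linalg.norm(Phi, axis=0)
        scores[selected] = -np.inf            # never re-select a column
        selected.append(int(np.argmax(scores)))
        # re-fit all selected weights jointly, then update the residual
        w, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ w
        mdl = N / 2 * np.log(residual @ residual) + k / 2 * np.log(N)
        if mdl < best_mdl:
            best_mdl, best_sel, best_w = mdl, list(selected), w
        else:
            break                             # MDL rose: overfitting begins
    return best_sel, best_w
```

Stopping at the first MDL increase is the simplest reading of the criterion; the number of selected terms, len(best_sel), is the sparsity figure to report.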

Submission guidelines: Submit an electronic copy of your report in PDF format, along with documented Python scripts. You should include a file named "README" outlining how the scripts should be run. Upload a single tar or zip file containing all files to Quercus. You are expected to verify the integrity of your tar/zip file before uploading. Do not include (or modify) the supplied *.npz data files or the data_utils.py module in your submission.

The report must contain:
• Objectives of the assignment
• A brief description of the structure of your code, and the strategies employed
• Relevant figures, tables, and discussion

Do not use scikit-learn for this assignment; the intention is that you implement the simple algorithms required from scratch. Also, for reproducibility, always set a seed for any random number generator used in your code. For example, you can set the seed in numpy using numpy.random.seed:
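```python
import numpy as np
np.random.seed(313)  # any fixed integer works; choose one and keep it
```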