5/16/2021 4 Kernels
https://www.notion.so/justindomke/4-Kernels-ef34b3c97a9c43b3895174b0f11a8fd8
Description
Kernel Ridge Regression
Question 1 (5 points) Suppose that we wish to find the vector $w$ that minimizes

$$\sum_{n=1}^{N} \left( w \cdot x^{(n)} - y^{(n)} \right)^2 + \lambda \|w\|^2 .$$

Show that the optimal $w$ is

$$w^* = \left( \sum_{n=1}^{N} x^{(n)} \left(x^{(n)}\right)^\top + \lambda I \right)^{-1} \sum_{n=1}^{N} x^{(n)} y^{(n)} .$$
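The following is a numerical sanity check on the closed form, not a derivation. It stacks the inputs $x^{(n)}$ as the rows of a matrix X, so that $\sum_n x^{(n)} (x^{(n)})^\top = X^\top X$ and $\sum_n x^{(n)} y^{(n)} = X^\top y$. It then solves for $w^*$ with np.linalg.solve on random illustrative data and confirms that the gradient $2\sum_n (w \cdot x^{(n)} - y^{(n)})\, x^{(n)} + 2\lambda w$ vanishes at that point. The helper names are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (sum_n x_n x_n^T + lam*I) w = sum_n x_n y_n, with x_n stored as rows of X."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_gradient(w, X, y, lam):
    """Gradient of sum_n (w.x_n - y_n)^2 + lam*||w||^2 with respect to w."""
    residual = X @ w - y
    return 2 * X.T @ residual + 2 * lam * w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # illustrative random inputs, one per row
y = rng.normal(size=50)        # illustrative random outputs
lam = 0.5

w_star = ridge_closed_form(X, y, lam)
print(np.max(np.abs(ridge_gradient(w_star, X, y, lam))))
```

The printed value should be at the level of floating-point round-off.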
Question 2 (5 points) Suppose now that we apply a basis expansion $h(\cdot)$ to $x$, so that the goal is instead to minimize

$$\sum_{n=1}^{N} \left( w \cdot h(x^{(n)}) - y^{(n)} \right)^2 + \lambda \|w\|^2 .$$

What now is the optimal $w^*$? Argue why your answer is correct.
Question 3 (5 points) Take some new input $x_{pred}$. Naively, we would predict $y_{pred} = w^* \cdot h(x_{pred})$. Suppose, however, that we have access to some kernel function $k$ such that $k(x, x') = h(x) \cdot h(x')$. Derive an expression for $y_{pred}$ that makes no reference to $h(\cdot)$ or $w^*$. (Your expression can use "temporary" variables if you like. These must not reference $h(\cdot)$ or $w^*$ either, of course!)
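A standard identity behind expressions of this kind is $(H^\top H + \lambda I)^{-1} H^\top = H^\top (H H^\top + \lambda I)^{-1}$. Here $H$ is the $N \times D$ matrix whose $n$-th row is $h(x^{(n)})$; that notation is assumed for this sketch and is not defined in the question. The identity follows from $(H^\top H + \lambda I) H^\top = H^\top (H H^\top + \lambda I)$ by multiplying both sides by the two inverses. The sketch below only checks the identity numerically on random stand-in matrices. Note that $H H^\top$ contains exactly the inner products $h(x^{(n)}) \cdot h(x^{(m)})$ that a kernel supplies.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, lam = 20, 6, 0.1
H = rng.normal(size=(N, D))      # stand-in for the rows h(x^(n))
y = rng.normal(size=N)           # stand-in outputs
h_pred = rng.normal(size=D)      # stand-in for h(x_pred)

# Primal route: w* = (H^T H + lam I)^{-1} H^T y, then y_pred = w* . h(x_pred)
w_star = np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ y)
y_primal = w_star @ h_pred

# Inner-product route: only h(x^(n)) . h(x^(m)) and h(x^(n)) . h(x_pred) are used
K = H @ H.T                      # K[n, m] = h(x^(n)) . h(x^(m))
k_pred = H @ h_pred              # k_pred[n] = h(x^(n)) . h(x_pred)
alpha = np.linalg.solve(K + lam * np.eye(N), y)
y_dual = alpha @ k_pred

print(y_primal, y_dual)
```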
Question 4 (5 points) Consider a 1-D input $x$. Consider the following polynomial basis expansion:

$$h(x) = \left[c_0, c_1 x, c_2 x^2, \cdots, c_P x^P\right], \qquad c_p = \sqrt{\binom{P}{p}} .$$

Derive a kernel function $k(x, x')$ such that $k(x, x') = h(x) \cdot h(x')$ for all $x$ and $x'$. (Hint: your kernel function should be significantly simpler than the basis expansion you started with.)
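This assumes $c_p = \sqrt{\binom{P}{p}}$ as above, which matches the np.sqrt(comb(P,p)) factor in get_poly_expansion. One way to see where a simpler kernel could come from is to multiply the two expansions term by term and then apply the binomial theorem:

```latex
h(x) \cdot h(x') = \sum_{p=0}^{P} c_p^2 \, x^p (x')^p
                 = \sum_{p=0}^{P} \binom{P}{p} (x x')^p \, 1^{P-p}
                 = (1 + x x')^P .
```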
You are given a file data_synth.npz. This contains 200 1D inputs and 1D outputs, plotted below.
As in previous assignments, the data can be loaded as follows:

    import numpy as np

    stuff = np.load("data_synth.npz")
    X_trn = stuff['X_trn']
    Y_trn = stuff['Y_trn']
    X_val = stuff['X_val']
    Y_val = stuff['Y_val']
Here is a function that performs the basis expansion from the question above. (You'd use this by doing h = get_poly_expansion(5) and then h(X).)

    import numpy as np
    from scipy.special import comb

    def get_poly_expansion(P):
        def expand(X):
            # column p holds sqrt(C(P, p)) * X**p, for p = 0..P
            tmp = [np.sqrt(comb(P, p)) * X**p for p in range(P + 1)]
            return np.vstack(tmp).T
        return expand

    # example usage
    h = get_poly_expansion(5)
    expansion = h(X_trn[0])
Question 5 (5 points) Provide a function that will evaluate ridge regression when using basis-expanded data, i.e., evaluate $f_w(x) = w \cdot h(x)$. Your function should have the following signature:

    def eval_basis_expanded_ridge(x, w, h):
        # do stuff
        return y
Here x is a single input, w is a vector of weights, and h is a basis-expansion function (e.g., what you'd get from calling get_poly_expansion(3)). The return value y is just a scalar. Provide your function directly in your report.
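Here is a minimal sketch. It assumes h behaves like get_poly_expansion, which returns a (1, P+1) array for a scalar input. np.ravel flattens that to a vector before the dot product, and float(...) converts the result to a plain scalar. poly_expansion_standin is a hypothetical copy of get_poly_expansion that uses math.comb so the snippet runs without scipy.

```python
import math
import numpy as np

def eval_basis_expanded_ridge(x, w, h):
    """Evaluate f_w(x) = w . h(x) for a single scalar input x."""
    features = np.ravel(h(x))   # h(x) may be (1, D); flatten to (D,)
    y = float(features @ w)     # dot product -> plain Python float
    return y

# hypothetical stand-in mirroring get_poly_expansion (math.comb instead of scipy)
def poly_expansion_standin(P):
    def expand(X):
        return np.vstack([np.sqrt(math.comb(P, p)) * X**p for p in range(P + 1)]).T
    return expand

h = poly_expansion_standin(2)   # features [1, sqrt(2) x, x^2]
print(eval_basis_expanded_ridge(3.0, np.array([1.0, 0.0, 2.0]), h))
```

With x = 3, the features are [1, 3√2, 9], so the weights [1, 0, 2] give 1 + 18 = 19.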
Question 6 (5 points) Provide a function that will do ridge regularization on basis-expanded data, i.e., minimize

$$\sum_{n=1}^{N} \left( f_w(x^{(n)}) - y^{(n)} \right)^2 + \lambda \|w\|^2$$

over $w$, where $f_w$ is the function you implemented in the previous question. Your function should have the following signature:

    def train_basis_expanded_ridge(X, Y, λ, h):
        # do stuff
        return w
Here X is a 1D array of inputs, Y is a 1D array of training outputs, λ is a
regularization constant, and h is a basis-expansion function. Provide your function
directly in your report.
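Here is a minimal sketch. It assumes h maps a 1D array of N inputs to an (N, D) matrix H, as get_poly_expansion does. With the expanded inputs stacked as rows, the minimizer solves $(H^\top H + \lambda I)\, w = H^\top Y$, which is the Question 1 result applied to $h(x^{(n)})$. np.linalg.solve avoids forming an explicit inverse. The expansion stand-in is hypothetical.

```python
import math
import numpy as np

def train_basis_expanded_ridge(X, Y, λ, h):
    """Return the w minimizing sum_n (w . h(x_n) - y_n)^2 + λ ||w||^2."""
    H = h(np.asarray(X, dtype=float))         # (N, D): row n is h(x_n)
    D = H.shape[1]
    A = H.T @ H + λ * np.eye(D)               # sum_n h(x_n) h(x_n)^T + λ I
    b = H.T @ np.asarray(Y, dtype=float)      # sum_n h(x_n) y_n
    return np.linalg.solve(A, b)              # solve A w = b, no explicit inverse

# hypothetical stand-in mirroring get_poly_expansion (math.comb instead of scipy)
def poly_expansion_standin(P):
    def expand(X):
        return np.vstack([np.sqrt(math.comb(P, p)) * X**p for p in range(P + 1)]).T
    return expand

X = np.linspace(0, 1, 30)
Y = 1 + 2 * X - X**2                          # illustrative targets
h = poly_expansion_standin(2)
w = train_basis_expanded_ridge(X, Y, 0.1, h)
print(w)
```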
Question 7 (5 points) For each value of $P \in \{1, 2, 3, 5, 10\}$, do basis-expanded ridge regression using a $P$-th order basis expansion on the given dataset with $\lambda = 0.1$. For each value of $P$ please:
• Report the vector w that you recovered.
• Make a plot of the final learned function $f_w(x)$ as a function of $x$. Plot this between $x = 0$ and $x = 15$, superimposed on the training data.
Question 8 (5 points) Implement a method to get a polynomial kernel. Your function should have the following signature:

    def get_poly_kernel(P):
        def k(x, xp):
            # do stuff
            return kernel_value
        return k
This kernel function should be equivalent to taking the inner product of two basis expansions, i.e., $k(x, x') = h(x) \cdot h(x')$. Your kernel function must never form or create a basis expansion, and must have a time complexity that doesn't depend on $P$ (assume that you can take the power of a scalar in constant time). Provide your function directly in your report.
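Here is one candidate sketch. It assumes the coefficients $c_p = \sqrt{\binom{P}{p}}$ from Question 4, in which case the binomial theorem collapses the inner product to $(1 + x x')^P$. That costs one multiply, one add, and one power regardless of $P$. The helper explicit_inner is hypothetical and is used only to check the kernel. The kernel itself never builds an expansion.

```python
import math

def get_poly_kernel(P):
    """Kernel equal to h(x) . h(x') for h(x) = [sqrt(C(P,p)) x^p], p = 0..P.

    Uses sum_p C(P,p) (x x')^p = (1 + x x')^P, so the cost does not grow with P.
    """
    def k(x, xp):
        kernel_value = (1.0 + x * xp) ** P
        return kernel_value
    return k

def explicit_inner(x, xp, P):
    """Reference value sum_p c_p^2 (x x')^p; for checking only, never called by k."""
    return sum(math.comb(P, p) * (x * xp) ** p for p in range(P + 1))

k5 = get_poly_kernel(5)
print(k5(1.5, -0.4), explicit_inner(1.5, -0.4, 5))
```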
Question 9 (5 points) Run the following code:

    x = 0.5
    xp = 0.7
    k = get_poly_kernel(5)
    h = get_poly_expansion(5)
    out1 = k(x, xp)
    out2 = np.inner(h(x), h(xp))
    print("output 1", out1)
    print("output 2", out2)

What is the output?
Question 10 (5 points) Implement a function to train a kernel ridge regression model. Given a dataset of inputs $x^{(n)}$ and outputs $y^{(n)}$, you should compute

$$\alpha = (K + \lambda I)^{-1} y ,$$

where $K_{nm} = k(x^{(n)}, x^{(m)})$ is the kernel function evaluated on the $n$-th and $m$-th inputs. Your function should have the following signature:

    def train_kernel_ridge(X, Y, λ, k):
        # do stuff
        return α
Here X is a 1D array of inputs, Y is a 1D array of training outputs, λ is a regularization constant, and k is a kernel function. Do not use numpy.linalg.inv in your solution. If you're tempted to do that, look into numpy.linalg.solve instead. Provide your solution directly in your report. You can (and should) call your kernel function from Question 8.
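Here is a minimal sketch. It fills $K$ with $O(N^2)$ kernel calls and uses np.linalg.solve rather than an inverse. It assumes k is symmetric, as any valid kernel is, so each off-diagonal pair is evaluated once and mirrored. The kernel k2 is a hypothetical example, not part of the assignment.

```python
import numpy as np

def train_kernel_ridge(X, Y, λ, k):
    """Return α solving (K + λ I) α = Y, where K[n, m] = k(x_n, x_m)."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    K = np.empty((N, N))
    for n in range(N):
        for m in range(n, N):
            K[n, m] = K[m, n] = k(X[n], X[m])   # symmetric kernel: fill both halves
    α = np.linalg.solve(K + λ * np.eye(N), np.asarray(Y, dtype=float))
    return α

# hypothetical kernel for the example: (1 + x x')^2
def k2(x, xp):
    return (1.0 + x * xp) ** 2

X = np.linspace(0, 1, 10)
Y = np.sin(3 * X)                 # illustrative targets
α = train_kernel_ridge(X, Y, 0.1, k2)
print(α)
```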
Question 11 (5 points) Implement a function to evaluate a kernel ridge regression model. Given a dataset of inputs $x^{(n)}$, a vector $\alpha$, a kernel function $k$, and a new input $x$, you should compute
$$\sum_{n=1}^{N} \alpha_n k(x^{(n)}, x) .$$

Your function should have the following signature:

    def eval_kernel_ridge(X_trn, x, α, k):
        # do stuff
        return y
Here X_trn is a 1D array of training inputs, x is the input to evaluate, α is a vector of
learned components, and k is a kernel function. Provide your solution directly in your
report.
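Here is a minimal sketch that sums $\alpha_n k(x^{(n)}, x)$ over the training inputs. It assumes α and X_trn have the same length, because zip silently stops at the shorter one. The kernel k2 is a hypothetical example chosen so the result is easy to check by hand.

```python
def eval_kernel_ridge(X_trn, x, α, k):
    """Return sum_n α_n k(x_n, x) for a single new input x."""
    y = sum(α_n * k(x_n, x) for α_n, x_n in zip(α, X_trn))
    return float(y)

# hypothetical kernel (1 + x x')^2: k(0, 2) = 1 and k(1, 2) = 9, so 1 * 1 + 2 * 9 = 19
def k2(x, xp):
    return (1.0 + x * xp) ** 2

print(eval_kernel_ridge([0.0, 1.0], 2.0, [1.0, 2.0], k2))
```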
Question 12 (5 points) For each value of $P \in \{1, 2, 3, 5, 10\}$, do kernel-expanded ridge regression using a $P$-th order polynomial kernel on the given dataset with $\lambda = 0.1$. For each value of $P$, make a plot of the final learned function $f_w(x)$ as a function of $x$. Plot this between $x = 0$ and $x = 15$, superimposed on the training data. (You can give either 5 separate plots, one for each value of $P$, or a single plot with the different values of $P$ superimposed and labeled.)
Question 13 (5 points) How do your results using kernel ridge regression compare to
those you obtained using basis-expanded ridge regression? Explain why in at most two
sentences.
Support Vector Machines
You are given a file data_real.npz with 686 training inputs of length 4 (x_trn) and corresponding outputs (y_trn). You are also given 686 test inputs (x_tst).
For this question, you will perform classification using support vector machines on this dataset. For this question you are allowed (and encouraged) to use sklearn's implementation. However, you may not use sklearn.model_selection.cross_val_score.
In the questions below, we define a support vector machine to be the solution of

$$\min_{\alpha} \sum_{n=1}^{N} L\left(y^{(n)}, f_\alpha(x^{(n)})\right) + \lambda \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m k(x^{(n)}, x^{(m)}),$$

where

$$L(y, f) = \max(0, 1 - y f)$$

is the hinge loss, and

$$f_\alpha(x) = \sum_{n=1}^{N} \alpha_n k(x, x^{(n)}) .$$
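The following sketch of reasoning connects this objective to sklearn. It assumes SVC's documented primal, $\tfrac{1}{2}\|w\|^2 + C \sum_n \max\left(0, 1 - y^{(n)}(w \cdot \phi(x^{(n)}) + b)\right)$, with feature map $\phi$ and intercept $b$. Writing $w = \sum_n \alpha_n \phi(x^{(n)})$ gives $w \cdot \phi(x) = f_\alpha(x)$ and $\|w\|^2 = \alpha^\top K \alpha$, with $K_{nm} = k(x^{(n)}, x^{(m)})$. Dividing the objective above by $2\lambda$, which leaves the minimizer unchanged, puts it in SVC's form:

```latex
\frac{1}{2\lambda}\left[\sum_{n=1}^{N} L\big(y^{(n)}, f_\alpha(x^{(n)})\big) + \lambda\, \alpha^\top K \alpha\right]
  = C \sum_{n=1}^{N} L\big(y^{(n)}, f_\alpha(x^{(n)})\big) + \tfrac{1}{2}\, \alpha^\top K \alpha,
  \qquad C = \frac{1}{2\lambda} .
```

Under this reading, λ corresponds to C = 1/(2λ). One caveat: SVC also fits an unpenalized intercept $b$, which the objective above does not include. In addition, the hinge loss assumes labels coded as −1/+1.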
Question 14 (10 points) Train a support vector machine with a linear kernel with each of the following regularization penalties: $\lambda \in \{2, 20, 200\}$. Estimate the mean validation-set hinge loss using 5-fold cross-validation. Give your results here as a table with one entry for each value of $\lambda$. (Note: Make sure you understand exactly what the sklearn arguments do!)
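Here is a minimal sketch of 5-fold cross-validation written by hand, since cross_val_score is off-limits. It assumes labels are coded as −1/+1, so that the hinge loss matches $L(y, f) = \max(0, 1 - yf)$. It also assumes the model exposes fit and decision_function, as sklearn's SVC does. The helper names cv_hinge_loss and mean_hinge_loss are hypothetical. The folds come from a seeded random permutation split with np.array_split.

```python
import numpy as np

def mean_hinge_loss(y, f):
    """Average of max(0, 1 - y f), with labels y in {-1, +1} and decision values f."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * f)))

def cv_hinge_loss(X, y, make_model, n_folds=5, seed=0):
    """Mean validation hinge loss over n_folds folds, without cross_val_score.

    make_model() must return a fresh, unfitted object with fit(X, y) and decision_function(X).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    losses = []
    for i in range(n_folds):
        val_idx = folds[i]
        trn_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = make_model()
        model.fit(X[trn_idx], y[trn_idx])
        losses.append(mean_hinge_loss(y[val_idx], model.decision_function(X[val_idx])))
    return float(np.mean(losses))
```

For one value of λ, this might be called as cv_hinge_loss(x_trn, y_trn, lambda: SVC(kernel='linear', C=C_value)), with SVC imported from sklearn.svm. C_value is a placeholder. Reading SVC's documented objective ½‖w‖² + C Σ hinge against the objective above suggests C_value = 1/(2λ), but treat that as an assumption to verify. If y_trn uses 0/1 labels, map them with 2*y - 1 first.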
Question 15 (10 points) We can define a polynomial kernel with a constant term as

$$k(x, x') = (\gamma + x \cdot x')^P ,$$

where $P$ is the degree and $\gamma$ is some constant. For each of $\gamma \in \{1, .01, .001\}$ and $\lambda \in \{2, 20, 200\}$, train a support vector machine using a polynomial kernel of degree $P = 3$ and a regularization penalty of $\lambda$. Estimate the validation-set hinge loss using 5-fold cross-validation. Give your results as a 3-column table with one entry for each $(\gamma, \lambda)$ pair. (Make sure it's clear which entry corresponds to which value.)
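sklearn documents SVC's polynomial kernel as (gamma·⟨x, x'⟩ + coef0)^degree, so its gamma plays a different role from the $\gamma$ above. The numpy-only sketch below computes the Gram matrix $(\gamma + x \cdot x')^P$ directly, for inputs stored one per row. The factory name make_poly_gram is hypothetical.

```python
import numpy as np

def make_poly_gram(gamma, P):
    """Return gram(A, B) with gram(A, B)[i, j] = (gamma + A[i] . B[j]) ** P."""
    def gram(A, B):
        A = np.asarray(A, dtype=float)   # shape (n, d), one input per row
        B = np.asarray(B, dtype=float)   # shape (m, d)
        return (gamma + A @ B.T) ** P
    return gram

A = np.array([[1.0, 0.0, 2.0, -1.0],
              [0.5, 0.5, 0.5, 0.5]])
gram = make_poly_gram(0.01, 3)
K = gram(A, A)                           # 2 x 2 Gram matrix
print(K)
```

If this reading of the documentation is right, SVC(kernel='poly', degree=3, gamma=1.0, coef0=γ) should use the same kernel. Leaving gamma at its default ('scale') would not. SVC also accepts a callable such as make_poly_gram(γ, 3) as its kernel argument.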
Question 16 (10 points) Repeat the previous question, but using a polynomial of degree
P = 5 instead.
Question 17 (10 points) We can define a "radial basis function" kernel as

$$k(x, x') = \exp\left(-\gamma \|x - x'\|^2\right) .$$

Again, for each of the $\gamma \in \{1, .01, .001\}$ and $\lambda \in \{2, 20, 200\}$ values, train and evaluate this model using 5-fold cross-validation. Report your estimated generalization error (validation-set hinge loss) in a 3×3 table.
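sklearn documents SVC's rbf kernel as exp(−gamma‖x − x'‖²), so here $\gamma$ maps directly onto gamma. For reference, the following is a numpy-only sketch of the Gram matrix. It uses $\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\, a \cdot b$ for all pairs at once. The function name rbf_gram is hypothetical.

```python
import numpy as np

def rbf_gram(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for inputs stored one per row."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for every (row of A, row of B) pair
    sq_dists = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    # clip tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))
```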
Question 18 (10 points) Which kernel (with which value of $\gamma$, if applicable) and which regularization constant performed best? Fix this kernel, retrain on all the data, and then make predictions for the test data. Upload these to the Assignment 4 Kaggle competition (again, please use your UMass email so that we can identify your submission!) and report here your error on the public leaderboard. Please give:
• What kernel you chose.
• Your estimated 0-1 generalization error (i.e., 1 − accuracy).
• Your observed generalization error on the leaderboard.