Description
1. Kernel regression. Kernel regression predicts a value d corresponding to a value x as
$\hat{d}(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$, where the measured data are $(d_i, x_i)$, $i = 1, 2, \ldots, N$, and $K(u, v)$ is the kernel function. We will assume Gaussian kernels, $K(u, v) = \exp(-(u - v)^2/(2\sigma^2))$.
Scripts are provided to help you explore properties of kernel regression with respect to
the kernel parameter σ and ridge regression parameter λ.
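The provided script is not reproduced here, but the fitting procedure it explores can be sketched as kernel ridge regression: the weights α solve the regularized linear system $(K + \lambda I)\alpha = d$. The data-generating function, sample size, and variable names below are illustrative assumptions, not the script's actual contents.

```python
import numpy as np

def gaussian_kernel(u, v, sigma):
    # K(u, v) = exp(-(u - v)^2 / (2 sigma^2))
    return np.exp(-(u - v) ** 2 / (2 * sigma ** 2))

def fit_kernel_ridge(x, d, sigma, lam):
    # Solve (K + lam * I) alpha = d for the weight vector alpha
    K = gaussian_kernel(x[:, None], x[None, :], sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), d)

def predict(x_new, x, alpha, sigma):
    # d_hat(x) = sum_i alpha_i K(x, x_i)
    return gaussian_kernel(x_new[:, None], x[None, :], sigma) @ alpha

# Illustrative noisy data (the actual script's data may differ)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 50))
d = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

alpha = fit_kernel_ridge(x, d, sigma=0.04, lam=0.01)
d_hat = predict(x, x, alpha, sigma=0.04)
```

Each α_i scales one Gaussian bump centered at x_i, so the prediction is a weighted superposition of kernels, matching the formula above.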
a) Run the regression script with σ = 0.04 and λ = 0.01. Figure 1 displays several of the kernels $K(x, x_i)$. What is the value $x_i$ associated with the kernel having the third peak from the left? What property of the kernel is determined by $x_i$? What property is determined by σ?
b) Run the regression script for the following choices of regularization and kernel
parameters:
i. λ = 0.01, σ = 0.04
ii. λ = 0.01, σ = 0.2
iii. λ = 0.01, σ = 1
iv. λ = 1, σ = 0.04
v. λ = 1, σ = 0.2
(Note that you need to rerun the entire script each time to ensure the random
number generator is reset and you obtain identical data.)
You may choose additional cases if it helps you understand the nature of the solution. Discuss how λ and σ affect the fit of the kernel regression to the measured data, and
support your conclusions with rationale and plots.
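The rerun-from-the-top requirement in the note above can be mirrored by reseeding the random number generator before each (λ, σ) case, so every case sees identical data. The data model and names below are illustrative, not the provided script's:

```python
import numpy as np

# The five (lambda, sigma) cases from part b)
cases = [(0.01, 0.04), (0.01, 0.2), (0.01, 1.0), (1.0, 0.04), (1.0, 0.2)]

fits = {}
for lam, sigma in cases:
    rng = np.random.default_rng(0)   # reset the generator: identical data each run
    x = np.sort(rng.uniform(0, 1, 50))
    d = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

    # Kernel ridge fit: (K + lam * I) alpha = d
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), d)
    fits[(lam, sigma)] = K @ alpha   # fitted values at the training points
```

Because the seed is fixed inside the loop, differences among the five fitted curves are attributable entirely to λ and σ, not to sampling noise.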
c) What principle could you apply to select appropriate values for λ and σ?
2. Kernel Classification. The kernel classification script performs classification with
the squared-error loss using the Gaussian kernel $K(u, v) = \exp(-\|u - v\|_2^2/(2\sigma^2))$.
The code is set up to use N = 500 training samples.
The code creates a contour plot of the predicted class before thresholding (i.e., before
applying the sign function).
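The classification script itself is not shown, but the approach it describes can be sketched as follows: fit kernel weights to ±1 labels under the squared-error loss, then evaluate the pre-threshold prediction on a grid (the quantity the contour plot displays). The toy labels, grid, and regularization value below are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(U, V, sigma):
    # K(u, v) = exp(-||u - v||_2^2 / (2 sigma^2)) for all row pairs of U, V
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
N = 500
X = rng.standard_normal((N, 2))
d = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0, 1.0, -1.0)  # toy +/-1 labels

sigma, lam = 0.05, 1e-3
K = gaussian_kernel(X, X, sigma)
alpha = np.linalg.solve(K + lam * np.eye(N), d)  # squared-error fit

# Pre-threshold prediction on a grid: the input to the contour plot
gx, gy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
G = np.column_stack([gx.ravel(), gy.ravel()])
pred = gaussian_kernel(G, X, sigma) @ alpha   # before the sign function
labels = np.sign(pred)                        # after thresholding
```

Plotting `pred` over the grid (e.g., with `matplotlib.pyplot.contourf`) reproduces the kind of pre-threshold contour plot the script generates.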
Run the code for the following values of the kernel parameter σ.
a) σ = 5
b) σ = 0.05
c) σ = 0.005
Use the results to discuss the impact of the kernel parameter σ. Is there a downside
to choosing a very small value for σ? Run additional values of σ if needed.
3. SVM. You use a kernel-based support vector machine for binary classification with
labels $d_i \in \{+1, -1\}$. Given training features and labels $(x_i, d_i)$, $i = 1, 2, \ldots, N$, you
use a kernel $K(u, v)$ and design the classifier weights α as
$$\hat{\alpha} = \arg\min_{\alpha} \sum_{i=1}^{N} \left( 1 - d_i \sum_{j=1}^{N} \alpha_j K(x_i, x_j) \right)_{+} + \lambda \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j K(x_i, x_j)$$
a) Assume the optimization problem has been solved to obtain the weights α. Express the classification procedure for a measured feature x.
b) Suppose N = 1000 and αi = 0, i = 1, 2, . . . , 99, 102, 103, . . . , 1000. Identify the
support vectors and write the classification procedure in terms of the support
vectors.
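As a generic illustration of kernel SVM prediction (not the worked answer to parts a and b), the classifier evaluates $\hat{d}(x) = \mathrm{sign}\big(\sum_j \alpha_j K(x, x_j)\big)$, and only the terms with $\alpha_j \neq 0$ contribute, so the sum can be restricted to those training points. The feature dimension, σ, and the specific nonzero α values below are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(U, V, sigma):
    # K(u, v) = exp(-||u - v||_2^2 / (2 sigma^2)) for all row pairs of U, V
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def classify(x_new, x_train, alpha, sigma):
    # d_hat(x) = sign(sum_j alpha_j K(x, x_j)); terms with alpha_j == 0
    # vanish, so only the support vectors (alpha_j != 0) are needed.
    sv = alpha != 0
    K = gaussian_kernel(np.atleast_2d(x_new), x_train[sv], sigma)
    return np.sign(K @ alpha[sv])

# Illustrative setup: N = 1000 training points with only two nonzero weights
rng = np.random.default_rng(0)
x_train = rng.standard_normal((1000, 2))
alpha = np.zeros(1000)
alpha[[99, 100]] = [0.7, -0.4]  # hypothetical nonzero weights

label = classify(np.array([0.0, 0.0]), x_train, alpha, sigma=1.0)
```

Restricting the sum to the support vectors reduces the prediction cost from N kernel evaluations per test point to one per support vector, which is the practical payoff of sparsity in α.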