ELEN 4903: Machine Learning Homework 3

Problem 1 (Gaussian process coding) – 30 points
In this problem you will implement the Gaussian process model for regression. You will use the same
data used for homework 1 to do this, which is again provided in the data zip file for this homework.
Recall that the Gaussian process treats a set of N observations (x₁, y₁), . . . , (x_N, y_N), with x_i ∈ R^d and y_i ∈ R, as being generated from a multivariate Gaussian distribution as follows:

y ∼ Normal(0, σ²I + K),   K_ij = K(x_i, x_j) = exp{ −(1/b) ‖x_i − x_j‖² }.
Here, y is an N-dimensional vector of outputs and K is an N × N kernel matrix. For this problem use the Gaussian kernel indicated above. In the lecture slides, we discuss making predictions for a new y′ given x′, which was Gaussian with mean µ(x′) and variance Σ(x′). The equations are shown in the slides. There are two parameters that need to be set for this model as given above: σ² and b.
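For reference, the predictive equations take the standard form for GP regression; assuming the slides use the usual notation (with x′ a new input and K(x′, X) the 1 × N vector of kernel values between x′ and the training inputs), they are:

```latex
\mu(x') = K(x', X)\,(\sigma^2 I + K)^{-1} y,
\qquad
\Sigma(x') = \sigma^2 + K(x', x') - K(x', X)\,(\sigma^2 I + K)^{-1} K(X, x').
```

Note that K(x′, x′) = 1 for the Gaussian kernel above, so the predictive variance is σ² + 1 minus a data-dependent reduction term.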
a) Write code to implement the Gaussian process and to make predictions on test data.
b) For b ∈ {5, 7, 9, 11, 13, 15} and σ² ∈ {.1, .2, .3, .4, .5, .6, .7, .8, .9, 1} (60 total pairs (b, σ²)), calculate the RMSE on the 42 test points as you did in the first homework. Use the mean of the Gaussian process at the test point as your prediction. Show your results in a table.
c) Which value was the best and how does this compare with the first homework? What might be a
drawback of the approach in this homework (as given) compared with homework 1?
d) To better understand what the Gaussian process is doing through visualization, re-run the algorithm using only the 4th dimension of x_i (car weight). Set b = 5 and σ² = 2. Show a scatter plot of the data (x_i[4] versus y_i for each point). Also plot, as a solid line, the predictive mean of the Gaussian process at each point in the training set. You can think of this problem as asking you to create a test set by duplicating x_i[4] for each i in the training set and then predicting that test set.
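One way parts (a) and (b) could be sketched in NumPy, assuming the standard predictive-mean and predictive-variance formulas for GP regression (function and variable names here are illustrative, not prescribed by the assignment):

```python
import numpy as np

def gaussian_kernel(X1, X2, b):
    """Kernel matrix with entries exp(-||x1_i - x2_j||^2 / b)."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-sq / b)

def gp_predict(X_train, y_train, X_test, b, sigma2):
    """Predictive mean and variance of the GP at each test point."""
    K = gaussian_kernel(X_train, X_train, b)
    A = sigma2 * np.eye(len(X_train)) + K        # sigma^2 I + K
    Ks = gaussian_kernel(X_test, X_train, b)     # cross-kernel K(x', X)
    mean = Ks @ np.linalg.solve(A, y_train)
    # K(x', x') = 1 for this kernel, so prior variance is sigma^2 + 1
    var = sigma2 + 1.0 - np.sum(Ks * np.linalg.solve(A, Ks.T).T, axis=1)
    return mean, var
```

For part (b), loop over the 60 (b, σ²) pairs, call `gp_predict` on the 42 test points, and compute RMSE = sqrt(mean((mean − y_test)²)) for each pair.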
Problem 2 (Boosting coding) – 30 points
In this problem you will implement boosting for the “least squares” (LS) classifier that we briefly discussed in Lecture 8. Recall that this “classifier” performed least squares linear regression treating the
±1 labels as real-valued responses. Also recall that we criticized this classifier as being “weak,” without
using that word, and so boosting this classifier can be a good illustration of the method (even though it
performs well on the data set you will be using).
Using the training data provided, implement boosting for the LS classifier. You should use the bootstrap method as discussed in the slides, where each bootstrap set B_t is the size of the training set. Recall that if your error ε_t > 0.5, you can simply change the sign of the regression vector w (including the intercept) and recalculate the error.
Information about the data used for this problem can be found here:
https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
but you must use the data provided on Courseworks. Note that the intercept dimension hasn’t been
included in the features provided, so you should add a dimension equal to 1.
a) Run your boosted LS classifier for T = 1500 rounds. In the same plot, show the training and testing error of f_boost^(t)(·) for t = 1, . . . , T.
b) In a separate plot, show the upper bound on the training error as a function of t. You will need to use ε_t to do this. This upper bound is given in the slides for Lecture 13.
c) Plot a histogram of the total number of times each training data point was selected by the bootstrap method across all rounds. In other words, sum the histograms of all B_t.
d) In two separate plots, show ε_t and α_t as a function of t.
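The boosting loop could be sketched as follows, assuming the slides use the standard AdaBoost weight updates with bootstrap sets B_t drawn from the current distribution p_t (all names here are illustrative, and the guard against zero error is an implementation convenience, not part of the assignment):

```python
import numpy as np

def ls_classifier(X, y):
    """Least-squares 'weak' classifier: regress the +/-1 labels on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def boost_ls(X, y, T, rng):
    """Boost the LS classifier for T rounds with bootstrap resampling."""
    n = len(y)
    p = np.full(n, 1.0 / n)                  # distribution over training points
    ws, alphas, eps = [], [], []
    for _ in range(T):
        idx = rng.choice(n, size=n, p=p)     # bootstrap set B_t of size n
        w = ls_classifier(X[idx], y[idx])
        pred = np.sign(X @ w)
        err = p[pred != y].sum()
        if err > 0.5:                        # flip w (including intercept)
            w, pred = -w, -pred
            err = p[pred != y].sum()         # recalculate the error
        err = max(err, 1e-10)                # avoid log(0) if err == 0
        alpha = 0.5 * np.log((1.0 - err) / err)
        p = p * np.exp(-alpha * y * pred)    # reweight misclassified points up
        p /= p.sum()
        ws.append(w); alphas.append(alpha); eps.append(err)
    return np.array(ws), np.array(alphas), np.array(eps)

def boosted_predict(X, ws, alphas):
    """Weighted vote: sign of sum_t alpha_t * sign(x^T w_t)."""
    return np.sign(np.sign(X @ ws.T) @ alphas)
```

Remember to append the all-ones intercept column to the provided features before training; the training/testing error curves in part (a) come from calling `boosted_predict` with the first t classifiers for each t.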