COMS 4721: Machine Learning for Data Science Homework 3

Problem 1 (Boosting coding) – 30 points
In this problem you will implement boosting for the least squares classifier that we briefly discussed in
Lecture 8. Recall that this “classifier” performed least squares linear regression by treating the ±1 labels
as if they were real-valued responses. Also recall that we criticized this classifier as not being a very good one to use in practice on its own (i.e., it is “weak”), and so boosting this classifier is a good illustration of the method.
Using the toy data provided, implement the AdaBoost algorithm on the least squares classifier. You should use the bootstrap method as discussed in the slides to do this, where each bootstrap set B_t is the size of the training set. In the data, I have added a dimension equal to 1 for the intercept term. Recall that if the value of ε_t > 1/2, you can simply change the sign of the regression vector you learned in iteration t (including the offset term) and recalculate to make ε_t < 1/2.
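
Below is a minimal sketch of this procedure in Python/NumPy, assuming the toy data is loaded as an array X that already contains the intercept column and a label vector y with ±1 entries; the function names (ls_classifier, adaboost_ls) and the seed are placeholders of mine, not part of the assignment:

import numpy as np

def ls_classifier(X, y):
    # least squares "weak" learner: treat the ±1 labels as real-valued responses
    return np.linalg.solve(X.T @ X, X.T @ y)

def adaboost_ls(X, y, T=2500, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = np.full(n, 1.0 / n)               # distribution w_t over the training points
    models, alphas, eps_list = [], [], []
    for t in range(T):
        idx = rng.choice(n, size=n, replace=True, p=p)   # bootstrap set B_t of size n
        w = ls_classifier(X[idx], y[idx])
        pred = np.sign(X @ w)
        eps = p[pred != y].sum()           # weighted training error ε_t
        if eps > 0.5:                      # flip the regression vector (including offset) so ε_t < 1/2
            w, pred, eps = -w, -pred, 1.0 - eps
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        p = p * np.exp(-alpha * y * pred)
        p = p / p.sum()                    # renormalize the distribution
        models.append(w); alphas.append(alpha); eps_list.append(eps)
    return models, np.array(alphas), np.array(eps_list)

Each round resamples a bootstrap set from the current distribution, fits the least squares vector, flips its sign if the weighted error exceeds 1/2, and then reweights the training points.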
a) Run your boosted least squares classifier for T = 2500 rounds and plot the empirical training error of f_boost^(t)(·) for t = 1, . . . , T. In the same plot, show the upper bound on the training error as a function of t. (This upper bound is given in the slides for Lecture 13.)
b) Show a stem plot of the average of the distribution on the data across all 2500 iterations (i.e., the empirical average over t of the weight vector w_t from the slides).
c) In two separate figures, plot ε_t and α_t as a function of t.
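
If it helps, the bound for part (a) is presumably the standard AdaBoost bound, exp(-2 Σ_{s ≤ t} (1/2 - ε_s)^2); assuming that, and assuming train_err holds the empirical training error of f_boost^(t) at each round while eps_list holds the ε_t values from the sketch above, the plot could be produced as follows (an illustration, not the required solution):

import numpy as np
import matplotlib.pyplot as plt

# standard AdaBoost bound on the training error after t rounds
# (confirm against the Lecture 13 slides): exp(-2 * sum_{s <= t} (1/2 - ε_s)^2)
bound = np.exp(-2.0 * np.cumsum((0.5 - eps_list) ** 2))

plt.plot(train_err, label="empirical training error of f_boost^(t)")
plt.plot(bound, label="upper bound")
plt.xlabel("t")
plt.legend()
plt.show()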
Problem 2 (K-means) – 15 points
Implement the K-means algorithm discussed in class. Generate 500 observations from a mixture of three
Gaussians on R^2 with mixing weights π = [0.2, 0.5, 0.3] and means µ and covariances Σ given by

µ_1 = [0, 0]^T,  Σ_1 = [[1, 0], [0, 1]],
µ_2 = [3, 0]^T,  Σ_2 = [[1, 0], [0, 1]],
µ_3 = [0, 3]^T,  Σ_3 = [[1, 0], [0, 1]].
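
One way to generate this data set (a short NumPy sketch; the seed and variable names are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.2, 0.5, 0.3])
mus = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
cov = np.eye(2)                            # all three components use the identity covariance

labels = rng.choice(3, size=500, p=pi)     # pick a mixture component for each point
X = np.array([rng.multivariate_normal(mus[k], cov) for k in labels])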
a) For K = 2, 3, 4, 5, show on the same plot the value of the K-means objective function per iteration
for 20 iterations (the algorithm may converge before that).
b) For K = 3, 5, plot the 500 data points and indicate each point's cluster at the final iteration by marking it in some way. (A plotting sketch covering both parts follows this list.)
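
The sketch below implements a K-means loop that records the objective per iteration and produces plots along the lines of parts (a) and (b); it assumes the X generated above, and initializing the centers at random data points is just one reasonable choice:

import numpy as np
import matplotlib.pyplot as plt

def kmeans(X, K, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # initialize centers at random data points
    objective = []
    for _ in range(n_iter):
        # assignment step: send each point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # update step: move each center to the mean of its assigned points
        for k in range(K):
            if np.any(c == k):
                centers[k] = X[c == k].mean(axis=0)
        objective.append(((X - centers[c]) ** 2).sum())       # objective after this iteration
    return c, centers, objective

# part (a): objective per iteration for K = 2, 3, 4, 5 on one plot
for K in [2, 3, 4, 5]:
    _, _, obj = kmeans(X, K)
    plt.plot(range(1, 21), obj, label=f"K = {K}")
plt.xlabel("iteration"); plt.ylabel("K-means objective"); plt.legend(); plt.show()

# part (b): data points marked by their final cluster for K = 3 and K = 5
for K in [3, 5]:
    c, _, _ = kmeans(X, K)
    plt.scatter(X[:, 0], X[:, 1], c=c, s=10)
    plt.title(f"K = {K}")
    plt.show()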
Problem 3 (Bayes classifier revisited) – 30 points
In this problem, you will implement the EM algorithm for the Gaussian mixture model, with the purpose
of using it in a Bayes classifier. The data is a processed version of the spam email data you looked at in
Homework 2. Now, each labeled pair (x, y) has x ∈ R^10. We discussed how the Bayes classifier learns
class-conditional densities, and unsupervised learning algorithms can be useful here. In this problem,
the class conditional density will be the Gaussian mixture model (GMM). In these experiments, please
initialize all covariance matrices to the empirical covariance of the data being modeled. Randomly
initialize the means by sampling from a single multivariate Gaussian where the parameters are the mean
and covariance of the data being modeled. Initialize the mixing weights to be uniform.
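
A sketch of this initialization and of one possible EM implementation, assuming NumPy/SciPy and that X holds the training features of one class; the function name em_gmm and the use of scipy.stats.multivariate_normal are choices of mine rather than anything required by the assignment:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    data_mean, data_cov = X.mean(axis=0), np.cov(X.T)
    pi = np.full(K, 1.0 / K)                                    # uniform mixing weights
    mus = rng.multivariate_normal(data_mean, data_cov, size=K)  # means sampled from the data's Gaussian
    covs = np.array([data_cov.copy() for _ in range(K)])        # every covariance starts at the empirical covariance
    log_marginal = []
    for _ in range(n_iter):
        # E step: responsibilities phi[i, k] proportional to pi_k * N(x_i | mu_k, Sigma_k)
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], covs[k]) for k in range(K)])
        log_marginal.append(np.log(dens.sum(axis=1)).sum())     # log marginal objective at the current parameters
        phi = dens / dens.sum(axis=1, keepdims=True)
        # M step: update weights, means, and covariances from the responsibilities
        nk = phi.sum(axis=0)
        pi = nk / n
        mus = (phi.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            covs[k] = (phi[:, k][:, None] * diff).T @ diff / nk[k]
    return pi, mus, covs, log_marginal

For part (a), one would run this 10 times per class with different seeds and plot log_marginal[4:] for each run, so that iterations 1 through 4 are not shown.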
a) Implement the EM algorithm for the GMM described in class. Using the training data provided,
for each class separately, plot the log marginal objective function for a 3-Gaussian mixture model
over 10 different runs and for iterations 5 to 30. (In other words, don’t show iterations 1 through
4.) There should be two plots, each with 10 curves.
b) Using the best run for each class after 30 iterations, predict the testing data using a Bayes classifier
and show the result in a 2 × 2 confusion matrix, along with the accuracy percentage. Repeat this
process for a 1-, 2-, 3- and 4-Gaussian mixture model, and show these results next to each other. You
don’t need to repeat Part (a) for these other cases. Note that a 1-Gaussian GMM doesn’t require
an iterative algorithm, although your implementation will likely still work in this case.
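
A sketch of the classification step, assuming the GMM parameters for each class come from the best em_gmm run above (here packed as tuples params_class0 and params_class1 of (pi, mus, covs)), and that y_train / y_test use 0/1 labels; all of these names are placeholders:

import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_density(X, pi, mus, covs):
    # log of the GMM density evaluated at each row of X
    dens = sum(pi[k] * multivariate_normal.pdf(X, mus[k], covs[k]) for k in range(len(pi)))
    return np.log(dens)

# class priors estimated from the training labels
prior1 = (y_train == 1).mean()
prior0 = 1.0 - prior1

# Bayes classifier: pick the class with the larger log prior plus log class-conditional density
score0 = np.log(prior0) + gmm_log_density(X_test, *params_class0)
score1 = np.log(prior1) + gmm_log_density(X_test, *params_class1)
y_pred = (score1 > score0).astype(int)

# 2 x 2 confusion matrix (rows = true class, columns = predicted class) and accuracy
conf = np.zeros((2, 2), dtype=int)
for t, p in zip(y_test, y_pred):
    conf[int(t), int(p)] += 1
print(conf)
print("accuracy:", np.trace(conf) / conf.sum())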