# Assignment 8 Machine Learning COMS 4771


## Description

1) This question pertains to the variables X and y_int in the midterm data file data.mat. Apologies
to those of you who have already done this or a similar analysis during the midterm. Consider the entire
set of N=10000 input datapoints (rows of X) as well as the two classes of datapoints (rows of X
corresponding to the different values of y_int), hereafter denoted X1 and X2. Use principal
component analysis to compute the eigenvector decomposition of each of these three sets. Create three scatterplots, each in 3D, each showing all N datapoints, color-coded by class. The first scatterplot will be
along the coordinate system defined by the top three principal components of X. The second and third
scatterplots will be along the coordinate systems defined by the top three principal components of X1 and
X2, respectively. Submit three MATLAB figure files for the three plots (scatterX.fig, scatterX1.fig,
scatterX2.
[20pt]
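
As a reminder of the quantities involved, the projection behind each scatterplot can be written as below. This is a sketch under assumptions that the assignment does not spell out: S stands for whichever set defines the axes (X, X1, or X2), the data are centered with the mean of S, and the eigenvalues are written as γ to avoid clashing with other symbols in this assignment.

```latex
\begin{aligned}
\mu_S &= \frac{1}{|S|} \sum_{x \in S} x, \qquad
C_S = \frac{1}{|S|} \sum_{x \in S} (x - \mu_S)(x - \mu_S)^{\top} \\
C_S v_j &= \gamma_j v_j, \qquad \gamma_1 \ge \gamma_2 \ge \gamma_3 \ge \dots \\
z_n &= \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}^{\top} (x_n - \mu_S), \qquad n = 1, \dots, N
\end{aligned}
```

In all three plots the points z_n are computed for every row of X; only the set S that supplies the mean and eigenvectors changes.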
2) Consider the exponential distribution defined in Assignment 7 Problem 2. A mixture of exponentials
is a random variable whose distribution has a parameter λ chosen at random from among {λ_i}, i=1,...,K,
with respective prior probabilities {π_i}. This would model, e.g., your pile of pistachios being selected
at random from among K varieties, each with its own rate of spontaneous combustion. Write a
function SimMixExps to simulate data from this distribution per the attached prototype. Assume,
without loss of generality, that the λ_i are in increasing order.
[20pt]
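
The two-stage sampling described above (pick a variety, then draw from its exponential) might be sketched as follows. This is an illustrative Python sketch, not the MATLAB prototype, which is not reproduced here: the name `sim_mix_exps`, its arguments, and its return values are hypothetical, and it assumes the rate parameterization (mean = 1/rate) for the exponential distribution.

```python
import random


def sim_mix_exps(n, rates, priors, seed=None):
    """Draw n samples from a mixture of exponentials.

    rates  -- component rates (assumption: rate parameterization, mean = 1/rate)
    priors -- prior probability of each component; must sum to 1
    Returns (samples, labels); labels[j] is the index of the component
    that generated samples[j].
    """
    if len(rates) != len(priors):
        raise ValueError("rates and priors must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    rng = random.Random(seed)
    # Stage 1: choose a component for every sample according to the priors.
    labels = rng.choices(range(len(rates)), weights=priors, k=n)
    # Stage 2: draw each sample from the exponential with the chosen rate.
    samples = [rng.expovariate(rates[k]) for k in labels]
    return samples, labels
```

Returning the labels alongside the samples is a convenience for checking the simulator; the actual prototype may not ask for them.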
3) Develop EM for inferring the parameters {λ_i} and the prior probabilities {π_i} from data.
a) Define the hidden variables, mixture proportions, and responsibilities. Write down the log likelihood
and the expected log likelihood. Develop the update equations for the E-step and the M-step.
[15pt]
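
For orientation, the pieces requested in (a) could take the following form. This is a sketch under a stated assumption: the exponential density uses the rate parameterization p(x | λ) = λ e^{-λx}, which may differ from the definition in Assignment 7. The symbols z_n (hidden component label) and r_{ni} (responsibility) are introduced here for illustration.

```latex
\begin{aligned}
&\text{Model:} && p(z_n = i) = \pi_i, \qquad p(x_n \mid z_n = i) = \lambda_i e^{-\lambda_i x_n} \\
&\text{Log likelihood:} && \ell = \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \lambda_i e^{-\lambda_i x_n} \\
&\text{E-step:} && r_{ni} = \frac{\pi_i \lambda_i e^{-\lambda_i x_n}}{\sum_{j=1}^{K} \pi_j \lambda_j e^{-\lambda_j x_n}} \\
&\text{Expected log likelihood:} && Q = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \left( \log \pi_i + \log \lambda_i - \lambda_i x_n \right) \\
&\text{M-step:} && \pi_i = \frac{1}{N} \sum_{n=1}^{N} r_{ni}, \qquad
\lambda_i = \frac{\sum_{n=1}^{N} r_{ni}}{\sum_{n=1}^{N} r_{ni} x_n}
\end{aligned}
```

The M-step follows from maximizing Q in each λ_i directly and in the π_i subject to the constraint that they sum to 1.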
b) Implement (a) in EMExps per the attached prototype.
[15pt]
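
A minimal Python sketch of such an EM loop is given below, for illustration only. The name `em_exps`, its arguments, the fixed iteration count, and the geometric initialization are all assumptions rather than part of the attached MATLAB prototype, and the density is assumed to be rate * exp(-rate * x).

```python
import math


def em_exps(xs, k, n_iter=200):
    """Fit a k-component exponential mixture by EM.

    Assumes p(x | rate) = rate * exp(-rate * x).
    Returns (rates, priors), ordered by increasing rate.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    # Deterministic start: rates spread geometrically around 1/mean, equal priors.
    rates = [(1.0 / mean_x) * 4.0 ** (i - (k - 1) / 2.0) for i in range(k)]
    priors = [1.0 / k] * k
    tiny = 1e-300  # guards against log(0) and division by zero if a component collapses
    for _ in range(n_iter):
        resp_sum = [0.0] * k    # sum over n of r_ni
        resp_x_sum = [0.0] * k  # sum over n of r_ni * x_n
        # E-step: responsibilities, computed in log space for stability.
        for x in xs:
            log_w = [math.log(priors[i] * rates[i] + tiny) - rates[i] * x
                     for i in range(k)]
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            for i in range(k):
                r = w[i] / total
                resp_sum[i] += r
                resp_x_sum[i] += r * x
        # M-step: closed-form updates for the priors and the rates.
        priors = [s / n for s in resp_sum]
        rates = [resp_sum[i] / (resp_x_sum[i] + tiny) for i in range(k)]
    order = sorted(range(k), key=lambda i: rates[i])
    return [rates[i] for i in order], [priors[i] for i in order]
```

Sorting the output by rate matches the increasing-order convention, so estimates can be compared with the true values component by component. A convergence test on the log likelihood could replace the fixed iteration count.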
c) Choose particular values for the parameters {λ_i} and the prior probabilities {π_i}, for which you will benchmark the performance of EM as
a (plotted) function of N. Measure performance in two ways: root-sum-of-squared-differences for
{λ_i} and root-sum-of-squared-differences for {π_i}. Choose a range for N that takes you from
poor to great performance; this range will depend on the values you choose. Submit the MATLAB figures
for both (PlotRMSDLambda.fig, PlotRMSDPi.fig) and the code that produces them: a script
MakePlotsRMSD.m.
[15pt]
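
The benchmark loop could be organized roughly as in the Python sketch below. The names `root_sum_sq_diff` and `benchmark`, and the `simulate` and `fit` callables with the signatures shown, are hypothetical placeholders for whatever simulator and EM routine are used. The sketch assumes that both the estimates and the true values are listed in increasing order of the rate parameter.

```python
import math


def root_sum_sq_diff(estimate, truth):
    """Root of the sum of squared differences between two parameter vectors."""
    if len(estimate) != len(truth):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))


def benchmark(simulate, fit, true_rates, true_priors, sample_sizes):
    """Return one (N, rate_error, prior_error) tuple per sample size.

    simulate(n, rates, priors) -> list of samples   (hypothetical signature)
    fit(samples, k) -> (rates, priors)              (hypothetical signature)
    """
    results = []
    for n in sample_sizes:
        samples = simulate(n, true_rates, true_priors)
        est_rates, est_priors = fit(samples, len(true_rates))
        results.append((n,
                        root_sum_sq_diff(est_rates, true_rates),
                        root_sum_sq_diff(est_priors, true_priors)))
    return results
```

Plotting the second and third entries of each tuple against N gives the two requested curves. Averaging over several simulated datasets per N would smooth them, at the cost of extra run time.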
Good luck!