Assignment 8 Machine Learning COMS 4771



1) This question pertains to the variables X and y_int in the midterm data file data.mat. Apologies
to those of you who have already done this or a similar analysis during the midterm. Consider the entire
set of N=10000 input datapoints (rows of X), as well as the two classes of datapoints (rows of X
corresponding to the different values of y_int), hereafter denoted X1 and X2. Use principal
component analysis to obtain an eigenvector decomposition of each of these three sets. Create three
scatter plots, each in 3D and each showing all N datapoints, color-coded by class. The first scatter plot
will be along the coordinate system defined by the top three principal components of X. The second and
third scatter plots will be along the coordinate systems defined by the top three principal components
of X1 and X2, respectively. Submit three MATLAB figure files for the three plots (scatterX.fig, scatterX1.fig,
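
The projection step in Problem 1 can be sketched as follows. This is a Python/NumPy illustration of the idea, not the MATLAB deliverable. The helper names top_principal_axes and project are hypothetical, and the arrays X and y_int below are synthetic stand-ins (with assumed class labels 0 and 1) because data.mat is not reproduced here.

```python
import numpy as np

def top_principal_axes(data, k=3):
    """Return (mean, axes). The columns of axes (shape d x k) are the top-k
    eigenvectors of the sample covariance of data (rows are datapoints)."""
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return mean, eigvecs[:, order]

def project(data, mean, axes):
    """Coordinates of every row of data in the basis given by axes."""
    return (data - mean) @ axes

# Synthetic stand-in for X and y_int (assumption: real values come from data.mat).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 0.5, 0.1])
y_int = (X[:, 0] > 0).astype(int)
X1, X2 = X[y_int == 0], X[y_int == 1]

# One set of 3D coordinates per basis. All N points are projected each time,
# but the basis comes from X, X1, or X2 respectively.
coords = {name: project(X, *top_principal_axes(subset))
          for name, subset in [("X", X), ("X1", X1), ("X2", X2)]}
```

Each entry of coords would then be handed to a 3D scatter routine, with y_int selecting the color of each point.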
2) Consider the exponential distribution defined in Assignment 7 Problem 2. A mixture of exponentials
is a random variable whose distribution has a parameter λ chosen at random from among {λ_i}, i=1,...,K,
with respective prior probabilities {π_i}. This would model, e.g., your pile of pistachios being selected
at random from among K varieties, each with its own rate of spontaneous combustion. Write a
function SimMixExps to simulate data from this distribution per the attached prototype. Assume,
w.l.o.g., that the λ_i are in increasing order.
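
The two-stage sampling described above (pick a variety, then draw from its exponential) can be sketched in Python as below. This is an illustration only: the attached SimMixExps prototype is not shown here, so the name sim_mix_exps, its arguments, and its return values are hypothetical. It also assumes the exponential is parameterized by its rate, i.e. density rate * exp(-rate * x).

```python
import random

def sim_mix_exps(n, rates, weights, seed=None):
    """Draw n samples from a mixture of exponentials.

    rates   -- component rate parameters (assumed rate parameterization)
    weights -- prior probabilities of the components, summing to 1
    Returns (samples, labels), where labels[j] is the component index used
    to generate samples[j].
    """
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rng = random.Random(seed)
    # Step 1: pick a component for every sample according to the priors.
    labels = rng.choices(range(len(rates)), weights=weights, k=n)
    # Step 2: draw from the exponential with the chosen component's rate.
    samples = [rng.expovariate(rates[i]) for i in labels]
    return samples, labels
```

Returning the labels alongside the samples is a convenience for checking the simulator; an inference routine would see only the samples.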
3) Develop EM for inferring {λ_i} and {π_i} from data.
a) Define the hidden variables, mixture proportions, and responsibilities. Write down the log likelihood,
and the expected log likelihood. Develop the update equations for each E-step and M-step.
[15 pt]
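
One possible form of the quantities asked for in (a) is sketched below. It assumes the rate parameterization of the exponential, p(x | \lambda) = \lambda e^{-\lambda x}, which may differ from the definition in Assignment 7. Here \lambda_i are the component rates, \pi_i the prior probabilities, x_n the datapoints, z_n the hidden component label of datapoint n, and \gamma_{ni} the responsibilities.

```latex
% Hidden variable: z_n \in \{1,\dots,K\}, with P(z_n = i) = \pi_i.

% Log likelihood of the observed data
\ell(\lambda, \pi) = \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \lambda_i e^{-\lambda_i x_n}

% E-step: responsibilities under the current parameters
\gamma_{ni} = P(z_n = i \mid x_n)
            = \frac{\pi_i \lambda_i e^{-\lambda_i x_n}}
                   {\sum_{j=1}^{K} \pi_j \lambda_j e^{-\lambda_j x_n}}

% Expected complete-data log likelihood
Q(\lambda, \pi) = \sum_{n=1}^{N} \sum_{i=1}^{K}
    \gamma_{ni} \left( \log \pi_i + \log \lambda_i - \lambda_i x_n \right)

% M-step: maximize Q subject to \sum_i \pi_i = 1
\pi_i = \frac{1}{N} \sum_{n=1}^{N} \gamma_{ni},
\qquad
\lambda_i = \frac{\sum_{n=1}^{N} \gamma_{ni}}{\sum_{n=1}^{N} \gamma_{ni} x_n}
```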
b) Implement (a) in EMExps per the attached prototype.
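
A minimal Python sketch of such an EM loop follows. It is not the EMExps prototype, which is not shown here: the name em_exps, its arguments, the fixed iteration count, and the initialization are all assumptions made for illustration, as is the rate parameterization rate * exp(-rate * x).

```python
import math
import random

def em_exps(data, k, n_iter=200, seed=0):
    """Fit a k-component mixture of exponentials by EM.

    Assumes the rate parameterization rate * exp(-rate * x).
    Returns (rates, weights), sorted by increasing rate.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    # Initialization (an arbitrary choice): spread the rates around 1/mean.
    rates = [(i + 1) / mean * rng.uniform(0.5, 1.5) for i in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in data:
            logs = [math.log(w) + math.log(r) - r * x
                    for w, r in zip(weights, rates)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: closed-form updates for the weights and rates.
        for i in range(k):
            mass = sum(row[i] for row in resp)
            weighted_sum = sum(row[i] * x for row, x in zip(resp, data))
            if mass > 0 and weighted_sum > 0:
                rates[i] = mass / weighted_sum
            weights[i] = max(mass / n, 1e-12)  # floor keeps log() finite
    order = sorted(range(k), key=lambda i: rates[i])
    return [rates[i] for i in order], [weights[i] for i in order]
```

Sorting the output by rate matches the increasing-order convention of Problem 2 and sidesteps the label-switching ambiguity when estimates are compared against true values. A convergence test on the log likelihood could replace the fixed iteration count.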
c) Choose particular {λ_i} and {π_i} values, for which you will benchmark the performance of EM as
a (plotted) function of N. Measure performance in two ways: root-sum-of-squared-differences for
{λ_i} and root-sum-of-squared-differences for {π_i}. Choose a range for N that takes you from
poor to great performance. This will depend on the values you choose. Submit the MATLAB figures
for both (PlotRMSDLambda.fig, PlotRMSDPi.fig) and the code to do this: a script
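
The benchmarking loop in (c) might be organized as in the Python sketch below. The names root_sum_sq_diff and benchmark are hypothetical, and the simulate and fit arguments are caller-supplied stand-ins for SimMixExps and EMExps, whose actual prototypes are not shown here.

```python
import math

def root_sum_sq_diff(estimate, truth):
    """Root of the sum of squared differences between two equal-length lists."""
    if len(estimate) != len(truth):
        raise ValueError("length mismatch")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))

def benchmark(simulate, fit, true_rates, true_weights, sample_sizes):
    """Return a list of (N, rate_error, weight_error), one per sample size.

    simulate(n) -> data and fit(data) -> (rates, weights) are supplied by
    the caller. Estimates are sorted by rate before comparison, so
    true_rates is assumed to be in increasing order.
    """
    results = []
    for n in sample_sizes:
        rates, weights = fit(simulate(n))
        order = sorted(range(len(rates)), key=lambda i: rates[i])
        rates = [rates[i] for i in order]
        weights = [weights[i] for i in order]
        results.append((n,
                        root_sum_sq_diff(rates, true_rates),
                        root_sum_sq_diff(weights, true_weights)))
    return results
```

The two error columns would then be plotted against N, one figure each. Averaging over several simulated datasets per N is one way to smooth the curves, since a single run can be noisy.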
Good luck!