Question 1 (50%)
Conduct the following model order selection exercise using a 10-fold cross-validation procedure,
and report your procedure and results in a comprehensive, convincing, and rigorous fashion:
1. Select a Gaussian Mixture Model as the true probability density function for 2-dimensional
real-valued data synthesis. This GMM will have 4 components with different mean vectors,
different covariance matrices, and different probabilities of being selected as the generator
for each sample. Specify the true GMM that generates the data.
2. Generate multiple data sets with independent identically distributed samples using this true
GMM; these data sets will have 10, 100, 1000, and 10000 samples, respectively.
3. For each data set, using maximum likelihood parameter estimation principles (e.g. with the
EM algorithm), within the framework of K(=10)-fold cross-validation, evaluate GMMs with
different model orders; specifically, evaluate candidate GMMs with 1, 2, 3, 4, 5, 6 Gaussian
components. Note that the log-likelihood of the data is to be used both as the objective for
model parameter estimation and as the validation performance measure.
4. Report your results for the experiment, indicating which of the six GMM orders gets selected
for each of the data sets you produced. Develop a good way to describe and summarize your
experiment results in the form of tables/figures.
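
The steps of Question 1 can be sketched in Python with NumPy as follows. This is only one possible
outline, not a reference solution: the true GMM parameters (TRUE_PRIORS, TRUE_MEANS, TRUE_COVS),
the function names, the random initialization, the fixed iteration count, and the small covariance
regularizer reg are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical true GMM (illustrative values only; choose your own).
TRUE_PRIORS = np.array([0.15, 0.25, 0.35, 0.25])
TRUE_MEANS = np.array([[-3.0, 0.0], [0.0, 3.0], [3.0, 0.0], [0.0, -3.0]])
TRUE_COVS = np.array([[[1.0, 0.3], [0.3, 0.5]], [[0.6, -0.2], [-0.2, 1.2]],
                      [[1.5, 0.5], [0.5, 0.8]], [[0.4, 0.0], [0.0, 0.9]]])

def sample_gmm(n, priors, means, covs, rng):
    """Draw n i.i.d. samples from a GMM; returns (n, d) data and component labels."""
    labels = rng.choice(len(priors), size=n, p=priors)
    x = np.empty((n, means.shape[1]))
    for k in range(len(priors)):
        idx = labels == k
        x[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return x, labels

def log_gauss(x, mean, cov):
    """Log of the multivariate normal pdf evaluated at each row of x."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)      # whitened residuals, shape (d, n)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (x.shape[1] * np.log(2 * np.pi) + log_det + (z ** 2).sum(axis=0))

def gmm_loglik(x, priors, means, covs):
    """Per-sample log-likelihood (via log-sum-exp) and per-component log terms."""
    comp = np.stack([np.log(priors[k]) + log_gauss(x, means[k], covs[k])
                     for k in range(len(priors))], axis=1)
    m = comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).ravel(), comp

def fit_gmm_em(x, m, rng, n_iter=100, reg=1e-2):
    """Plain EM for an m-component GMM; reg * I keeps covariances invertible."""
    n, d = x.shape
    means = x[rng.choice(n, size=m, replace=False)]   # random samples as initial means
    covs = np.array([np.cov(x.T) + reg * np.eye(d)] * m)
    priors = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        ll, comp = gmm_loglik(x, priors, means, covs)
        resp = np.exp(comp - ll[:, None])             # E-step: responsibilities
        nk = resp.sum(axis=0) + 1e-10                 # M-step: weighted counts
        priors = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        for k in range(m):
            diff = x - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return priors, means, covs

def cv_select_order(x, orders, rng, n_folds=10):
    """Mean held-out log-likelihood per sample for each candidate model order."""
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = {}
    for m in orders:
        total = 0.0
        for val_idx in folds:
            train = np.delete(x, val_idx, axis=0)     # fit on the other K-1 folds
            params = fit_gmm_em(train, m, rng)
            total += gmm_loglik(x[val_idx], *params)[0].sum()
        scores[m] = total / len(x)
    return max(scores, key=scores.get), scores
```

A typical use would loop over the four sample sizes, call sample_gmm once per size, and then call
cv_select_order(x, range(1, 7), rng) to obtain the selected order and the per-order scores for a
table or plot. With only 10 samples each training fold has 9 points, so the regularizer matters
there, and repeating the experiment over several random seeds gives a more convincing summary.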
Question 2 (50%)
Conduct the following maximum likelihood discriminative classifier training exercise on data
generated from two Gaussian distributed classes:
1. Generate a training set with 999 2-dimensional samples from two classes with priors q− = 0.3
and q+ = 0.7; the class-conditional data probability distributions are two Gaussians with
different mean vectors and different covariance matrices (choose the matrices to be
nondiagonal with distinct eigenvalues, so your Gaussian pdfs are tilted with respect to each
other and elongated in different directions by different aspect ratios). Hint: For more
interesting results, make the Gaussians overlap with each other somewhat significantly, so that
the minimum error probability achievable is not too small.
2. Using Fisher LDA, identify a linear classifier that minimizes the error count on the training
set. This classifier will have a discriminant function of the form $w_{LDA}^T x + b_{LDA}$,
where x is the data vector, and the classifier decides in favor of Class − if the discriminant
is below 0, and decides in favor of Class + if the discriminant is at least 0.
3. Train the parameters of a logistic function $y(x) = 1/(1 + e^{-(w^T x + b)})$ using the maximum
likelihood estimation principle to optimize the parameters w and b with the training set,
such that the function is trained to act as a surrogate for the posterior probability of Class
+ given x. In particular, your model assumes that y(x) ≈ P(Label = +|x); consequently,
1−y(x) ≈ P(Label = −|x). Hint: Once you specify the optimization objective to train this
logistic-linear-model for class posterior, you can solve the optimization problem using any
suitable numerical optimization procedure, such as gradient ascent that you implement from
scratch, or a derivative-free numerical optimization procedure like the Nelder-Mead
Simplex Reflection Algorithm (e.g. in Matlab, fminsearch). Make a choice, implement it
correctly, and consider using the LDA solution you developed earlier to provide an initial
estimate for the model parameters.
4. Report visual and numerical results that compare the following three classifiers (e.g. data
scatter plots with color/shape indicators of true/decided labels), including the error count
each classifier achieves on the training set: the MAP classifier that makes use of the true
data distributions and class priors, which achieves minimum probability of error by design;
the LDA classifier you designed earlier; and the logistic-linear classifier you designed next.
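
The three classifiers of Question 2 can be sketched in Python with NumPy as follows. This is a
minimal outline under stated assumptions, not a reference solution: labels are encoded as 0 for
Class − and 1 for Class +, the logistic model is taken as y(x) = 1/(1 + e^{-(w^T x + b)}), the LDA
offset is chosen by a threshold sweep over the projected training data, the logistic parameters
are trained by fixed-step gradient ascent, and all function names and step sizes are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log of the multivariate normal pdf evaluated at each row of x."""
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, log_det = np.linalg.slogdet(cov)
    quad = (diff * sol).sum(axis=1)
    return -0.5 * (x.shape[1] * np.log(2 * np.pi) + log_det + quad)

def map_decide(x, priors, means, covs):
    """MAP rule using the true pdfs and priors: 1 for Class +, 0 for Class -."""
    score_neg = np.log(priors[0]) + log_gauss(x, means[0], covs[0])
    score_pos = np.log(priors[1]) + log_gauss(x, means[1], covs[1])
    return (score_pos >= score_neg).astype(int)

def discriminant_decide(x, w, b):
    """Linear rule: Class + (1) if w^T x + b >= 0, else Class - (0)."""
    return (x @ w + b >= 0).astype(int)

def fisher_lda(x, y):
    """Fisher direction w = Sw^{-1}(m+ - m-); b minimizes the training error count."""
    x_neg, x_pos = x[y == 0], x[y == 1]
    sw = np.cov(x_neg.T) + np.cov(x_pos.T)            # within-class scatter
    w = np.linalg.solve(sw, x_pos.mean(axis=0) - x_neg.mean(axis=0))
    proj = x @ w
    s = np.sort(proj)
    # Candidate thresholds: midpoints between sorted projections, plus both extremes.
    cands = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]))
    errors = [np.sum((proj - t >= 0).astype(int) != y) for t in cands]
    return w, -cands[int(np.argmin(errors))]

def mean_log_lik(x, y, w, b):
    """Average Bernoulli log-likelihood of labels y under the logistic model."""
    z = x @ w + b
    return np.mean(y * z - np.logaddexp(0.0, z))

def fit_logistic(x, y, w0, b0, lr=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood, started from (w0, b0)."""
    w, b = np.asarray(w0, dtype=float).copy(), float(b0)
    for _ in range(n_iter):
        z = np.clip(x @ w + b, -30.0, 30.0)           # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w += lr * (x.T @ (y - p)) / len(y)            # gradient with respect to w
        b += lr * np.mean(y - p)                      # gradient with respect to b
    return w, b
```

Given training data x with labels y, one would call fisher_lda(x, y), pass its output as the
starting point to fit_logistic, and count errors for each rule with
np.sum(decisions != y). The MAP rule minimizes the expected error, so on a finite training set the
LDA or logistic rule may occasionally show a slightly lower error count.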