Description
1. (20 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$ be a set of $n$ samples drawn i.i.d. from a univariate
distribution with density function $p(x|\theta)$, where $\theta$ is an unknown parameter. In general, $\theta$
will belong to a specified subset of $\mathbb{R}$, the set of real numbers. For the following choices of
$p(x|\theta)$, derive the maximum likelihood estimate of $\theta$ based on the samples $\mathcal{X}$:
(a) (5 points) $p(x|\theta) = \frac{1}{\sqrt{2\pi}\,\theta} \exp\left(-\frac{x^2}{2\theta^2}\right)$, $\theta > 0$.
(b) (5 points) $p(x|\theta) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right)$, $0 \le x < \infty$, $\theta > 0$.
(c) (5 points) $p(x|\theta) = \theta x^{\theta-1}$, $0 \le x \le 1$, $0 < \theta < \infty$.
(d) (5 points) $p(x|\theta) = \frac{1}{\theta}$, $0 \le x \le \theta$, $\theta > 0$.
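A derived closed-form estimate can be sanity-checked numerically by maximizing the log-likelihood over a grid of candidate values. The sketch below is only an illustration under stated assumptions: it uses a Rayleigh density, which is not one of the densities in this problem, and the helper names `grid_mle` and `rayleigh_log_density` are hypothetical.

```python
import numpy as np

def grid_mle(log_density, samples, grid):
    """Return the grid value of theta that maximizes the log-likelihood."""
    log_lik = np.array([log_density(samples, theta).sum() for theta in grid])
    return grid[np.argmax(log_lik)]

def rayleigh_log_density(x, theta):
    # Illustrative density only (not part of the assignment):
    # p(x | theta) = (x / theta^2) * exp(-x^2 / (2 theta^2)), x > 0
    return np.log(x) - 2.0 * np.log(theta) - x**2 / (2.0 * theta**2)

rng = np.random.default_rng(0)
samples = rng.rayleigh(scale=2.0, size=500)

# Numerical maximizer over a fine grid of candidate theta values.
grid = np.linspace(0.1, 5.0, 2000)
theta_grid = grid_mle(rayleigh_log_density, samples, grid)

# Closed form for the Rayleigh case: theta^2 = sum(x_i^2) / (2n).
theta_closed = np.sqrt(np.sum(samples**2) / (2 * len(samples)))
```

The two values should agree up to the grid spacing. A check like this does not replace the derivation, which still has to be shown in full.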
2. (20 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, be a set of $n$ samples drawn i.i.d. from a multivariate
Gaussian distribution in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Recall that
the density function of a multivariate Gaussian distribution is given by:
$$p(x|\mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right).$$
(a) (10 points) Derive the maximum likelihood estimates for the mean $\mu$ and covariance $\Sigma$
based on the sample set $\mathcal{X}$. [1,2]
(b) (5 points) Let $\hat{\mu}_n$ be the maximum likelihood estimate of the mean. Is $\hat{\mu}_n$ a biased
estimate of the true mean $\mu$? Clearly justify your answer by computing $E[\hat{\mu}_n]$.
(c) (5 points) Let $\hat{\Sigma}_n$ be the maximum likelihood estimate of the covariance matrix. Is $\hat{\Sigma}_n$
a biased estimate of the true covariance $\Sigma$? Clearly justify your answer by computing
$E[\hat{\Sigma}_n]$.
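An analytically computed expectation of an estimator can be cross-checked by simulation. The sketch below is a generic Monte Carlo check and makes several assumptions: the helper `monte_carlo_expectation` is hypothetical, and the estimator shown (the squared sample mean as an estimator of $\mu^2$ in one dimension) is chosen purely for illustration and is not one of the estimators asked about in this problem.

```python
import numpy as np

def monte_carlo_expectation(estimator, sampler, n, trials, rng):
    """Approximate E[estimator] by averaging over simulated data sets of size n."""
    draws = sampler(rng, (trials, n))  # one row per simulated data set
    return estimator(draws).mean()

# Illustrative setting (assumed values): univariate Gaussian with known mu, sigma.
mu, sigma, n = 1.0, 2.0, 10
rng = np.random.default_rng(0)

def sampler(rng, shape):
    return rng.normal(mu, sigma, size=shape)

def squared_mean(draws):
    # Estimator of mu^2: the square of each row's sample mean.
    return draws.mean(axis=1) ** 2

approx = monte_carlo_expectation(squared_mean, sampler, n, 200_000, rng)

# Analytical expectation for this illustrative estimator: mu^2 + sigma^2 / n,
# so its bias as an estimator of mu^2 is sigma^2 / n.
exact = mu**2 + sigma**2 / n
```

If the simulated value disagrees with the derived expectation by much more than the Monte Carlo noise, the derivation likely contains an error.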
3. (10 points) Table 1 specifies the misclassification costs for a 3-class problem including a
‘Reject’ option. Assume that a model has been trained using training data, and the model
can output posterior probabilities $P(C_1|x_{\text{test}})$, $P(C_2|x_{\text{test}})$, $P(C_3|x_{\text{test}})$ for any given test point
$x_{\text{test}}$.
(a) (5 points) Assume $\lambda = 10$. For a given $x_{\text{test}}$, let the posterior probabilities for the three
classes be: $P(C_1|x_{\text{test}}) = 0.5$, $P(C_2|x_{\text{test}}) = 0.25$, $P(C_3|x_{\text{test}}) = 0.25$. Using Table 1,
compute the risks for predicting $x_{\text{test}}$ to be $C_1$, $C_2$, $C_3$, and ‘Reject’ respectively. Including
‘Reject’ as a possible option, what would your predicted class for $x_{\text{test}}$ be? You have to
show details of your computation and justify your answer.
[1] You have to show the details of your derivation. A correct answer without the details will not get any credit.
[2] You can use material from the Matrix Cookbook and/or the textbook for your derivation.
                        Predicted Class
True Class    C1      C2      C3      ‘Reject’
C1            0       1       1       λ
C2            10      0       10      λ
C3            100     100     0       λ
Table 1: Misclassification costs for a 3-class problem including a ‘Reject’ option.
(b) (5 points) Assume $\lambda = 5$. For a given $x_{\text{test}}$, let the posterior probabilities for the
three classes be: $P(C_1|x_{\text{test}}) = 0.4$, $P(C_2|x_{\text{test}}) = 0.5$, $P(C_3|x_{\text{test}}) = 0.1$. Using Table 1,
compute the risks for predicting $x_{\text{test}}$ to be $C_1$, $C_2$, $C_3$, and ‘Reject’ respectively. Including
‘Reject’ as a possible option, what would your predicted class for $x_{\text{test}}$ be? You have to
show details of your computation and justify your answer.
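The risk of taking an action is the posterior-weighted sum of that action's column in the cost table, and the predicted action is the one with the smallest risk. The following is a minimal sketch of that computation. The function name `expected_risks` is hypothetical, and the cost matrix and posteriors are made-up values for a 2-class problem, deliberately different from Table 1.

```python
import numpy as np

def expected_risks(cost, posterior):
    """Return R(a_j | x) = sum_i cost[i, j] * P(C_i | x) for every action j.

    Rows of `cost` index the true class; columns index the action
    (predicted class, with 'Reject' as the last column).
    """
    cost = np.asarray(cost, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    return posterior @ cost

# Hypothetical 2-class example (NOT Table 1): columns are C1, C2, 'Reject'.
cost = [[0.0, 1.0, 0.3],
        [5.0, 0.0, 0.3]]
posterior = [0.8, 0.2]

risks = expected_risks(cost, posterior)  # one risk per action
best_action = int(np.argmin(risks))      # index of the minimum-risk action
```

In this made-up example the risks are 1.0, 0.8, and 0.3, so the minimum-risk action is the last column, ‘Reject’. The written solution must still show the arithmetic for each action explicitly.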
Programming assignment:
The next problem involves programming. For Question 3, we will be using the 2-class classification datasets Boston50 and Boston75 and the 10-class classification dataset Digits, which
were used in Homework 1.
3. (50 points) We will develop two parametric classifiers by modeling each class’s conditional
distribution $p(x|C_i)$ as multivariate Gaussians with (a) full covariance matrix $\Sigma_i$ and (b)
diagonal covariance matrix $\Sigma_i$. In particular, using the training data, we will compute the
maximum likelihood estimate of the class prior probabilities $p(C_i)$ and the class conditional
probabilities $p(x|C_i)$ based on the maximum likelihood estimates of the mean $\hat{\mu}_i$ and the
(full/diagonal) covariance $\hat{\Sigma}_i$ for each class $C_i$. The classification will be done based on the
following discriminant function:
$$g_i(x) = \log p(C_i) + \log p(x|C_i).$$
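As an illustration of how the discriminant can be evaluated for a single class, the sketch below computes the log prior plus the Gaussian log-density using a log-determinant and a linear solve. The function name `gaussian_discriminant` and all numerical values are assumptions made for this example. This is one possible way to evaluate the formula, not a required design.

```python
import numpy as np

def gaussian_discriminant(x, prior, mean, cov):
    """Evaluate g_i(x) = log p(C_i) + log p(x | C_i) for one Gaussian class."""
    d = mean.shape[0]
    diff = x - mean
    # slogdet returns (sign, log|cov|), which is more stable than log(det(cov)).
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ z = diff rather than forming the explicit inverse of cov.
    maha = diff @ np.linalg.solve(cov, diff)
    log_density = -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
    return np.log(prior) + log_density

# Hypothetical two-class example in R^2 (values chosen for illustration only).
x = np.array([1.0, 0.0])
g1 = gaussian_discriminant(x, 0.5, np.zeros(2), np.eye(2))
g2 = gaussian_discriminant(x, 0.5, np.array([3.0, 0.0]), np.eye(2))

# Predict the class with the larger discriminant value.
predicted = 1 if g1 > g2 else 2
```

With real data an estimated covariance can be singular or nearly so, in which case the solve fails or becomes unreliable. How to handle that case is left to the implementation.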
We will develop code for a class MultiGaussClassify with two key functions:
MultiGaussClassify.fit(self,X,y,diag) and MultiGaussClassify.predict(self,X).
For fit(self,X,y,diag), the inputs (X, y) are respectively the feature matrix and class labels, and diag is a boolean (True or False) which indicates whether the estimated class covariance matrices should be full matrices (diag=False) or diagonal matrices (diag=True).
For predict(self,X), the input X is the feature matrix corresponding to the test set and the
output should be the predicted labels for each point in the test set.
For the class, the __init__(self,k,d) function can initialize the parameters for each class to
be uniform prior, zero mean, and identity covariance, i.e., $p(C_i) = 1/k$, $\mu_i = 0$ and $\Sigma_i = I$,
$i = 1, \ldots, k$. Here, the number of classes k and the dimensionality d of the features are passed as
arguments to the constructor of MultiGaussClassify.
We will compare the performance of three models:
(i) MultiGaussClassify with full class covariance matrices,
(ii) MultiGaussClassify with diagonal covariance matrices, and
(iii) LogisticRegression [3]
applied to three datasets: Boston50, Boston75, and Digits. Using my_cross_val with 5-fold
cross-validation, report the error rates in each fold as well as the mean and standard deviation
of error rates across folds for the three models applied to the three classification datasets.
You will have to submit (a) code and (b) summary of results:
(a) Code: You will have to submit code for MultiGaussClassify as well as a wrapper code
hw2q3(). For the class, please use the following template:
class MultiGaussClassify:
    def __init__(self, k, d):
        ...
    def fit(self, X, y, diag=False):
        ...
    def predict(self, X):
        ...
Your class MultiGaussClassify should not inherit from any base class in sklearn. Again,
the three functions you must implement in the MultiGaussClassify class are __init__,
fit, and predict.
The wrapper code hw2q3() (main file) has no input and is used to prepare the datasets
and make calls to my_cross_val(method,X,y,k) to generate the error rate results for each
dataset and each method. The code for my_cross_val(method,X,y,k) must be yours
(e.g., code you developed in HW1 with modifications as needed) and you cannot use
cross_val_score() in sklearn. For the method argument in my_cross_val, you can
call the method corresponding to MultiGaussClassify with full covariance matrix as
just ‘multigaussclassify’ and the method corresponding to MultiGaussClassify with
diagonal covariance matrix as ‘multigaussdiagclassify’.
The results should be printed to the terminal (without generating an additional file in the folder).
Make sure the calls to my_cross_val(method,X,y,k) are made in the following order and
add a print to the terminal before each call to show which method and dataset is being
used:
1. MultiGaussClassify with full covariance matrix on Boston50,
2. MultiGaussClassify with full covariance matrix on Boston75,
3. MultiGaussClassify with full covariance matrix on Digits,
4. MultiGaussClassify with diagonal covariance matrix on Boston50,
5. MultiGaussClassify with diagonal covariance matrix on Boston75,
6. MultiGaussClassify with diagonal covariance matrix on Digits,
7. LogisticRegression with Boston50,
8. LogisticRegression with Boston75, and
9. LogisticRegression with Digits.
[3] You should use LogisticRegression from scikit-learn, similar to HW1.
For example, the first call to my_cross_val(method,X,y,k) should result in the following
output:
Error rates for MultiGaussClassify with full covariance matrix on Boston50:
Fold 1: ###
Fold 2: ###
…
Fold 5: ###
Mean: ###
Standard Deviation: ###
(b) Summary of results: For each dataset and each method, report the test set error rates
for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation
of the error rates over the k folds. Make a table to present the results for each method
and each dataset (9 tables in total). Each column of the table represents a fold, and add
two columns at the end to show the overall mean error rate and standard deviation over
the k folds. For example:
Error rates for MGC with full cov matrix on Boston50
Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 Mean SD
# # # # # # #
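To make the per-fold reporting concrete, here is a rough sketch of splitting data into k folds, computing the test error rate on each fold, and printing the results in the layout shown above. Everything in it is an assumption for illustration: `kfold_error_rates` takes a model object rather than the method string required for my_cross_val, `MajorityClassifier` is a hypothetical stand-in model, the data is synthetic, and the standard deviation is the population form (NumPy's default), which the assignment does not specify.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most frequent training label."""
    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.label_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

def kfold_error_rates(model, X, y, k=5, seed=0):
    """Return the test error rate of `model` on each of k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
    return np.array(errors)

def print_report(title, errors):
    print(f"Error rates for {title}:")
    for i, e in enumerate(errors, start=1):
        print(f"Fold {i}: {e:.4f}")
    print(f"Mean: {errors.mean():.4f}")
    print(f"Standard Deviation: {errors.std():.4f}")

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.3).astype(int)

errors = kfold_error_rates(MajorityClassifier(), X, y, k=5)
print_report("MajorityClassifier on synthetic data", errors)
```

The submitted my_cross_val must follow the signature and call order given in the problem statement, and the code must be your own.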
Additional instructions: Code can only be written in Python (not IPython notebook); no other
programming languages will be accepted. One should be able to execute all programs directly
from the command prompt (e.g., “python3 hw2q3.py”) without first starting the Python interactive
shell. Test your code yourself before submission and suppress any warning messages that may
be printed. Your code must run on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
Please make sure you specify the version of Python you are using as well as instructions on how to
run your program in the README file (must be readable through a text editor such as Notepad).
Information on the size of the datasets, including number of data points and dimensionality of
features, as well as number of classes can be readily extracted from the datasets in scikit-learn.
Each function must take the inputs in the order specified in the problem and display the output
via the terminal or as specified.
For each part, you can submit additional files/functions (as needed) which will be used by the
main file. Please put comments in your code so that one can follow the key parts and steps in your
code.
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw2.pdf: A document which contains the solutions to Problems 1, 2, and 3, including the
summary of results for 3. This document must be in PDF format (Word documents, photos, etc.
are not accepted). If you submit a scanned copy of a hand-written document, make sure the
copy is clearly readable; otherwise no credit may be given.
2. Python code for Problem 3 (must include the required hw2q3.py).
3. README.txt: README file that contains your name, student ID, email, instructions
on how to run your code, the full Python version you are using, any assumptions you
are making, and any other necessary details. The file must be readable by a text editor
such as Notepad.
4. Any other files, except the data, which are necessary for your code.