Description
1. Problem 2.6.26 in Duda, Hart, and Stork (DHS).
2. In this problem we will consider the ML estimate of the parameters of a multinomial distribution.
Consider a random variable $X$ such that $P_X(k) = \pi_k$, $k \in \{1,\ldots,N\}$. Suppose we draw $n$ independent
observations from $X$ and form a random vector $C = (C_1,\ldots,C_N)^T$, where $C_k$ is the number of times
that the observed value is $k$ (i.e., $C$ is the histogram of the sample of observations). Then $C$ has a
multinomial distribution
$$P_{C_1,\ldots,C_N}(c_1,\ldots,c_N) = \frac{n!}{\prod_{k=1}^{N} c_k!} \prod_{j=1}^{N} \pi_j^{c_j}.$$
a) Derive the ML estimator for the parameters πi, i = 1,…,N. (Hint: notice that these parameters
are probabilities, which makes this an optimization problem with a constraint. If you know about
Lagrange multipliers feel free to use them. Otherwise, note that minimizing a function f(a, b) under
the constraint a + b = 1 is the same as minimizing the function f(a, 1 − a)).
b) Is the estimator derived in a) unbiased? What is its variance? Is this a good estimator? Why?
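To make the setup of problem 2 concrete, the following is a minimal sketch that draws n independent observations and forms the histogram vector C. It only illustrates the sampling model, not the estimator asked for in a). The function name `sample_histogram`, the example probabilities, and the fixed seed are illustrative assumptions, not part of the assignment.

```python
import random
from collections import Counter


def sample_histogram(pi, n, seed=0):
    """Draw n iid observations with P(X = k) = pi[k-1] and return the counts C."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    values = list(range(1, len(pi) + 1))  # possible outcomes 1..N
    draws = rng.choices(values, weights=pi, k=n)
    counts = Counter(draws)
    # C_k is the number of times the value k was observed (0 if never seen)
    return [counts.get(k, 0) for k in values]


pi = [0.2, 0.5, 0.3]   # hypothetical parameters, N = 3
c = sample_histogram(pi, n=1000)
print(c, sum(c))       # the counts always sum to n
```

Repeating the experiment with different seeds gives a feel for how much C fluctuates around its mean, which is relevant to part b).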
3. Problem 3.2.8 in DHS.
4. Problem 3.2.10 in DHS. Assume that the random variables X1,…,Xn are iid with a distribution of
mean μ, which is the quantity to estimate.
5. In this problem we will consider the ML estimate of the Gaussian covariance matrix.
a) Problem 3.4.13 in DHS.
b) Derive the same result by computing derivatives in the usual way. (Hint: you may want to use a manual of matrix calculus such as that at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html.
Also, it may be easier to work with the precision matrix $P = \Sigma^{-1}$.)
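As a possible starting point for part b), the Gaussian log-likelihood can be written directly in terms of the precision matrix. The notation here is an assumption for illustration: $x_1,\ldots,x_n$ are the iid samples, $d$ is their dimension, and $\mu$ is the mean.

$$\ell(\mu, P) = -\frac{nd}{2}\log(2\pi) + \frac{n}{2}\log\det P - \frac{1}{2}\sum_{i=1}^{n} (x_i - \mu)^T P\, (x_i - \mu)$$

Written this way, the dependence on $P$ is a log-determinant plus a term that is linear in $P$, which is the form most matrix calculus tables handle directly.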
6. (computer) This week we will continue trying to classify our cheetah example. Once again we use
the decomposition into 8×8 image blocks, compute the DCT of each block, and zig-zag scan. However,
we are going to assume that the class-conditional densities are multivariate Gaussians of 64 dimensions.
Note: The training examples we used last time contained the absolute value of the DCT coefficients
instead of the coefficients themselves. Please download the file TrainingSamplesDCT_8_new.mat and
use it in this and all future exercises. For simplicity, I will still refer to it as TrainingSamplesDCT_8.mat.
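The zig-zag scan mentioned above turns each 8×8 block of DCT coefficients into a 64-dimensional vector. The sketch below generates the standard JPEG-style zig-zag ordering; this ordering is an assumption, and if the course supplies its own zig-zag pattern, that pattern should be used instead. The function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col indexes an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                   # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of lists) into a vector following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order(8)[:6])
```

With this ordering, the first entry of the vector is the DC coefficient and later entries correspond to increasingly high spatial frequencies.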
a) Using the training data in TrainingSamplesDCT_8.mat, compute the histogram estimate of the prior
$P_Y(i)$, $i \in \{\text{cheetah}, \text{grass}\}$. Using the results of problem 2, compute the maximum likelihood estimate
for the prior probabilities. Compare the result with the estimates that you obtained last week. If they
are the same, interpret what you did last week. If they are different, explain the differences.
b) Using the training data in TrainingSamplesDCT_8.mat, compute the maximum likelihood estimates
for the parameters of the class-conditional densities $P_{X|Y}(x|\text{cheetah})$ and $P_{X|Y}(x|\text{grass})$ under the
Gaussian assumption. Denoting by $X = \{X_1,\ldots,X_{64}\}$ the vector of DCT coefficients, create 64 plots,
each showing the marginal densities for the two classes, $P_{X_k|Y}(x_k|\text{cheetah})$ and $P_{X_k|Y}(x_k|\text{grass})$,
$k = 1,\ldots,64$. Use different line styles for each marginal. Select, by visual inspection, what you think are
the best 8 features for classification purposes and what you think are the worst 8 features (you can use
the subplot command to compare several plots at a time). Hand in the plots of the marginal densities
for the best-8 and worst-8 features (once again you can use subplot; this should not require more than
two sheets of paper). In each subplot, indicate the feature that it refers to.
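For the marginal plots, each feature of a multivariate Gaussian is itself Gaussian, with mean and variance given by the corresponding entries of the mean vector and the diagonal of the covariance matrix. The following is a minimal sketch of how one such marginal could be fitted and evaluated on a grid. The helper names and the three-standard-deviation plotting range are assumptions, and the resulting values can be passed to any plotting tool.

```python
import math


def ml_gaussian_1d(samples):
    """ML estimates of the mean and variance of a univariate Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n   # ML estimate divides by n, not n - 1
    return mean, var


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def plot_grid(mean, var, width=3.0, points=101):
    """Evenly spaced x values covering mean +/- width standard deviations."""
    std = math.sqrt(var)
    lo = mean - width * std
    step = 2 * width * std / (points - 1)
    return [lo + i * step for i in range(points)]


# Hypothetical values standing in for one DCT feature of one class
feature = [0.9, 1.1, 1.4, 0.7, 1.0]
mean, var = ml_gaussian_1d(feature)
xs = plot_grid(mean, var)
ys = [gaussian_pdf(x, mean, var) for x in xs]
```

When the two classes are overlaid on one plot, using a common x range that covers both marginals makes the comparison easier to read.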
c) Compute the Bayesian decision rule and classify the locations of the cheetah image using i) the
64-dimensional Gaussians, and ii) the 8-dimensional Gaussians associated with the best 8 features. For
the two cases, plot the classification masks and compute the probability of error by comparing with
cheetah_mask.bmp. Can you explain the results?
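One simple way to estimate the probability of error is the fraction of pixels where the classification mask disagrees with the ground-truth mask. The sketch below assumes both masks are already loaded as lists of rows, that a nonzero ground-truth pixel means cheetah, and that the predicted mask uses 1 for cheetah and 0 for grass. The function names are illustrative.

```python
def binarize(mask, threshold=0):
    """Map pixel values to 1 (cheetah) if above threshold, else 0 (grass)."""
    return [[1 if v > threshold else 0 for v in row] for row in mask]


def error_rate(predicted, truth):
    """Fraction of pixels where the predicted mask differs from the ground truth."""
    if len(predicted) != len(truth) or any(
        len(p) != len(t) for p, t in zip(predicted, truth)
    ):
        raise ValueError("masks must have the same shape")
    total = sum(len(row) for row in truth)
    wrong = sum(
        p != t
        for prow, trow in zip(predicted, truth)
        for p, t in zip(prow, trow)
    )
    return wrong / total


truth = binarize([[0, 255], [255, 255]])   # toy 2x2 ground truth
predicted = [[0, 1], [0, 1]]               # toy classifier output
print(error_rate(predicted, truth))        # 0.25
```

Counting misses and false alarms separately, rather than only the overall disagreement, can help when explaining why the two feature sets behave differently.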