# Statistical Learning-Classification STAT 441 / 841 Assignment 1

## Description

1. The Matlab data file `0 1 2.mat`¹ contains 300 images of handwritten 0’s, 1’s and 2’s (one hundred of each) scanned from postal envelopes, like the ones shown in Figure 1.

Figure 1: sample handwritten digit images.
These images are stored as a 64 × 300 matrix. Each column of the matrix is an 8 × 8
greyscale image (the pixel intensities are between 0 and 1). Figure 2 illustrates the two
most significant dimensions found by PCA.
(a) Reproduce Figure 2. You need to find the low-dimensional mapping Y_PCA by PCA, and then call the function `plotimages` provided on the course web page. You must implement PCA yourself: you may use the `svd`, `eig`, and `eigs` functions in Matlab, but you cannot use the Matlab built-in functions `pca` and `princomp`.
¹ A .mat file can be imported into Python with `scipy.io.loadmat`.
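The scipy import route mentioned in the footnote can be sketched as follows. This is a minimal sketch only: the variable name stored inside the course's `.mat` file is unknown here, so the snippet round-trips its own synthetic 64 × 300 matrix under an assumed key `"X"`.

```python
# Sketch: loading a .mat file in Python with scipy.
# The internal variable name of the course file is an assumption,
# so we save and reload a synthetic stand-in matrix instead.
import numpy as np
from scipy.io import loadmat, savemat

X_demo = np.random.rand(64, 300)           # stand-in for the digit data
savemat("digits_demo.mat", {"X": X_demo})  # "X" is an assumed key name

data = loadmat("digits_demo.mat")
X = data["X"]                    # 64 x 300: one image per column
img0 = X[:, 0].reshape(8, 8)     # first image as an 8 x 8 array
# Note: Matlab stores arrays column-major; for the real file you may
# need reshape(8, 8, order="F") to orient the images correctly.
print(X.shape, img0.shape)
```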
Figure 2: A canonical dimensionality reduction problem from visual perception. The input consists of a sequence of 64-dimensional vectors representing the brightness values of 8 pixel by 8 pixel images of the digits 0, 1 and 2; PCA is applied to the n = 300 raw images. A two-dimensional projection is shown, with the original input images overlaid.
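For part (a), one standard way to hand-roll PCA is via the singular value decomposition. A minimal sketch, assuming the images are the columns of a 64 × 300 matrix (synthetic data stands in for the real file here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((64, 300))          # stand-in for the 64 x 300 digit matrix

# Center each pixel (row) across the 300 images.
mu = X.mean(axis=1, keepdims=True)
Xc = X - mu

# SVD of the centered data: columns of U are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the two most significant components -> 2 x 300 embedding.
Y_pca = U[:, :2].T @ Xc
print(Y_pca.shape)                 # (2, 300)
```

The rows of `Y_pca` are the two coordinates passed to `plotimages`.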
(b) Produce a figure similar to Figure 2, but this time use FDA to map the data into 2-dimensional space.
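A minimal Fisher discriminant sketch for part (b), assuming labels 0/1/2 with 100 images per class and images as columns; it solves the generalized eigenproblem for S_W⁻¹S_B with numpy, and synthetic data again stands in for the real file:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((64, 300))                   # stand-in digit matrix
labels = np.repeat([0, 1, 2], 100)          # 100 images per class

mu = X.mean(axis=1, keepdims=True)
Sw = np.zeros((64, 64))                     # within-class scatter
Sb = np.zeros((64, 64))                     # between-class scatter
for c in range(3):
    Xc = X[:, labels == c]
    mc = Xc.mean(axis=1, keepdims=True)
    Sw += (Xc - mc) @ (Xc - mc).T
    Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T

# Top-2 eigenvectors of Sw^{-1} Sb give the 2-D FDA mapping.
# (A small ridge on Sw guards against numerical singularity.)
evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(64), Sb))
order = np.argsort(-evals.real)
W = evecs[:, order[:2]].real                # 64 x 2 projection
Y_fda = W.T @ (X - mu)                      # 2 x 300 embedding
print(Y_fda.shape)
```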
(c) Use the two-dimensional data Y_PCA as the covariate. Use LDA and QDA to compute a linear and a quadratic decision boundary, and report the analytic form of the boundaries.
(d) Plot the decision boundaries on the figure that you produced in part (a).
(e) Implement LDA as explained in class (based on the Euclidean distance between points and the mean of each class). Report the error rates when LDA is applied to this data set.
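The Euclidean-distance LDA of part (e) reduces to nearest-class-mean classification. A sketch, with synthetic 2-D clusters standing in for Y_PCA:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for the 2-D PCA embedding: three clusters, 100 points each.
Y = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
labels = np.repeat([0, 1, 2], 100)

# Class means estimated from the data.
means = np.vstack([Y[labels == c].mean(axis=0) for c in range(3)])

# Assign each point to the nearest class mean (Euclidean distance).
dists = np.linalg.norm(Y[:, None, :] - means[None, :, :], axis=2)
pred = dists.argmin(axis=1)

error_rate = np.mean(pred != labels)
print(error_rate)
```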
(f) Can we implement QDA based on the Euclidean distance between points and the mean of each class? If your answer is yes, explain how. Is this identical to the QDA that you computed in part (c)?
2. Prove that if X | Y = 0 ∼ N(µ0, Σ0) and X | Y = 1 ∼ N(µ1, Σ1), then the Bayes rule is

$$
h^*(x) =
\begin{cases}
1 & \text{if } r_1^2 < r_0^2 + 2\log\left(\dfrac{\pi_1}{\pi_0}\right) + \log\left(\dfrac{|\Sigma_0|}{|\Sigma_1|}\right), \\
0 & \text{otherwise,}
\end{cases}
$$

where

$$
r_i^2 = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i), \qquad i = 0, 1,
$$

and |A| denotes the determinant of a matrix A.
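One route to the proof, sketched: the Bayes rule sets h*(x) = 1 exactly when π₁f₁(x) > π₀f₀(x) for the two Gaussian densities, and taking logarithms of both sides turns this into the stated inequality.

```latex
% Sketch: h^*(x) = 1 iff \pi_1 f_1(x) > \pi_0 f_0(x), where
% f_i(x) = (2\pi)^{-d/2}|\Sigma_i|^{-1/2}
%          \exp\{-\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)\}.
% Taking logs (the (2\pi)^{-d/2} factors cancel):
\log\pi_1 - \tfrac{1}{2}\log|\Sigma_1| - \tfrac{1}{2}r_1^2
  \;>\; \log\pi_0 - \tfrac{1}{2}\log|\Sigma_0| - \tfrac{1}{2}r_0^2,
% and multiplying by 2 and rearranging gives
r_1^2 \;<\; r_0^2 + 2\log\frac{\pi_1}{\pi_0} + \log\frac{|\Sigma_0|}{|\Sigma_1|}.
```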
3. Consider a classifier with class-conditional densities of the form N(x | µc, Σc). In LDA, we assume Σc = Σ, and in QDA, each Σc is arbitrary. Here we consider the 2-class case in which Σ1 = kΣ2, for k > 1. That is, the Gaussian ellipsoids have the same shape, but the one for class 1 is wider. Derive an expression for the decision boundary.
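A starting point for the derivation: the boundary is where the two posteriors are equal, and Σ1 = kΣ2 gives |Σ1| = k^d|Σ2| and Σ1⁻¹ = k⁻¹Σ2⁻¹, so after taking logs the common constants cancel.

```latex
% Setting \pi_1\,\mathcal{N}(x\mid\mu_1,\Sigma_1) = \pi_2\,\mathcal{N}(x\mid\mu_2,\Sigma_2)
% with \Sigma_1 = k\Sigma_2 and taking logs:
\log\pi_1 - \tfrac{d}{2}\log k - \tfrac{1}{2k}(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1)
  = \log\pi_2 - \tfrac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2),
% so the boundary satisfies
(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2) - \tfrac{1}{k}(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1)
  = d\log k + 2\log\frac{\pi_2}{\pi_1}.
```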
4. Suppose we have features x ∈ R^d, a two-class response, with class sizes n1, n2, and the target coded as −n/n1, n/n2.

a) Show that the LDA rule classifies to class 2 if

$$
x^T \hat{\Sigma}^{-1}(\hat{\mu}_2 - \hat{\mu}_1) > \frac{1}{2}\hat{\mu}_2^T \hat{\Sigma}^{-1}\hat{\mu}_2 - \frac{1}{2}\hat{\mu}_1^T \hat{\Sigma}^{-1}\hat{\mu}_1 + \log\left(\frac{n_1}{n}\right) - \log\left(\frac{n_2}{n}\right),
$$

and class 1 otherwise.

b) Consider minimization of the least squares criterion

$$
\sum_{i=1}^{n} (y_i - \beta_0 - \beta^T x_i)^2 .
$$

Show that the solution β̂ satisfies

$$
\left[(n-2)\hat{\Sigma} + \frac{n_1 n_2}{n}\hat{\Sigma}_B\right]\beta = n(\hat{\mu}_2 - \hat{\mu}_1)
$$

where $\hat{\Sigma}_B = (\hat{\mu}_2 - \hat{\mu}_1)(\hat{\mu}_2 - \hat{\mu}_1)^T$.

c) Show that $\hat{\Sigma}_B \beta$ is in the direction $(\hat{\mu}_2 - \hat{\mu}_1)$ and thus

$$
\hat{\beta} \propto \hat{\Sigma}^{-1}(\hat{\mu}_2 - \hat{\mu}_1).
$$

Therefore the least squares regression coefficient is identical to the LDA coefficient, up to a scalar multiple.
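Before proving part (c), the claimed proportionality can be checked numerically. A sketch on synthetic two-class data: β̂ from least squares with the stated target coding should align exactly with Σ̂⁻¹(µ̂2 − µ̂1), where Σ̂ is the pooled within-class covariance with divisor n − 2.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n1, n2 = 3, 40, 60
n = n1 + n2
# Class 1 (first n1 rows) shifted by [2, 0, 0]; class 2 at the origin.
X = np.vstack([rng.normal(size=(n1, d)) + [2, 0, 0],
               rng.normal(size=(n2, d))])
y = np.concatenate([np.full(n1, -n / n1),    # stated coding for class 1
                    np.full(n2, n / n2)])    # and class 2

# Least squares with intercept: solve for (beta0, beta).
A = np.hstack([np.ones((n, 1)), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]

# LDA direction: pooled within-class covariance (divisor n - 2).
mu1, mu2 = X[:n1].mean(axis=0), X[n1:].mean(axis=0)
Sw = ((X[:n1] - mu1).T @ (X[:n1] - mu1)
      + (X[n1:] - mu2).T @ (X[n1:] - mu2)) / (n - 2)
lda_dir = np.linalg.solve(Sw, mu2 - mu1)

# Cosine similarity should be ~1: the directions coincide.
cos = beta @ lda_dir / (np.linalg.norm(beta) * np.linalg.norm(lda_dir))
print(cos)
```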
5. The true error rate of a classifier h is

$$
L(h) = P(\{h(X) \neq Y\}).
$$

Consider the special case where Y ∈ 𝒴 = {0, 1}. Let

$$
r(x) = P(Y = 1 \mid X = x).
$$

In this case the Bayes classification rule h* is

$$
h^*(x) =
\begin{cases}
1 & \text{if } r(x) > \frac{1}{2}, \\
0 & \text{otherwise.}
\end{cases}
$$

Prove that the Bayes classification rule is optimal; that is, if h is any other classification rule, then L(h*) ≤ L(h).
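A useful identity for the proof, sketched: condition on X = x, compare the conditional error of h with that of h*, and integrate over x.

```latex
% For any classifier h, the conditional error at X = x is
P(h(X)\neq Y \mid X = x)
  = r(x)\,\mathbb{1}\{h(x)=0\} + (1 - r(x))\,\mathbb{1}\{h(x)=1\}.
% Subtracting the same expression for h^* gives
P(h(X)\neq Y \mid X=x) - P(h^*(X)\neq Y \mid X=x)
  = \big(2r(x)-1\big)\big[\mathbb{1}\{h^*(x)=1\} - \mathbb{1}\{h(x)=1\}\big] \;\ge\; 0,
% since both factors share the same sign by the definition of h^*.
% Integrating over the distribution of X yields L(h^*) \le L(h).
```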