ISTA 421/521 – Final Take-home Assignment


1. [4 points] Adapted from Exercise 5.4 of FCMA p.204:
Compute the maximum likelihood estimates of $q_{mc}$ for class $c$ of a Bayesian classifier with multinomial class-conditionals and a set of $N_c$ $M$-dimensional objects belonging to class $c$: $\mathbf{x}_1, \dots, \mathbf{x}_{N_c}$.
Solution.
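As an optional numerical sanity check (not the requested derivation), the sketch below assumes the standard closed-form result $\hat{q}_{mc} = \sum_n x_{nm} / \sum_n \sum_{m'} x_{nm'}$, i.e., normalized total counts, and checks it against synthetic data; all parameter values are hypothetical.

    # Sanity check for the closed-form ML estimate of multinomial
    # class-conditional parameters: normalized total counts per dimension.
    import numpy as np

    rng = np.random.default_rng(0)
    q_true = np.array([0.5, 0.3, 0.2])                  # hypothetical true q_c, M = 3
    X = rng.multinomial(n=50, pvals=q_true, size=1000)  # N_c = 1000 objects in class c

    q_hat = X.sum(axis=0) / X.sum()                     # ML estimate: normalized counts
    print(q_hat)                                        # should be close to q_true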
2. [4 points] Adapted from Exercise 5.5 of FCMA p.204:
For a Bayesian classifier with multinomial class-conditionals with $M$-dimensional parameters $\mathbf{q}_c$, compute the posterior Dirichlet for class $c$ when the prior over $\mathbf{q}_c$ is a Dirichlet with constant parameter $\alpha$ and the observations belonging to class $c$ are the $N_c$ observations $\mathbf{x}_1, \dots, \mathbf{x}_{N_c}$.
Solution.
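Again as an optional check rather than the derivation itself, the sketch below assumes the standard conjugate update (posterior parameters $\alpha + \sum_n x_{nm}$) and computes the resulting posterior mean; all numbers are hypothetical.

    # Dirichlet-multinomial conjugacy check: posterior parameters are the
    # constant prior parameter plus the per-dimension total counts.
    import numpy as np

    rng = np.random.default_rng(1)
    alpha = 2.0                                   # constant prior parameter
    X = rng.multinomial(n=20, pvals=[0.6, 0.3, 0.1], size=500)

    post_params = alpha + X.sum(axis=0)           # Dirichlet(alpha + sum_n x_nm)
    print(post_params, post_params / post_params.sum())  # params and posterior mean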
3. [4 points] Adapted from Exercise 6.1 of FCMA p.234:
Derive the EM update for the variance of the $d$th dimension and the $k$th component, $\sigma^2_{kd}$, when the cluster components have a diagonal Gaussian likelihood:
$$p(\mathbf{x}_n \mid z_{nk} = 1, \mu_{k1}, \dots, \mu_{kD}, \sigma^2_{k1}, \dots, \sigma^2_{kD}) = \prod_{d=1}^{D} \mathcal{N}(\mu_{kd}, \sigma^2_{kd})$$
Solution.
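For reference while checking your algebra, the sketch below implements the responsibility-weighted M-step that the derivation should arrive at, $\sigma^2_{kd} = \sum_n \gamma_{nk}(x_{nd} - \mu_{kd})^2 / \sum_n \gamma_{nk}$; treat this form as an assumed target, not a given.

    # M-step for diagonal Gaussian mixture variances; gamma holds the E-step
    # responsibilities q(z_nk = 1).
    import numpy as np

    def m_step_variances(X, gamma, mu):
        """X: (N, D) data; gamma: (N, K) responsibilities; mu: (K, D) means.
        Returns the (K, D) diagonal variances."""
        Nk = gamma.sum(axis=0)                       # effective counts per component
        sq = (X[:, None, :] - mu[None, :, :]) ** 2   # (N, K, D) squared deviations
        return (gamma[:, :, None] * sq).sum(axis=0) / Nk[:, None]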
4. [6 points; Required only for Graduates] Adapted from Exercise 6.6 of FCMA p.235:
Derive an EM algorithm for fitting a mixture of Poisson distributions. Assume you observe $N$ integer counts, $x_1, \dots, x_N$. The likelihood is:
$$p(\mathbf{X} \mid \Delta) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \frac{\lambda_k^{x_n} \exp\{-\lambda_k\}}{x_n!}$$
Solution.
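The sketch below is an illustrative EM loop assuming the standard updates the derivation should produce (E-step responsibilities proportional to $\pi_k$ times the Poisson likelihood; M-step $\lambda_k$ as a responsibility-weighted mean and $\pi_k$ as the mean responsibility); it is not a substitute for the derivation itself.

    # EM for a mixture of Poisson distributions on synthetic counts.
    import numpy as np
    from scipy.stats import poisson

    def em_poisson_mixture(x, K, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        lam = rng.uniform(x.min() + 0.1, x.max() + 1.0, size=K)  # initial rates
        pi = np.full(K, 1.0 / K)                                 # initial weights
        for _ in range(iters):
            gamma = pi * poisson.pmf(x[:, None], lam[None, :])   # E-step (unnormalized)
            gamma /= gamma.sum(axis=1, keepdims=True)            # responsibilities
            Nk = gamma.sum(axis=0)                               # M-step
            lam = (gamma * x[:, None]).sum(axis=0) / Nk
            pi = Nk / len(x)
        return pi, lam

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.poisson(3.0, 300), rng.poisson(10.0, 300)])
    print(em_poisson_mixture(x, K=2))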
5. [2 points]
For a support vector machine, if we remove one of the support vectors from the training set, does the
size of the maximum margin decrease, stay the same, or increase for that dataset? Why? Also justify
your answer by providing a simple, hand-designed dataset (no more than 2-D) in which you identify
the support vectors, draw the location of the maximum margin hyperplane, remove one of the support
vectors, and draw the location of the resulting maximum margin hyperplane. You do not have to run
any code – this can be done completely by hand and drawn schematically.
Solution.
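The exercise explicitly requires no code, but if you want to double-check a hand-drawn example, the sketch below fits a (nearly) hard-margin linear SVM with scikit-learn, prints the support vectors and margin width, then refits after dropping one support vector; the dataset is a hypothetical illustration, not the intended answer.

    # Margin before and after removing a support vector (illustrates the
    # "increase" case on one made-up dataset).
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0., 0.], [0., 2.], [2., 1.], [4., 1.]])
    y = np.array([-1, -1, 1, 1])

    def margin_report(X, y):
        clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ~ hard margin
        w = clf.coef_[0]
        print("support vectors:", clf.support_vectors_.tolist(),
              "margin width:", 2.0 / np.linalg.norm(w))
        return clf

    clf = margin_report(X, y)
    keep = np.ones(len(X), dtype=bool)
    keep[clf.support_[-1]] = False                     # drop one support vector
    margin_report(X[keep], y[keep])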
6. [3 points]
Consider the 2-bit XOR problem, for which the entire instance space is as follows:

     t    x1    x2
    −1    −1    −1
    +1    −1    +1
    +1    +1    −1
    −1    +1    +1

In each row, x1 and x2 are the coordinates and t is the class for the point. These instances are not linearly separable, but they are separable with a polynomial kernel. Recall that the polynomial kernel is of the form $\kappa(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^{\top}\mathbf{x}_j + c)^d$, where $c$ and $d$ are integers. Select (by hand!) values for $c$ and $d$ that yield a space in which the instances above are linearly separable. Write down the mapping $\Phi$ to which this kernel corresponds, write down $\Phi(\mathbf{x})$ for each instance above, and write down the parameters of a hyperplane in the expanded space that perfectly classifies the instances (there is a range of possible hyperplanes; just pick one set of hyperplane parameters that separates the points – you are not maximizing the margin here, just coming up with one possible separating hyperplane). Again, this can be done without writing code or deriving an analytic solution!
Solution.
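As an optional numeric check of one possible answer (hedged: $c = 1$, $d = 2$ is one workable choice, not the only one), the sketch below writes out the feature map $\Phi(\mathbf{x}) = (1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1x_2, x_1^2, x_2^2)$ induced by $(\mathbf{x}_i^{\top}\mathbf{x}_j + 1)^2$ and verifies that a hyperplane weighting only the $x_1x_2$ coordinate separates XOR.

    # XOR becomes linearly separable along the sqrt(2)*x1*x2 feature.
    import numpy as np

    def phi(x):
        x1, x2 = x
        return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                         np.sqrt(2) * x1 * x2, x1 ** 2, x2 ** 2])

    X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
    t = np.array([-1, 1, 1, -1])

    w = np.array([0., 0., 0., -1., 0., 0.])            # weight on x1*x2 only
    b = 0.0
    print(np.sign(np.array([phi(x) for x in X]) @ w + b) == t)  # all True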
7. [2 points]
Why does the kernel trick allow us to solve SVMs with high-dimensional feature spaces without significantly increasing the running time?
Solution.
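To make the point concrete (a sketch, not part of the required answer), the code below shows that evaluating the degree-2 polynomial kernel costs one $O(M)$ dot product, while the equivalent explicit feature map has $O(M^2)$ coordinates.

    # Kernel evaluation vs. explicit feature-space inner product.
    import numpy as np
    from itertools import combinations_with_replacement

    def phi_poly2(x):
        # Explicit map matching (x.y + 1)^2: bias, sqrt(2)*x_i, and all
        # degree-2 monomials with matching coefficients -- O(M^2) features.
        feats = [1.0] + [np.sqrt(2) * xi for xi in x]
        for i, j in combinations_with_replacement(range(len(x)), 2):
            coef = 1.0 if i == j else np.sqrt(2)
            feats.append(coef * x[i] * x[j])
        return np.array(feats)

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=50), rng.normal(size=50)    # M = 50
    print(np.isclose((x @ y + 1) ** 2,                 # O(M) kernel evaluation
                     phi_poly2(x) @ phi_poly2(y)))     # O(M^2) explicit version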
8. [5 points] In this course we introduced the Metropolis-Hastings algorithm. Provide the following:
(a) describe the problem it is designed to solve, including how it solves this problem by avoiding a potentially intractable computation;
(b) describe the basic procedure;
(c) describe the role of the proposal distribution.
Solution.
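A minimal sketch to anchor parts (a)-(c), assuming a symmetric Gaussian random-walk proposal (so the Hastings correction cancels) and a hypothetical unnormalized target: the sampler never needs the normalizing constant, because the acceptance ratio uses only a ratio of target evaluations.

    # Metropolis-Hastings with a random-walk proposal on a 1-D target known
    # only up to its normalizing constant.
    import numpy as np

    def unnorm_target(x):
        # Unnormalized bimodal density; its normalizer is never computed.
        return np.exp(-0.5 * (x - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 3.0) ** 2)

    def metropolis_hastings(n_samples=10000, step=1.0, seed=0):
        rng = np.random.default_rng(seed)
        x, samples = 0.0, []
        for _ in range(n_samples):
            x_prop = x + step * rng.normal()                 # symmetric proposal
            a = unnorm_target(x_prop) / unnorm_target(x)     # acceptance ratio
            if rng.uniform() < a:                            # accept w.p. min(1, a)
                x = x_prop
            samples.append(x)
        return np.array(samples)

    print(metropolis_hastings().mean())   # Monte Carlo estimate of E[x]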