
CS 4641 – Homework 2


1 Maximum Likelihood Estimation
Suppose we have $N$ i.i.d. (independent and identically distributed) data samples, i.e. $X = \{x_n\}_{n=1}^{N}$, drawn from each of the following probability distributions. This problem asks you to build a log-likelihood function for each distribution (i.e. $\log P(X)$) and find the maximum likelihood estimator of the parameter(s).
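For reference, a sketch of the standard recipe (not part of the original prompt): because the samples are i.i.d., the likelihood factorizes, the log turns the product into a sum, and each MLE below is found by differentiating and setting the result to zero:

$$\log P(X) = \log \prod_{n=1}^{N} p(x_n \mid \theta) = \sum_{n=1}^{N} \log p(x_n \mid \theta), \qquad \left. \frac{\partial}{\partial \theta} \log P(X) \right|_{\theta = \hat{\theta}} = 0,$$

where $\theta$ is a generic stand-in for the parameter(s) of each distribution.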
(a) Poisson distribution
The Poisson distribution is defined as

$$P(x = k) = \frac{\lambda^k e^{-\lambda}}{k!} \qquad (k = 0, 1, 2, \ldots).$$

What is the maximum likelihood estimator of $\lambda$?
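As a worked illustration of that recipe (a sketch, not the full derivation you are expected to write out): for the Poisson case,

$$\log P(X) = \sum_{n=1}^{N} \left( x_n \log \lambda - \lambda - \log x_n! \right), \qquad \frac{\partial \log P(X)}{\partial \lambda} = \frac{1}{\lambda} \sum_{n=1}^{N} x_n - N = 0 \;\Longrightarrow\; \hat{\lambda} = \frac{1}{N} \sum_{n=1}^{N} x_n,$$

i.e. the MLE is the sample mean; the exponential and Gaussian parts below follow the same pattern.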
(b) Exponential distribution
The probability density function of the exponential distribution is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
What is the maximum likelihood estimator of $\lambda$?

(c) Gaussian normal distribution
A univariate Gaussian normal distribution, $\mathcal{N}(\mu, \sigma^2)$, is given by

$$\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$

What is the maximum likelihood estimator of $\mu$ and $\sigma^2$?

2 Chance of Karl Die
... No, this isn't a sinister Yoda-ish statement regarding my state of health. It refers to a type of dice famously (in my mind) named after me (by me). The dice is such that the value of the current roll depends on the value of the previous roll. More on that in a second - let's set up first.

Let $x^{(n)}$ be the $n$-th sequence of $M$ dice rolls. We represent $x^{(n)}$ as a vector of individual rolls, i.e. $x^{(n)} = (x^{(n)}_1, x^{(n)}_2, \ldots, x^{(n)}_M)$. Each Karl dice roll can take up to 12 values (that's my Messiah complex kicking in again), i.e. $x^{(n)}_i \in \{1, \ldots, 12\}$.

Every sequence of rolls is independent of the others, but the rolls within a sequence are dependent on each other in the following way: the first roll does not depend on any other roll, the second roll depends on the first, the third depends on the second, and so on. We also have a Regular 12-sided dice in which individual rolls are completely independent of each other. Every sequence of rolls uses a single dice (i.e. rolls within a sequence must be performed by the same dice).

For the $n$-th sequence, we have a label $y^{(n)}$ such that

$$y^{(n)} = \begin{cases} 1 & \text{if the Karl dice is used} \\ 0 & \text{if the Regular dice is used} \end{cases}$$

We collect all the data in the variables $X$ and $Y$, which collectively constitute our dataset $D$, i.e. $D = (X, Y) = \{(x^{(n)}, y^{(n)})\}_{n=1}^{N}$.

Our game is simple: given a sequence of rolls, how can we identify which type of dice was used to generate it? We will solve this problem by estimating the parameters that maximize the likelihood of our data.

(a) Show that the likelihood function can be written as

$$p(D \mid h) = \prod_{n=1}^{N} \left[ \pi_1 K(x^{(n)}) \right]^{y^{(n)}} \left[ \pi_0 R(x^{(n)}) \right]^{1 - y^{(n)}}$$

where

$$\pi_i = p(y = i), \qquad K(x^{(n)}) = p(x^{(n)} \mid y^{(n)} = 1), \qquad R(x^{(n)}) = p(x^{(n)} \mid y^{(n)} = 0).$$

Note: if, during your derivation, you feel a step needs explanation, please feel free to describe it in English as well.

(b) Show that the log-likelihood can be written as

$$\log p(D \mid h) = \sum_{n=1}^{N} y^{(n)} \left[ \log \pi_1 + \log K(x^{(n)}_1) + \sum_{i=2}^{M} \log K(x^{(n)}_i \mid x^{(n)}_{i-1}) \right] + \sum_{n=1}^{N} (1 - y^{(n)}) \left[ \log \pi_0 + \sum_{i=1}^{M} \log R(x^{(n)}_i) \right] \tag{1}$$

where

$$K(x^{(n)}_1) = p(x^{(n)}_1 \mid y^{(n)} = 1), \qquad K(x^{(n)}_i \mid x^{(n)}_{i-1}) = p(x^{(n)}_i \mid x^{(n)}_{i-1}, y^{(n)} = 1), \qquad R(x^{(n)}_i) = p(x^{(n)}_i \mid y^{(n)} = 0).$$

(c) Now that we have that done, we can see that we have 4 sets of parameters. Specifically,

$$\pi_i \quad \text{for } i = 0, 1$$
$$K(v) \quad \text{for } v = 1, \ldots, 12$$
$$K(v \mid v') \quad \text{for } v = 1, \ldots, 12, \; v' = 1, \ldots, 12$$
$$R(v) \quad \text{for } v = 1, \ldots, 12$$

Write the optimization constraints for each of the above sets of parameters.

(d) Write the Lagrangian form of the maximum (log) likelihood estimation problem.

(e) Show that the log-likelihood can be further expanded to the form

$$\log p(D \mid h) = \sum_{n=1}^{N} y^{(n)} \left[ \log \pi_1 + \sum_{v} I(x^{(n)}_1 = v) \log K(x^{(n)}_1 = v) + \sum_{i=2}^{M} \sum_{v} \sum_{v'} I(x^{(n)}_i = v, x^{(n)}_{i-1} = v') \log K(x^{(n)}_i = v \mid x^{(n)}_{i-1} = v') \right] + \sum_{n=1}^{N} (1 - y^{(n)}) \left[ \log \pi_0 + \sum_{i=1}^{M} \sum_{v} I(x^{(n)}_i = v) \log R(x^{(n)}_i = v) \right]$$

where

$$I(a = b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases} \qquad I(a = b, c = d) = \begin{cases} 1 & \text{if } a = b \text{ and } c = d \\ 0 & \text{otherwise} \end{cases}$$

and $\sum_v$ is simply $\sum_{v \in \{1, \ldots, 12\}}$ (similarly for $v'$).

(f) Show that the MLE for $\pi_1$ has the form

$$\pi_1 = \frac{1}{N} \sum_{n=1}^{N} y^{(n)}$$

where $N$ is the number of data points.
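To see how (d) feeds into (f) (an illustrative sketch; the multiplier symbol $\alpha$ is my own choice, not from the prompt): keeping only the terms of the log-likelihood that involve $\pi_0$ and $\pi_1$, and enforcing $\pi_0 + \pi_1 = 1$ with a Lagrange multiplier, gives

$$\mathcal{L} = \sum_{n=1}^{N} \left[ y^{(n)} \log \pi_1 + (1 - y^{(n)}) \log \pi_0 \right] + \alpha \left( 1 - \pi_0 - \pi_1 \right), \qquad \frac{\partial \mathcal{L}}{\partial \pi_1} = \frac{1}{\pi_1} \sum_{n=1}^{N} y^{(n)} - \alpha = 0.$$

Solving the analogous equation for $\pi_0$ and substituting into the constraint yields $\alpha = N$, and hence the stated form of $\pi_1$.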
(g) Show that the MLE estimate for $K(v)$ has the form

$$K(v) = \frac{1}{N_1} \, \text{count}_1(v, y = 1)$$

where $N_1$ is the number of data points where $y = 1$, and $\text{count}_1(v, y = 1)$ is the number of data points with label $y = 1$ where feature 1 has the value $v$.

(h) Derive the MLE estimate for $K(v \mid v')$. You can't expect me to give you all the answers... but I'll be nice - you may (will) find the following measure helpful:

$$\text{count}_i(v, v', y^{(n)} = 1)$$

which is simply the number of data points with label $y = 1$ where the $i$-th feature has value $v$ and the $(i-1)$-th feature has value $v'$.

(i) Derive the MLE estimate for $R(v)$.
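To make the count-based estimates in (f)-(i) concrete, here is a minimal numerical sketch in Python, assuming the data is stored as an $N \times M$ integer array of rolls and a length-$N$ label vector; the function and variable names are my own and not part of the assignment:

```python
import numpy as np

def mle_estimates(X, y, n_faces=12):
    """Count-based MLE estimates for the dice model (illustrative sketch).

    X: (N, M) integer array of rolls in {1, ..., n_faces}
    y: (N,) binary labels (1 = Karl dice, 0 = Regular dice)
    Returns pi1, K1 (first-roll probs), Ktrans (Ktrans[v0-1, v-1] = K(v|v0)), R.
    """
    N, M = X.shape
    karl, reg = X[y == 1], X[y == 0]

    # (f) pi_1 = fraction of sequences generated by the Karl dice
    pi1 = y.mean()

    # (g) K(v) = (# Karl sequences whose first roll is v) / N_1
    K1 = np.bincount(karl[:, 0], minlength=n_faces + 1)[1:] / len(karl)

    # (h) K(v|v') = (# transitions v' -> v in Karl sequences)
    #             / (# times v' appears as a predecessor)
    trans = np.zeros((n_faces, n_faces))
    for seq in karl:
        for i in range(1, M):
            trans[seq[i - 1] - 1, seq[i] - 1] += 1
    # zero-count predecessor rows would need smoothing in practice; ignored here
    Ktrans = trans / trans.sum(axis=1, keepdims=True)

    # (i) R(v) = (# rolls equal to v across Regular sequences) / (N_0 * M)
    R = np.bincount(reg.ravel(), minlength=n_faces + 1)[1:] / reg.size

    return pi1, K1, Ktrans, R
```

The row normalization of `trans` is one natural reading of the hint in (h): the counts $\text{count}_i(v, v', y = 1)$ are summed over positions $i = 2, \ldots, M$ and divided by how often $v'$ occurs as a predecessor.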