AIGS/CSED515 Assignment 4: Graphical Model solved


10 Bayes Networks – 10 points
Figure 2: A graphical model
Consider the Bayesian network in Figure 2. We use (X ⊥ Y | Z) to denote the fact that X and Y are independent given Z. Answer the following questions:
1. Are there any pairs of variables that are independent? If your answer is yes, please list out all such pairs.
Answer: Yes. (C, E), (C, B), (A, E), (A, B).
2. Does (B ⊥ C | A, D) hold? Briefly explain.
Answer: It holds; this can be verified using the Bayes ball algorithm.
3. Does (B ⊥ F | A, D) hold? Briefly explain.
Answer: It does not hold. Using the Bayes ball algorithm, we find an active path B → E → G → H → F.
4. Assuming that each of these variables can take d = 10 values (say 1 to 10), how many parameters do we need to model the full joint distribution without using the knowledge encoded in the graph (i.e., with no independence / conditional independence assumptions)? How many parameters do we need for the Bayesian network in such a setting? (You do not need to provide the exact number; a close approximation or a tight upper / lower bound will do.)
Answer: Without the graph: 10⁸. With the graph: about 10 × 2 + 10² × 3 + 10³ × 2 + 10⁴ × 1 = 12,320.
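The parameter counts can be sanity-checked with a few lines of Python. This is only a sketch: the in-degree list below is read off the answer above (two root nodes, three nodes with one parent, two with two parents, one with three parents), since Figure 2 itself is not reproduced here.

```python
# Rough parameter count for a discrete Bayesian network with d values per node.
# The in-degrees are an assumption reconstructed from the answer above, not
# from the (missing) figure; a node with p parents needs about d^(p+1) entries.
d = 10
in_degrees = [0, 0, 1, 1, 1, 2, 2, 3]   # assumed parents per node (8 nodes total)

full_joint = d ** len(in_degrees)                 # 10^8 entries without the graph
bn_bound = sum(d ** (p + 1) for p in in_degrees)  # ~12,320 with the graph
print(full_joint, bn_bound)                       # 100000000 12320
```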
Figure 1: A Bayesian network
1. [Bayesian Network] Consider the Bayesian network in Figure 1. We write (X ⊥ Y | Z)
if X and Y are (conditionally) independent given Z.
(a) [2pt] List out every pair of (marginally) independent variables.
(b) [2pt] Check if (B ⊥ C | A, D). Justify your answer.
(c) [2pt] Check if (B ⊥ F | A, D). Justify your answer.
(d) [5pt] Assume that each variable can take one of 10 values in {1, 2, 3, …, 10}.
How many parameters do we need to model the full joint distribution without
using the knowledge encoded in the graph? Write the joint distribution with
the factorization corresponding to the Bayesian network in Figure 1. How many
parameters do we need now (it is okay to provide a rough computation)? Provide
your answers with brief justification.
2. [Belief Propagation] Consider a joint distribution of five binary random variables
X1, …, X5 ∈ {0, 1} in the following factor form:
p(X) = p(X1, X2, X3, X4, X5) = (1/Z) fa(X1, X2) fb(X2, X3) fc(X2, X4) fd(X4, X5),
where Z is the normalization term to make the sum of p(X) over all possible X equal to 1.
(a) [2pt] Draw the corresponding factor graph.
(b) [3pt] Perform the sum-product belief propagation algorithm to compute p(X4).
(c) [4pt] Assuming fu(x, y) = x + y for every u ∈ {a, b, c, d}, and using the max-sum BP algorithm, find (i) the optimal configuration X∗ maximizing the joint distribution, and (ii) the maximum of the joint distribution.
3. [Gaussian Mixture Models & Expectation-Maximization] Consider a Gaussian mixture model (GMM) with K components, where each component k ∈ [K] := {1, 2, …, K} has mean µk, variance σk², and mixing parameter πk. We want to learn all these parameters, denoted by θ = (µk, σk, πk)k∈[K], given a dataset X = {xi}i∈[n], where xi ∈ R. We denote by Z = {zi}i∈[n] the latent variables indicating the association of xi, i.e., zi = k implies that xi is generated from the k-th Gaussian distribution. In the following questions, for simplicity, you may use
N(xi | µk, σk) := (1/√(2πσk²)) exp(−(xi − µk)²/(2σk²)).
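For reference in the parts below, the component density above and the resulting mixture density p(x | θ) = Σk∈[K] πk N(x | µk, σk) can be written in a few lines of numpy; this is a minimal sketch, and the function names are ours rather than part of the assignment.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # N(x | mu_k, sigma_k): the 1-D Gaussian density defined above.
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

def gmm_pdf(x, pi, mu, sigma):
    # Mixture density p(x | theta) = sum_k pi_k N(x | mu_k, sigma_k).
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pi, mu, sigma))
```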
(a) [2pt] Write the log-likelihood of the data, log p(X | θ), and the complete-data
log-likelihood of the data, log p(X, Z | θ), according to the GMM.
(b) [2pt] We will use the EM algorithm to learn θ, for which we need the conditional distribution of the latent variables Z given the current estimate of the parameters θ(t) = (µk(t), σk(t), πk(t))k∈[K] at the t-th iteration of the EM algorithm. Write the posterior probability p(zi = k | xi, θ(t)).
(c) [2pt] Write Ezi|xi,θ(t)[log p(xi, zi | θ)], where for simplicity you may denote by zik(t) := p(zi = k | xi, θ(t)) the answer that you obtained in Problem 3b.
(d) [5pt] The above questions correspond to the E-step. We now derive the M-step to compute θ(t+1) from θ(t) and the zik(t)'s, where
θ(t+1) = (µk(t+1), σk(t+1), πk(t+1))k∈[K] = arg maxθ Σi∈[n] Ezi|xi,θ(t)[log p(xi, zi | θ)].
Recalling that Σk∈[K] πk = 1, obtain µk(t+1), σk(t+1), and πk(t+1) using your answers to the previous questions.
(e) [2pt] Describe the conditions under which the EM algorithm for the GMM reduces to the (hard) k-means algorithm.
(f) [0pt] Have fun with the attached code files, kmeans.py (an implementation of the k-means algorithm) and GMM.py (an implementation of EM for the GMM); a minimal EM sketch is also given below.
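The attached kmeans.py and GMM.py are not reproduced here. As a rough companion to parts (b)–(d), one EM iteration for a 1-D GMM might look like the numpy sketch below; the variable names, initialization, and synthetic data are our own assumptions and are not taken from the attached code.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # N(x | mu, sigma) for scalar or array x.
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

def em_step(x, pi, mu, sigma):
    # One EM iteration: E-step responsibilities, then closed-form M-step updates.
    n, K = x.shape[0], len(pi)

    # E-step: z_ik = p(z_i = k | x_i, theta^(t)), normalized over k.
    r = np.array([pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(K)]).T  # (n, K)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: weighted means, standard deviations, and mixing weights.
    Nk = r.sum(axis=0)
    mu_new = (r * x[:, None]).sum(axis=0) / Nk
    sigma_new = np.sqrt((r * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk)
    pi_new = Nk / n
    return pi_new, mu_new, sigma_new

# Tiny usage example on synthetic data (assumed setup, not the attached GMM.py).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(3.0, 1.0, 100)])
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    pi, mu, sigma = em_step(x, pi, mu, sigma)
print(pi, mu, sigma)
```

Fixing a single shared variance for all components and letting it tend to zero collapses the responsibilities to hard 0/1 assignments, which is the connection part (e) asks about.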