1 Implementation: GAN (30 pts)
In this part, you are expected to implement a GAN on the MNIST dataset. We have provided a base jupyter notebook (gan-base.ipynb) for you to start with, which provides the model setup and training configurations to train a GAN on MNIST.
(a) Implement the training loop and report learning curves and generated images at epochs 1, 50, and 100. Note that drawing the learning curves and visualizing the images are already implemented in the provided jupyter notebook. (15 pts)
Procedure 1 Training GAN, modified from Goodfellow et al. (2014)
Input: $m$: real data batch size, $n_z$: fake data batch size
Output: Discriminator $D$, Generator $G$
for number of training iterations do
    # Training discriminator
    Sample a minibatch of $n_z$ noise samples $\{z^{(1)}, z^{(2)}, \cdots, z^{(n_z)}\}$ from the noise prior $p_g(z)$
    Sample a minibatch of $m$ real examples $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$
    Update the discriminator by ascending its stochastic gradient:
    $$\nabla_{\theta_d} \left[ \frac{1}{m} \sum_{i=1}^{m} \log D\big(x^{(i)}\big) + \frac{1}{n_z} \sum_{i=1}^{n_z} \log\Big(1 - D\big(G(z^{(i)})\big)\Big) \right]$$
    # Training generator
    Sample a minibatch of $n_z$ noise samples $\{z^{(1)}, z^{(2)}, \cdots, z^{(n_z)}\}$ from the noise prior $p_g(z)$
    Update the generator by ascending its stochastic gradient:
    $$\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log D\big(G(z^{(i)})\big)$$
end for
# The gradient-based updates can use any standard gradient-based learning rule. In the base code, we use the Adam optimizer (Kingma & Ba, 2015).
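For concreteness, here is a minimal sketch of what Procedure 1 might look like in PyTorch. It is not the notebook's code: the architectures, hyperparameters, and names (G, D, nz, train_step) are illustrative assumptions, and your implementation should build on the setup already provided in gan-base.ipynb.

```python
import torch
import torch.nn as nn

# Illustrative setup only -- the real architectures and hyperparameters
# come from gan-base.ipynb. nz is the noise dimension; images are
# flattened 28x28 MNIST digits scaled to [-1, 1].
nz = 100
G = nn.Sequential(nn.Linear(nz, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

def train_step(real):
    """One iteration of Procedure 1. real: a (m, 784) batch of images."""
    m = real.size(0)
    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)
    # Discriminator: ascend log D(x) + log(1 - D(G(z))), i.e. descend the
    # BCE loss with target 1 for real samples and 0 for fake samples.
    opt_d.zero_grad()
    fake = G(torch.randn(m, nz))
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    loss_d.backward()
    opt_d.step()
    # Generator: ascend log D(G(z)) -- the non-saturating update used in
    # (a) -- i.e. descend the BCE loss that treats fake samples as real.
    opt_g.zero_grad()
    loss_g = bce(D(G(torch.randn(m, nz))), ones)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```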
Expected results are as follows.
Figure 1: Learning curve
Figure 2: Generated images by G, at (a) epoch 1, (b) epoch 50, (c) epoch 100
Solution goes here. Attach your learning curve and images.
(b) Replace the generator update rule with the original one from the slides,
“Update the generator by descending its stochastic gradient:
$$\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log\Big(1 - D\big(G(z^{(i)})\big)\Big)$$
”, and report learning curves and generated images at epochs 1, 50, and 100. Compare the result with (a). Note that it may not work; if training does not work, explain why. (10 pts)
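Relative to the sketch after Procedure 1 (same hypothetical names bce, D, G, nz), this change could be expressed as:

```python
# Part (b): the original (saturating) minimax update. Descending
# log(1 - D(G(z))) is the same as minimizing the negated BCE loss with
# fake targets, since bce(D(fake), zeros) = -mean log(1 - D(fake)).
fake = G(torch.randn(m, nz))
loss_g = -bce(D(fake), torch.zeros(m, 1))
```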
Solution goes here. Attach your learning curve and images.
(c) Apart from the method we used in (a), how can we improve GAN training? Implement such a method and report your setup, learning curves, and generated images at epochs 1, 50, and 100. (5 pts)
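As one illustration (this is not prescribed by the assignment, and many other answers are acceptable), a commonly cited stabilization trick is one-sided label smoothing, which in the hypothetical sketch above would only change the discriminator's real-sample target:

```python
# One-sided label smoothing (illustrative): train D against a softened
# real target (e.g. 0.9) while keeping the fake target at 0.
smooth = torch.full((m, 1), 0.9)
loss_d = bce(D(real), smooth) + bce(D(fake.detach()), torch.zeros(m, 1))
```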
Solution goes here. Attach your learning curve and images.
2 Review change of variables in probability density functions [25 pts]
In flow-based generative models, we have seen $p_\theta(x) = p(f_\theta(x)) \left| \frac{\partial f_\theta(x)}{\partial x} \right|$. As a hands-on (fixed-parameter) example, consider the following setting.
Let X and Y be independent, standard normal random variables. Consider the transformation U = X + Y and V = X − Y. In the notation used above, $U = g_1(X, Y)$ where $g_1(x, y) = x + y$, and $V = g_2(X, Y)$ where $g_2(x, y) = x - y$. The joint pdf of X and Y is $f_{X,Y}(x, y) = (2\pi)^{-1} \exp(-x^2/2) \exp(-y^2/2)$, $-\infty < x < \infty$, $-\infty < y < \infty$. Then, we can determine the values u, v from x, y, i.e.
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$
(a) Compute the Jacobian matrix
$$J = \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}$$
(5 pts)
Solution goes here.
(b) (Forward) Show that the joint pdf of U, V is
$$f_{U,V}(u, v) = \frac{1}{\sqrt{2\pi}\sqrt{2}} \exp(-u^2/4) \cdot \frac{1}{\sqrt{2\pi}\sqrt{2}} \exp(-v^2/4)$$
(10 pts)
(Hint: $f_{U,V}(u, v) = f_{X,Y}(?, ?)\,|\det(J)|$)
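Before deriving this by hand, the claimed density can be sanity-checked numerically. The snippet below is an illustration only (sample size and bin count are arbitrary): it samples X, Y, applies the transformation, and compares the empirical density of U against the claimed U-factor.

```python
import numpy as np

# Empirical check of the claimed density of U = X + Y.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = rng.standard_normal(200_000)
u = x + y

# Claimed U-factor from part (b): exp(-u^2/4) / (sqrt(2*pi) * sqrt(2)).
f_u = lambda t: np.exp(-t**2 / 4) / (np.sqrt(2 * np.pi) * np.sqrt(2))

hist, edges = np.histogram(u, bins=60, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - f_u(centers))))  # should be close to 0
```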
Solution goes here.
(c) (Inverse) Check whether the following equation holds or not:
$$f_{X,Y}(x, y) = f_{U,V}(x + y, x - y)\,\left|\det(J)^{-1}\right|$$
(10 pts)
Solution goes here.
3 Directed Graphical Model [20 points]
Consider the directed graphical model (aka Bayesian network) in Figure 3.
Figure 3: A Bayesian Network example.
Compute P(B = t | E = f, J = t, M = t) and P(B = t | E = t, J = t, M = t). (10 points each) These are the conditional probabilities of a burglary in your house (yikes!) when both of your neighbors, John and Mary, call you and say they hear an alarm in your house, without or with an earthquake also going on in that area (what a busy day), respectively.
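One way to organize this computation is inference by enumeration, summing the hidden alarm variable A out of the network's joint factorization. The sketch below assumes the standard alarm structure (B → A ← E, A → J, A → M), and every probability in it is a placeholder: substitute the CPT values from Figure 3 before trusting any output.

```python
# Inference by enumeration in the burglary/earthquake alarm network.
# ALL probabilities below are PLACEHOLDERS -- replace them with the
# CPT values given in Figure 3.
P_B = 0.001                                          # P(B = t), placeholder
P_E = 0.002                                          # P(E = t), placeholder
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A = t | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J = t | A)
P_M = {True: 0.70, False: 0.01}                      # P(M = t | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the network factorization."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def posterior_burglary(e):
    """P(B = t | E = e, J = t, M = t), summing out the hidden A."""
    num = sum(joint(True, e, a, True, True) for a in (True, False))
    den = sum(joint(b, e, a, True, True)
              for b in (True, False) for a in (True, False))
    return num / den

print(posterior_burglary(e=False))  # P(B = t | E = f, J = t, M = t)
print(posterior_burglary(e=True))   # P(B = t | E = t, J = t, M = t)
```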
Solution goes here.
4 Chow-Liu Algorithm [25 pts]
Suppose we wish to construct a directed graphical model for 3 features X, Y, and Z using the Chow-Liu algorithm. We are given data from 100 independent experiments in which each feature is binary and takes the value T or F. Below is a table summarizing the observations of the experiment:
X Y Z Count
T T T 36
T T F 4
T F T 2
T F F 8
F T T 9
F T F 1
F F T 8
F F F 32
1. Compute the mutual information I(X, Y ) based on the frequencies observed in the data. (5 pts)
2. Compute the mutual information I(X, Z) based on the frequencies observed in the data. (5 pts)
3. Compute the mutual information I(Z, Y ) based on the frequencies observed in the data. (5 pts)
4. Which undirected edges will be selected by the Chow-Liu algorithm as the maximum spanning tree? (5 pts)
5. Root your tree at node X and assign directions to the selected edges. (5 pts)
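For parts 1–3, the empirical mutual information can be cross-checked directly from the count table. The sketch below computes it in bits (swap np.log2 for np.log if your course uses nats):

```python
import numpy as np

# Joint counts from the table, indexed [x, y, z] with 1 = T, 0 = F.
counts = np.zeros((2, 2, 2))
counts[1, 1, 1] = 36; counts[1, 1, 0] = 4
counts[1, 0, 1] = 2;  counts[1, 0, 0] = 8
counts[0, 1, 1] = 9;  counts[0, 1, 0] = 1
counts[0, 0, 1] = 8;  counts[0, 0, 0] = 32
p = counts / counts.sum()           # empirical joint over (X, Y, Z)

def mutual_info(pab):
    """I(A;B) in bits from a 2x2 joint probability table."""
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)
    mask = pab > 0                  # skip zero cells: 0 * log 0 := 0
    return (pab[mask] * np.log2(pab[mask] / np.outer(pa, pb)[mask])).sum()

print("I(X,Y) =", mutual_info(p.sum(axis=2)))   # marginalize out Z
print("I(X,Z) =", mutual_info(p.sum(axis=1)))   # marginalize out Y
print("I(Z,Y) =", mutual_info(p.sum(axis=0)))   # marginalize out X
```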