
Category: CSE 891


1 Reversible Architectures [3pts]: In this section, we will investigate a variant for implementing reversible blocks with affine coupling layers. Consider the following reversible affine coupling block:

$$
\begin{aligned}
y_1 &= \exp(G(x_2)) \circ x_1 + F(x_2) \\
y_2 &= \exp(s) \circ x_2
\end{aligned}
\tag{1}
$$

where $\circ$ denotes element-wise multiplication. The inputs satisfy $x_1, x_2 \in \mathbb{R}^{d/2}$, and the functions $F$ and $G$ map from $\mathbb{R}^{d/2} \to \mathbb{R}^{d/2}$. This modified block is identical to the ordinary reversible block, except that the inputs $x_1$ and $x_2$ are multiplied element-wise by the vectors $\exp(G(x_2))$ and $\exp(s)$.
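To make the block in Eq. (1) concrete, here is a minimal sketch of its forward pass using plain Python lists. The functions `F` and `G` and the vector `s` below are hypothetical stand-ins chosen only for illustration; the assignment does not specify them, and in practice they would be learned networks and parameters.

```python
import math

def coupling_forward(x1, x2, F, G, s):
    """Forward pass of Eq. (1): returns (y1, y2) for half-inputs x1, x2."""
    g = G(x2)  # log-scale applied to x1, computed from x2
    f = F(x2)  # shift added to x1, computed from x2
    y1 = [math.exp(gi) * a + fi for gi, a, fi in zip(g, x1, f)]
    y2 = [math.exp(si) * b for si, b in zip(s, x2)]
    return y1, y2

# Hypothetical stand-ins for F and G (assumptions, not from the assignment).
def F(v):
    return [math.tanh(a) for a in v]

def G(v):
    return [0.5 * a for a in v]

x1 = [1.0, -2.0]
x2 = [0.0, 2.0]
s = [0.0, math.log(2.0)]  # assumed log-scale vector for x2

y1, y2 = coupling_forward(x1, x2, F, G, s)
print(y1, y2)
```

Note that `y2` depends only on `x2`, while `y1` depends on both halves; this asymmetry is what the questions below build on.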

1. (1pt) Give the equations for inverting this block, i.e. computing $x_1$ and $x_2$ from $y_1$ and $y_2$. You may use / to denote element-wise division.

2. (1pt) Give a formula for the Jacobian $\partial y / \partial x$, where $y$ denotes the concatenation of $y_1$ and $y_2$. You may denote the solution as a block matrix, as long as you clearly define what the matrix for each block corresponds to.

3. (1pt) Give a formula for the determinant of the Jacobian from the previous part, i.e. compute $\det\left(\partial y / \partial x\right)$. Is this a volume-preserving transformation? Justify your answer.
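One way to sanity-check a hand-derived Jacobian and determinant for Eq. (1) is to compare them against a numerical estimate. The sketch below approximates the Jacobian by central differences and computes its determinant by Gaussian elimination. The choices of `F`, `G`, `s`, and the test point are assumptions made for illustration, and the result is only a numerical check, not a derivation.

```python
import math

def forward(x, s):
    """Eq. (1) on a flat input x = x1 followed by x2 (F and G are assumed)."""
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    f_out = [math.tanh(v) for v in x2]  # hypothetical F(x2)
    g_out = [0.5 * v for v in x2]       # hypothetical G(x2)
    y1 = [math.exp(g) * a + f for g, a, f in zip(g_out, x1, f_out)]
    y2 = [math.exp(si) * b for si, b in zip(s, x2)]
    return y1 + y2

def numerical_jacobian(fn, x, eps=1e-6):
    """Central differences: J[i][j] approximates d y_i / d x_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        yp, ym = fn(xp), fn(xm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(yp, ym)])
    # cols[j][i] holds d y_i / d x_j, so transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d  # a row swap flips the sign
        d *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return d

x = [1.0, -2.0, 0.0, 2.0]   # x1 = [1, -2], x2 = [0, 2]
s = [0.0, math.log(2.0)]    # assumed log-scale vector
J = numerical_jacobian(lambda v: forward(v, s), x)
jac_det = det(J)
print(jac_det)
```

Inspecting which blocks of `J` are numerically zero or diagonal can help confirm the block structure proposed in part 2, and comparing `jac_det` with 1 gives a quick check on the volume-preservation question for this particular choice of `G` and `s`.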


2 Variational Free Energy [6pts]: In this question you will derive some expressions related to the variational free energy (VFE), which is maximized to train a VAE. Recall that the VFE is defined as:

$$F(q) = \mathbb{E}_q[\log p(x \mid z)] - D_{\mathrm{KL}}(q(z) \,\|\, p(z))$$

where the KL divergence is defined as

$$D_{\mathrm{KL}}(q(z) \,\|\, p(z)) = \mathbb{E}_q[\log q(z) - \log p(z)]$$

We will assume that the prior over $z$ is a standard Gaussian:

$$p(z) = \mathcal{N}(z; 0, I) = \prod_{i=1}^{D} p_i(z_i) = \prod_{i=1}^{D} \mathcal{N}(z_i; 0, 1)$$

Similarly, we will assume that the variational approximation $q(z)$ is a fully factorized (i.e., diagonal) Gaussian:

$$q(z) = \mathcal{N}(z; \mu, \Sigma) = \prod_{i=1}^{D} q_i(z_i) = \prod_{i=1}^{D} \mathcal{N}(z_i; \mu_i, \sigma_i)$$
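The definition of the KL divergence as an expectation under $q$ suggests a direct Monte Carlo estimate: draw samples from $q$ and average $\log q(z) - \log p(z)$. The sketch below does this for a single dimension, assuming $\sigma_i$ is the standard deviation of $q_i$; the function names are hypothetical. Such an estimate can serve as a numerical reference when checking a closed-form expression.

```python
import math
import random

def log_normal_pdf(z, mean, std):
    """Log-density of a univariate Gaussian with the given mean and std."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(std)
            - 0.5 * ((z - mean) / std) ** 2)

def kl_monte_carlo(mu, sigma, num_samples=100_000, seed=0):
    """Estimate E_q[log q(z) - log p(z)] with q = N(mu, sigma^2), p = N(0, 1).

    Assumes sigma is a standard deviation (not a variance).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mu, sigma)  # sample z from q
        total += log_normal_pdf(z, mu, sigma) - log_normal_pdf(z, 0.0, 1.0)
    return total / num_samples

print(kl_monte_carlo(1.0, 1.0))
```

Because $q$ and $p$ both factorize over dimensions, the same per-dimension routine could be applied to each coordinate independently.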

1. (1pt) Show that:

$$F(q) = \log p(x) - D_{\mathrm{KL}}(q(z) \,\|\, p(z \mid x))$$

2. (1pt) Show that the KL term decomposes as a sum of KL terms for individual dimensions. In particular,

$$D_{\mathrm{KL}}(q(z) \,\|\, p(z)) = \sum_{i} D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i))$$

3. (2pts) Give an explicit formula for the KL divergence $D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i))$. This should be a mathematical expression involving $\mu_i$ and $\sigma_i$.

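4. (2pts) One way to do gradient descent on the KL term is to apply the formula from above. Another approach is to compute stochastic gradients using the reparameterization trick:

$$\nabla_\theta D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i)) = \mathbb{E}_{\epsilon}[\nabla_\theta t_i], \quad \text{where } \theta = \begin{bmatrix} \mu_i \\ \sigma_i \end{bmatrix}$$

and

$$
\begin{aligned}
z_i &= \mu_i + \sigma_i \epsilon_i \\
r_i &= \log q_i(z_i) \\
s_i &= \log p_i(z_i) \\
t_i &= r_i - s_i
\end{aligned}
\tag{2}
$$

Show how to compute a stochastic estimate of $\nabla_\theta D_{\mathrm{KL}}(q_i(z_i) \,\|\, p_i(z_i))$ by doing backpropagation on the above equations. You may find it helpful to draw the computation graph.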
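The forward pass through Eq. (2) can be written out directly, which also gives a way to check a hand-derived backward pass numerically. The sketch below evaluates $t_i$ for a fixed noise draw and estimates its gradient with respect to $\mu_i$ and $\sigma_i$ by central differences, then averages over noise samples. It assumes the noise is standard normal and that $\sigma_i$ is a standard deviation; neither is stated explicitly in the problem. This is a finite-difference check, not the backpropagation derivation the question asks for.

```python
import math
import random

def log_normal_pdf(z, mean, std):
    """Log-density of a univariate Gaussian with the given mean and std."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(std)
            - 0.5 * ((z - mean) / std) ** 2)

def t_forward(mu, sigma, eps):
    """Forward pass through Eq. (2) for a fixed noise draw eps."""
    z = mu + sigma * eps               # z_i = mu_i + sigma_i * eps_i
    r = log_normal_pdf(z, mu, sigma)   # r_i = log q_i(z_i)
    s = log_normal_pdf(z, 0.0, 1.0)    # s_i = log p_i(z_i)
    return r - s                       # t_i = r_i - s_i

def fd_gradient(mu, sigma, eps, h=1e-5):
    """Central-difference estimate of (dt/dmu, dt/dsigma), eps held fixed."""
    d_mu = (t_forward(mu + h, sigma, eps)
            - t_forward(mu - h, sigma, eps)) / (2.0 * h)
    d_sigma = (t_forward(mu, sigma + h, eps)
               - t_forward(mu, sigma - h, eps)) / (2.0 * h)
    return d_mu, d_sigma

def stochastic_gradient(mu, sigma, num_samples=1000, seed=0):
    """Average the per-sample gradients over standard normal noise draws."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # assumed noise distribution
        d_mu, d_sigma = fd_gradient(mu, sigma, eps)
        g_mu += d_mu
        g_sigma += d_sigma
    return g_mu / num_samples, g_sigma / num_samples

print(stochastic_gradient(1.0, 2.0))
```

Holding `eps` fixed while perturbing `mu` and `sigma` is what makes this a reparameterized estimate: the perturbation flows through `z` as well as through the parameters of `log_normal_pdf`, mirroring the paths in the computation graph.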


3 Feedback (1pt):

1. What aspects of the written and programming homeworks did you enjoy for this course?

2. What aspects of the written and programming homeworks did you hate for this course?

3. Suggestions for what you would like to modify in the homeworks.

4. Suggestions for course content/lecture slides and topics.
