CSE 891-001: Deep Learning Homework 1

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (7 votes)

1 Hard-Coding a Multilayer Perceptron (2pts): In this problem you will find a set of weights and biases for a multilayer perceptron which determines if a list of four numbers are sorted in ascending order. More specifically, you receive 4 inputs x1, x2, x3 and x4 where xi ∈ R, and the network must output 1 if x1 < x2 < x3 < x4 and 0 otherwise. You will use the following Multilayer Perceptron (MLP) architecture consisting of one input layer with four nodes, one hidden layer with three nodes and one output layer with one node. x1 x2 x3 x4 h1 h2 h3 y W(1) , b (1) w(2) , b (2) The activation function of the hidden units and the output unit can be assumed to be a hard threshold function. φ(z) = ( 1 if z > 0 0 if z ≤ 0 Find a set of weights and biases for the network which correctly implements this function (including cases where some of the inputs are equal). Your answer should include: 1. A 3 × 4 weight matrix W(1) for the hidden layer. 2. A 3-dimensional vector of biases b (1) for the hidden layer. 3. A 3-dimensional vector of weights w(2) for the output layer. 4. A scalar bias b (2) for the output layer. 1 2 Sparsifying Activation Function (3pts): An interesting property of the ReLU activation function is that it sparsifies the activations and the derivatives, i.e., sets a large fraction of the values to zeros for any given input vector. Consider the following network where the activation function of all the units is a ReLU function. x1 x2 h3 h4 h1 h2 y w3 w2 w1 w5 w4 where each wi refers to the weight of a single connection. Assume we are trying to minimize a loss function L which depends only on the activation of the output unit y. Suppose the unit h1 receives an input -1 on a particular training case, so the ReLU evaluates to 0. Based only on this information, which of the weight derivatives ∂L ∂w1 , ∂L ∂w2 , ∂L ∂w3 are guaranteed to be 0 for this training case? Provide a YES or NO answer for each with your justification. 2 3 Universal Approximation Theorem (5pts): In this problem you will build the intuition behind how the neural network function class can approximate a particular class of functions arbitrarily well. Suppose f : I → R, where I = [a, b] ⊂ R and a ≤ b is a closed interval. Also, let ˆ fτ : I → R be some function approximator from our network where τ is a description of our network architecture and weights. Here, τ is a tuple of (n,W0 ∈ Rn×1 , b0 ∈ Rn ,W1 ∈ Rn×1 , b1 ∈ Rn ), where n is the hidden layer size, W0 and b0 describe the input hidden parameters, and W1 and b1 describe the output hidden parameters. The output is computed as ˆ fτ(x) = W1a(W0x + b0) + b1, where the activation a(·) is an indicator function, i.e., a(y) = I(y ≥ 0), where I(s) is 1 when the boolean value s is true and 0 otherwise. For a vector, the activation function is applied to each element of the vector. We want to show that there exist a series of neural networks {τi} N i=1 such that: ∀e > 0, ∃M : ∀m > M, k f − ˆ fτm } < e where k f − ˆ f k = R I | f(x) − ˆ f(x)|dx. x1 . . . xj . . . xn 1 a(·) . . . a(·) . . . a(·) 1 o(·) 1. (1.0 pt) Consider a rectangular function g(h, a, b, x) = h · I(a ≤ x ≤ b). Given some (h, a, b) show ∃τ : ˆ fτ(x) = g(h, a, b, x). You answer should be a specific choice of n, W0, b0, W1, and b1, which will be functions of the selected (h, a, b), where h ∈ R, a ∈ R, and b ∈ R. 2. (1.5 pt) Given f(x) = −x 2 + 1 where I = [−1, 1] and some initial function ˆ f0(x) = 0 which is identically 0, construct a new function ˆ f1(x) = ˆ f0(x) + g(h1, a1, b1, x) such that k f − ˆ f1k ≤ k f − ˆ f0k, with the rectangle function in the previous question. Note that h1, a1, and b1 are going to depend on your choice of f , ˆ f and I. Plot f and ˆ f1, write down h1, a1, and b1, and justify why k f − ˆ f1k < k f − ˆ f0k. 3. (2.5 pt) Describe a procedure which starts with ˆ f0(x) = 0 and a fixed N, then construct a series { ˆ fi} N i=0 where ˆ fi+1(x) = ˆ fi(x) + g(hi+1 , ai+1 , bi+1 , x), which satisfies k f − ˆ fi+1k < k f − ˆ fik. Use the definition of g from above and the choice of f from the previous question. Plot f , ˆ f1, ˆ f2 and ˆ f3, write down how to generate hi+1 ,ai+1 ,bi+1 , and justify why k f − ˆ fi+1k < k f − ˆ fik. 3