ISTA 421/521 – Homework 5


1. [3 points] Exercise 1: Load and visualize MNIST:
You will need to copy the files display_network.py and load_MNIST.py into the same directory. Run
the train_ml_class.py code to load and visualize the MNIST dataset. Currently the code loads the
training images, reading in 10K images and visualizing 100 of them.
Modify the loading part of train_ml_class.py so that you display subsets of 10, 50 and 100 MNIST
images; one way to do this is sketched below.
Tip: Use the sys.exit() command to stop execution of the code, since it will break if you have not
completed the other parts of the assignment. You must first import sys to make the sys module
available.
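
A minimal sketch, assuming load_MNIST.py exposes a load_MNIST_images function and
display_network.py exposes a display_network function (both names, and the image-file path, are
guesses based on the file names; adjust them to match the files provided with the assignment):

    import sys
    import load_MNIST
    import display_network

    # Load the training images; each column is one 28x28 image flattened to 784 values.
    images = load_MNIST.load_MNIST_images('train-images-idx3-ubyte')

    # Visualize subsets of 10, 50 and 100 images.
    for n in (10, 50, 100):
        display_network.display_network(images[:, :n])

    # Stop execution here so the unfinished parts of the assignment do not break.
    sys.exit()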
Solution:
2. [4 points] Exercise 2: Write the initialization script for the parameters of an autoencoder with a single
hidden layer:
In class we learned that a NN’s parameters are the weights w and the offset parameters b. Write a
script that initializes them given the sizes of the hidden layer and the visible layer.
Then reshape and concatenate them so they are all allocated in a single parameter vector.
Example: For an autoencoder with visible layer size 2 and hidden layer size 3, we would have 6 weights
(w1) from the visible layer to the hidden layer and 6 more weights (w2) from the hidden layer to the
output layer; there is also one bias term (b1) with 3 parameters, one for each of the hidden nodes,
and another bias term (b2) with 2 parameters for the output layer. This makes a total of 6 + 6 + 3 + 2
= 17 parameters. The output of your script should be a vector of 1 × 17 elements, with order [w1, w2,
b1, b2].
Tip: use the np.concatenate function to put the vectors together in the desired order, and the
np.reshape function to put the resulting vector in the shape 1 × size. A sketch is given below.
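
A minimal sketch of such a script; the exact initialization scheme (small random weights, zero
biases) is an assumption, not part of the assignment text:

    import numpy as np

    def initialize_parameters(hidden_size, visible_size):
        """Initialize w1, w2, b1, b2 and pack them into a single 1 x size vector."""
        # Small random weights break symmetry between hidden units; biases start at zero.
        w1 = np.random.randn(hidden_size, visible_size) * 0.01  # visible -> hidden
        w2 = np.random.randn(visible_size, hidden_size) * 0.01  # hidden -> output
        b1 = np.zeros(hidden_size)                               # one bias per hidden node
        b2 = np.zeros(visible_size)                              # one bias per output node

        # Flatten the weight matrices and concatenate everything in the order [w1, w2, b1, b2].
        theta = np.concatenate((w1.ravel(), w2.ravel(), b1, b2))
        return np.reshape(theta, (1, theta.size))

    # Example from the exercise: visible size 2, hidden size 3 -> 6 + 6 + 3 + 2 = 17 parameters.
    theta = initialize_parameters(hidden_size=3, visible_size=2)
    print(theta.shape)  # (1, 17)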
Solution:
3. [4 points] Exercise 3: Write the cost function for an autoencoder as well as the gradient for each of the
parameters.
In class we learned that we can use gradient descent to train a NN. In this exercise we will use a more
refined version of this called L-BFGS (http://en.wikipedia.org/wiki/Limited-memory_BFGS), which
is readily implemented in the optimization library of scipy. For your convenience the implementation
is ready to run.
To use it, you need to define functions for the cost and for the gradient of each parameter. Do this
based on the errors δ that we defined in the slides in class.
The functions use the data to do a forward pass of the network, calculate the overall error, and then
calculate the gradient for each parameter. A sketch of this structure is given after the tips below.
Tip: In the code there is a flag called debug; if set to True, it will run debugging code to check whether
your gradient is correct.
You might want to load fewer images in this step, so you do not spend too much time waiting for all
the examples.
The gradient has to be a matrix of the same size as the parameter matrix, while the cost has to be the
evaluation of the cost function after the data has passed through.
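
As a rough sketch of the structure such a function could take, here is a plain squared-error
autoencoder with sigmoid activations; the course code may additionally use weight-decay or sparsity
terms, which are omitted here:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def autoencoder_cost_and_grad(theta, visible_size, hidden_size, data):
        """Return (cost, gradient) for a single-hidden-layer autoencoder.

        theta is the flat [w1, w2, b1, b2] vector from Exercise 2, and data holds
        one example per column (visible_size x m).
        """
        m = data.shape[1]

        # Unpack the flat parameter vector.
        s = hidden_size * visible_size
        w1 = theta[:s].reshape(hidden_size, visible_size)
        w2 = theta[s:2 * s].reshape(visible_size, hidden_size)
        b1 = theta[2 * s:2 * s + hidden_size].reshape(hidden_size, 1)
        b2 = theta[2 * s + hidden_size:].reshape(visible_size, 1)

        # Forward pass.
        a2 = sigmoid(w1 @ data + b1)   # hidden activations
        a3 = sigmoid(w2 @ a2 + b2)     # reconstruction of the input

        # Squared reconstruction error, averaged over the examples.
        cost = 0.5 * np.sum((a3 - data) ** 2) / m

        # Backward pass: the delta (error) terms from the slides.
        delta3 = (a3 - data) * a3 * (1 - a3)
        delta2 = (w2.T @ delta3) * a2 * (1 - a2)

        # Gradients, laid out in the same order and shape as theta.
        w1_grad = delta2 @ data.T / m
        w2_grad = delta3 @ a2.T / m
        b1_grad = np.sum(delta2, axis=1) / m
        b2_grad = np.sum(delta3, axis=1) / m
        grad = np.concatenate((w1_grad.ravel(), w2_grad.ravel(), b1_grad, b2_grad))

        return cost, grad

    # scipy's L-BFGS can consume a function that returns both the cost and the gradient:
    # theta_opt, cost, info = fmin_l_bfgs_b(autoencoder_cost_and_grad, theta0,
    #                                       args=(visible_size, hidden_size, data))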
Solution:
4. [3 points] Exercise 4:
If your gradient is correct, now load the 10,000 images and run the code.
Test the code, changing the size of the hidden layer to 50, 100, 150 and 250. By the end, the code
creates an image called weights (which overwrites the one from Exercise 1) and prints the weights
obtained after training.
Report those weights and comment on the differences between the different hidden layer sizes; one
way to sweep over the sizes is sketched below.
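
One way to organize the sweep; train_autoencoder is a hypothetical wrapper around the training code
from Exercise 3 (in practice you would re-run train_ml_class.py with the hidden-layer size changed),
and is assumed to return the learned visible-to-hidden weights w1:

    import numpy as np

    for hidden_size in (50, 100, 150, 250):
        w1 = train_autoencoder(images, hidden_size=hidden_size)   # hypothetical helper
        # Save each result under its own name so later runs do not overwrite earlier ones.
        np.save('weights_hidden_%d.npy' % hidden_size, w1)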
Solution:
5. [2 points] Exercise 5:
Once everything is running, run the code stacked_ae_hw.py, which implements the stacked autoencoder
concept that we saw in class on Thursday. At the end you should have a report of your resulting
accuracy. Change the number of training examples to 100, 500, 1000, 10000 and report the results
here.
Tip: You might want to check beforehand how much time a run takes on your computer and consider
leaving it overnight; a rough timing sketch is given below.
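
A rough way to time the smaller runs before committing to the full one; run_stacked_autoencoder is
a hypothetical entry point standing in for whatever stacked_ae_hw.py exposes:

    import time

    # Time each training-set size before deciding whether the largest run needs to go overnight.
    for n in (100, 500, 1000, 10000):
        start = time.time()
        accuracy = run_stacked_autoencoder(num_examples=n)   # hypothetical entry point
        print('n = %5d   accuracy = %.3f   time = %.1f s' % (n, accuracy, time.time() - start))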
Solution: