Description
The goal of this assignment is to explore recurrent neural nets (RNNs) and to understand their limitations.
The Task
I would like you to implement a recurrent neural net to learn the parity operator. The net will have a single input unit and a single output unit, and a fully-connected layer of H hidden units. The inputs and target outputs are binary. When an input sequence is presented, the output state at the end of the sequence should be a parity bit: the output should be 1 if the input has an odd number of ‘1’ values and 0 otherwise. For example, the sequence 1-0-0-1-0-1 should yield output 1 and the sequence 0-0-0-0-1-1 should yield output 0. Note that a target is given only at the end of each sequence. (Parity is easy to learn if there is a target at each step that indicates parity given the sequence so far.)
Parity is a hard problem for neural nets to learn because very similar inputs produce different outputs, and very dissimilar inputs can produce the same output.
The aspects of the task we will manipulate are: H, the number of hidden units; N, the length of the input strings; and the activation function for the hidden units, either tanh or LSTM-style neurons. The output neuron should have a logistic activation function.
Some Help
TensorFlow has built-in recurrent net functionality via tf.contrib.rnn.BasicRNNCell and tf.contrib.rnn.BasicLSTMCell. More help may be on its way. Denis and I are thinking of providing you with a shell of the code.
Denis wrote some code to generate random data strings for training:
import numpy as np

def generate_parity_sequences(N, count):
    """
    Generate :count: sequences of length :N:.
    If odd # of 1s -> output 1
    else -> output 0
    """
    xor = lambda x: 1 if (x % 2 == 1) else 0
    # Random binary sequences, one row per example.
    sequences = np.random.choice([0, 1], size=[count, N], replace=True)
    counts = np.count_nonzero(sequences == 1, axis=1)
    # xor the count of 1s in each sequence, expand dimensions by 1 to match sequences shape
    y = np.expand_dims(np.array([xor(x) for x in counts]), axis=1)
    # In case you wanted to have the answer just appended at the end of the sequence:
    # # append the answer at the end of each sequence
    # seq_plus_y = np.concatenate([sequences, y], axis=1)
    # print(sequences.shape, y.shape, seq_plus_y.shape)
    # return seq_plus_y
    return np.expand_dims(sequences, axis=2), y
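For the net itself, here is a minimal sketch of how the recurrent layer and the logistic output unit might be wired up with the TensorFlow 1.x cells mentioned above. The function name build_parity_model, the use_lstm flag, and the learning rate are our choices for illustration, not part of the assignment:

import tensorflow as tf

def build_parity_model(N, H, use_lstm=False, learning_rate=0.001):
    # Inputs: a batch of length-N sequences with one binary value per step;
    # targets: a single parity bit per sequence (given only at the end).
    x = tf.placeholder(tf.float32, [None, N, 1])
    y = tf.placeholder(tf.float32, [None, 1])

    # Recurrent layer: H fully-connected hidden units, tanh (Part 1) or LSTM (Part 2).
    if use_lstm:
        cell = tf.contrib.rnn.BasicLSTMCell(num_units=H)
    else:
        cell = tf.contrib.rnn.BasicRNNCell(num_units=H, activation=tf.tanh)
    outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

    # Only the hidden state at the final time step feeds the output unit.
    last = outputs[:, -1, :]
    logits = tf.layers.dense(last, 1)
    prediction = tf.sigmoid(logits)  # logistic output unit

    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.round(prediction), y), tf.float32))
    return x, y, train_op, loss, accuracy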
Part 1
Set your code up to train a net given H and N. Each time you run the code, it should randomize the initial weights and generate a random training set of 10000 examples of length N. Also generate a random test set of 10000 examples of length N.
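One way to organize a single run, sketched below; it assumes the build_parity_model sketch from Some Help, and the epoch count and batch size are arbitrary choices you will want to tune:

def train_and_test(H, N, train_data, test_data, n_epochs=20, batch_size=100):
    # Build a fresh graph so each replication starts from new random weights.
    tf.reset_default_graph()
    x, y, train_op, loss, accuracy = build_parity_model(N, H)
    train_x, train_y = train_data
    test_x, test_y = test_data
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(n_epochs):
            # Shuffle and step through the training examples in minibatches.
            order = np.random.permutation(len(train_x))
            for i in range(0, len(train_x), batch_size):
                idx = order[i:i + batch_size]
                sess.run(train_op, feed_dict={x: train_x[idx], y: train_y[idx]})
        # Return % correct on the held-out test set.
        return 100.0 * sess.run(accuracy, feed_dict={x: test_x, y: test_y})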
Train your net for H ∈ {5, 25} and for N ∈ {2, 10, 25, 50}. Use an RNN with tanh activation functions. For each combination of H and N, run 10 replications of your simulation.
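A sketch of the outer experiment loop, using Denis's generator and the train_and_test sketch above; the dictionary layout for collecting results is just one convenient choice:

H_values = [5, 25]
N_values = [2, 10, 25, 50]
n_reps = 10

results = {}  # (H, N) -> array of test accuracies over the replications
for H in H_values:
    for N in N_values:
        accs = []
        for rep in range(n_reps):
            train_x, train_y = generate_parity_sequences(N, 10000)
            test_x, test_y = generate_parity_sequences(N, 10000)
            accs.append(train_and_test(H, N, (train_x, train_y), (test_x, test_y)))
        results[(H, N)] = np.array(accs)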
Make a graph of mean % correct on the test set for the different values of H and N. I’ll be more impressed if you plot not only the mean but also the standard error of the mean (= standard deviation of the 10 replications divided by sqrt(10)).
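A sketch of the summary plot, assuming the results dictionary from the loop above; the standard error of the mean is computed exactly as described, std over the 10 replications divided by sqrt(10):

import matplotlib.pyplot as plt

for H in H_values:
    means = [results[(H, N)].mean() for N in N_values]
    sems = [results[(H, N)].std() / np.sqrt(n_reps) for N in N_values]
    plt.errorbar(N_values, means, yerr=sems, marker='o', label='H = {}'.format(H))

plt.xlabel('sequence length N')
plt.ylabel('mean % correct on test set')
plt.legend()
plt.show()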
Part 2
Repeat the experiment of Part 1, but use LSTM neurons instead of standard tanh neurons in the recurrent layer.
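If you are working from the model sketch under Some Help, this amounts to swapping the recurrent cell (the use_lstm flag there is our hypothetical shortcut for the same change):

cell = tf.contrib.rnn.BasicLSTMCell(num_units=H)  # replaces BasicRNNCell(num_units=H, activation=tf.tanh)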