Description
1 RNN’s (Recursive Neural Network)
Welcome to SAIL (Stanford Artificial Intelligence Lab): Congrats! You have just been given a
Research Assistantship in SAIL. Your task is to discover the power of Recursive Neural Networks (RNNs).
So you plan an experiment to show their effectiveness on Positive/Negative Sentiment Analysis. In this part,
you will derive the forward and backpropogation equations, implement them, and test the results.
Our RNN has one ReLU layer and one softmax layer, and uses Cross Entropy loss as its cost function.
We follow the parse tree given from the leaf nodes up to the top of the tree and evaluate the cost at each
node. During backprop, we follow the exact opposite path. Figure 1 shows an example of such a RNN
applied to a simple sentence ”I love this assignment”. These equations are sufficient to explain our model:
CE(y, yˆ) = −
X
i
yi
log( ˆyi)
where y is the label represented as a one-hot row vector, and yˆ is a row vector containing the predicted
probabilities for all classes. In our case, y ∈ R
1×5 and ˆy ∈ R
1×5
to represent our 5 sentiment classes: Really
Negative, Negative, Neutral, Positive, and Really Positive. Furthermore,
h
(1) = max(h
h
(1)
Lef t, h(1)
Right i
W(1) + b
(1)
, 0)
yˆ = softmax(h
(1)U + b
(s)
)
where h
(1)
Lef t is the vector representation of the left subtree (possibly be a word vector), the h
(1)
Right of the
right subtree. For clarity, L ∈ R
|V |×d
, W(1) ∈ R
2d×d
, b
(1) ∈ R
1×d
, U ∈ R
d×5
, b
(s) ∈ R
1×5
.
(a) (20 points) Follow the example parse tree in Figure 1 in which we are given a parse tree and truth labels
y for each node. Starting with Node 1, then to Node 2, finishing with Node 3, write the update rules for
W(1)
,b
(1)
, U, b
(s)
, and L after the evaluation of ˆy against our truth, y. This means for at each node, we
evaluate:
1
CS 224d: Assignment #3
Figure 1: RNN (Recursive Neural Network) example
δ3 = ˆy − y
as our first error vector and we backpropogate that error through the network, aggregating gradient at
each node for:
∂J
∂U
∂J
∂b(s)
∂J
∂W(1)
∂J
∂b(1)
∂J
∂Li
Points will be deducted if you do not express the derivative of activation functions (ReLU) in terms of
their function values (as with Assignment 1 and 2) or do not express the gradients by using an “error
vector” (δi) propagated back to each layer. Tip on notation: δbelow and δabove should be used for error
that is being sent down to the next node, or came from an above node. This will help you think about
the problem in the right way. Note you should not be updating gradients for Li
in Node 1. But error
should be leaving Node 1 for sure!
(b) (80 points) Implementation time! We have simplified the problem to reduce training time by binarizing
the sentiment labels. This means that all the sentences are either positive or negative. The internal
nodes however, can be positive negative or neutral. While training, the cost function includes predictions
over all nodes that have a sentiment associated with them and ignores the neutral nodes. While testing,
we are only interested in the performance at the full sentence level. This is all provided for you in the
starter code.
Page 2 of 3
CS 224d: Assignment #3
• (a) Download, unzip, and have a look at the code base.
• (b) From the command line, run ./setup.sh to download the labeled parse tree dataset.
• (c) Begin implementing the model in rnn.py by filling out the outline.
• (d) Run the model and show us the resulting loss plot, final training accuracy, final validation
accuracy. You should be able to achieve an accuracy of 75%.
• (e) (10 points extra credit) Tune the parameters of the model to improve performance and report
your new loss plot, final training accuracy, final validation accuracy. Explain why the changes you
made helped achieve the improvement.
• (f) Ensure that your code is using the best model config and the trained weights are present in the
weights folder. Run ./prepare submission.sh and submit the resulting zip file.
Page 3 of 3