Description
Problem 1. The purpose of this problem is to illustrate how to train recurrent neural networks (RNNs). We will illustrate concepts from the class and learn an RNN that models an Ornstein-Uhlenbeck process. You do not need to know anything about Ornstein-Uhlenbeck processes for the purposes of this assignment, but if you are curious, you can learn more about them online (e.g., https://en.wikipedia.org/wiki/Ornstein-Uhlenbeck_process).
Code in the starter notebook includes a function generate_ou_process to sequentially generate Ornstein-Uhlenbeck time series, to which Gaussian noise is added. The first graph in the notebook shows an example of the ground-truth Ornstein-Uhlenbeck time series and its noisy counterpart. Our goal is to train an RNN to denoise the time series.
The RNN we use in this assignment relies on Gated Recurrent Units (GRUs) rather than the LSTMs we studied in class. GRUs are similar to LSTMs but do not include an output gate. Here are the 3 gates implemented by the GRU:
• Update gate: $z_t \leftarrow \sigma(W_z \cdot x_t + U_z \cdot h_{t-1} + b_z)$
• Reset gate: $r_t \leftarrow \sigma(W_r \cdot x_t + U_r \cdot h_{t-1} + b_r)$
• Output gate: $o_t \leftarrow z_t \cdot h_{t-1} + (1 - z_t) \cdot \tanh(W_o \cdot x_t + U_o \cdot (r_t \circ h_{t-1}) + b_o)$
where $\sigma$ is the sigmoid function, $x_t$ is the input at time step $t$, $h_t$ is the hidden state at time step $t$, and all other parameters are either weights or biases. For instance, $W_z$, $U_z$, and $b_z$ are the parameters of the update gate: respectively, the weight of the connection to the input, the weight of the connection to the previous hidden state, and the bias.
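To make the equations concrete, here is a minimal sketch of a single GRU step written with jax.numpy. The function name gru_cell and the parameter dictionary keys (Wz, Uz, bz, and so on) are illustrative assumptions and need not match the names used in the starter notebook's apply_fun_scan.

    import jax.numpy as jnp
    from jax.nn import sigmoid

    def gru_cell(params, h_prev, x_t):
        # Update gate: z_t = sigma(W_z x_t + U_z h_{t-1} + b_z)
        z_t = sigmoid(jnp.dot(params["Wz"], x_t) + jnp.dot(params["Uz"], h_prev) + params["bz"])
        # Reset gate: r_t = sigma(W_r x_t + U_r h_{t-1} + b_r)
        r_t = sigmoid(jnp.dot(params["Wr"], x_t) + jnp.dot(params["Ur"], h_prev) + params["br"])
        # Output: mix the previous hidden state with a candidate state gated by r_t
        o_t = z_t * h_prev + (1.0 - z_t) * jnp.tanh(
            jnp.dot(params["Wo"], x_t) + jnp.dot(params["Uo"], r_t * h_prev) + params["bo"])
        return o_t  # o_t becomes the hidden state passed to the next time step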
1. (1 point) Fill in the line implementing the forward pass for the update gate in apply_fun_scan.
2. (1 point) Fill in the line implementing the forward pass for the reset gate in apply_fun_scan.
3. (1 point) Fill in the line implementing the forward pass for the output gate in apply_fun_scan.
4. (1 point) Fill in the missing line in the function mse_loss. The function returns the mean squared error loss between the model’s predictions (i.e., preds) and the target sequence (i.e., targets). (A minimal sketch of this loss appears after this problem list.)
5. (2 points) Fill in the missing lines at the top of the cell titled “Training the RNN”. These lines should use the optimizers pre-built into JAX to instantiate an Adam optimizer. As seen in previous homeworks, you should obtain three things from the pre-built JAX optimizer: a method opt_init that takes in a set of initial parameter values returned by init_fun and returns the initial optimizer state opt_state, a method opt_update which takes in gradients and the current optimizer state and returns an updated state by applying one step of optimization, and a method get_params which takes in an optimizer state and returns the current parameter values. (A sketch of this setup appears after this problem list.)
6. (1 point) Fill in the lines that define x_in (the input) and y (the output) of our recurrent neural network. The RNN should take in all but the last step of the noisy time series, and predict all but the first step of the ground-truth time series (i.e., the time series before it was noised).
7. (1 point) Fill in the line that calls update to take one step of gradient descent on the sampled batch of training data. (A sketch covering problems 6 and 7 appears after this problem list.)
8. (2 points) As done in prior assignments, perform a hyperparameter search to find a good value for your learning rate. Describe briefly how you conducted the search, the value you chose, and why you chose that value. You may find it useful to call plot_ou_loss(train_loss_log). (A sketch of a simple grid search appears after this problem list.)
9. (1 point) Using the last cell of the notebook, comment qualitatively on the difference between the predicted time series, the ground truth, and the noisy time series. You will have to reuse the definitions of x_in (the input) and y from the question above.
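For problem 4, the core of the missing line is the mean of the squared differences between preds and targets. A minimal sketch follows; note that the notebook’s mse_loss may instead take the model parameters and compute preds internally, but the returned expression is the same.

    import jax.numpy as jnp

    def mse_loss(preds, targets):
        # Mean squared error averaged over all time steps (and batch entries, if any)
        return jnp.mean((preds - targets) ** 2)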
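For problem 5, the optimizer triple can be obtained from JAX’s pre-built optimizers. In JAX releases from this period they live in jax.experimental.optimizers (newer releases moved them to jax.example_libraries.optimizers). The step size shown and the name init_params (standing for the output of init_fun) are placeholders.

    from jax.experimental import optimizers  # newer JAX: from jax.example_libraries import optimizers

    # adam returns the (opt_init, opt_update, get_params) triple described in problem 5
    opt_init, opt_update, get_params = optimizers.adam(step_size=1e-3)  # placeholder step size

    # opt_state bundles the current parameters with Adam's moment estimates
    opt_state = opt_init(init_params)  # init_params: initial values returned by init_fun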
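For problems 6 and 7, here is a sketch of the input/target slicing and of one optimization step. The names noisy_series and clean_series are assumptions standing for the noisy and ground-truth sequences, the time dimension is assumed to be the leading axis, and the exact signature of update depends on how it is defined in the notebook.

    # Input: all but the last step of the noisy series
    x_in = noisy_series[:-1]
    # Target: all but the first step of the ground-truth series (one-step-ahead prediction)
    y = clean_series[1:]

    # One step of gradient descent on this batch (i is the iteration counter);
    # the argument order of update here is an assumption, not the notebook's definition
    opt_state = update(i, opt_state, (x_in, y))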
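For problem 8, one simple way to conduct the search is a small grid over candidate learning rates, comparing the resulting training-loss curves. The helper train_rnn below is hypothetical; it stands for re-running the notebook’s training loop with a given learning rate and returning train_loss_log.

    # Small grid search; train_rnn is a hypothetical stand-in for the notebook's training loop
    candidate_lrs = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
    final_losses = {}
    for lr in candidate_lrs:
        train_loss_log = train_rnn(learning_rate=lr)
        final_losses[lr] = train_loss_log[-1]  # compare the final training loss for each rate

    best_lr = min(final_losses, key=final_losses.get)
    print("best learning rate:", best_lr)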
∗ ∗ ∗