Deep Learning in Hardware
ECE 498/598 (Fall 2020), Homework 1
Prof. Naresh Shanbhag
Assigned: 09/03 – Due: 09/18

Problem 1: Learning a Boolean Function using ADALINE and MADALINE
In this problem we will test out the ADALINE and MADALINE algorithms in the context of
implementing Boolean functions. Consider the following function:
f(v, w, x, y, z) = (((v ⊕ w) + x) ⊕ y) + (y ⊕ z)
where v, w, x, y, z are binary variables in {−1, +1}, ⊕ is the 'xor' operation, i.e.,
a ⊕ b = 1{a = b} − 1{a = −b}, and + denotes the 'or' operation.
1. Draw the truth table of the function f.
2. Code the ADALINE algorithm to learn this Boolean function. Plot the evolution of the five
weights and the bias. Upon convergence, how many of the 32 input combinations result in
the correct output? (A starting-point sketch appears after this problem's questions.)
3. * Code the MADALINE algorithm to learn this Boolean function. Use two layers and five
units in the intermediate layer. Report the converged values of all weights and biases. How
many of the 32 input combinations result in the correct output?
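As a possible starting point for part 2, the sketch below implements the ADALINE (LMS) learning rule with NumPy. The learning rate, number of epochs, weight initialization, and the encoding of 'or' as a maximum (assuming +1 denotes TRUE) are assumptions you may need to adjust.

```python
import itertools
import numpy as np

def f(v, w, x, y, z):
    """Target Boolean function, using the ±1 conventions stated in the problem."""
    xor = lambda a, b: 1.0 if a == b else -1.0   # 'xor' as defined above
    orr = lambda a, b: max(a, b)                 # 'or', assuming +1 denotes TRUE
    return orr(xor(orr(xor(v, w), x), y), xor(y, z))

# Truth table: all 32 input combinations and their desired outputs.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=5)))
d = np.array([f(*row) for row in X])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=5)   # five weights
b = 0.0                             # bias
eta = 0.01                          # learning rate (assumption)

# ADALINE / LMS: adapt against the *linear* output, classify with sign().
for epoch in range(200):
    for xi, di in zip(X, d):
        err = di - (xi @ w + b)
        w += eta * err * xi
        b += eta * err

pred = np.sign(X @ w + b)
print("correct outputs:", int(np.sum(pred == d)), "out of 32")
```

The MADALINE part requires two layers of such units and a different adaptation rule, so this sketch covers ADALINE only.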
Problem 2: Optimal quantization of a triangular distribution
In this problem, we will test out the Lloyd-Max (LM) algorithm on the triangular distribution
depicted in Figure 1. Note: the figure is NOT drawn to scale.
Figure 1: The considered triangular distribution in Problem 2.
1. Recall the quantization level update equation in the Lloyd-Max iteration:

r_q = \frac{\int_{t_q}^{t_{q+1}} x f_X(x)\,dx}{\int_{t_q}^{t_{q+1}} f_X(x)\,dx}

Derive the closed-form expression of this update (i.e., evaluate the above expression).
2. Code the corresponding LM algorithm to determine the optimal quantization levels for this
distribution. Plot the quantization levels and thresholds and compare with those of a uniform
quantizer. Do this for quantizers with 4 and 16 levels. (See the sketch following this problem's
questions.)
3. Empirically evaluate and report the SQNR of the LM and uniform quantizers.
4. * Derive an expression for the SQNR of the LM and uniform quantizers, i.e., compute the
ratio of the data variance E[X^2] and the MSE of the quantizer E[(X − X̂_q)^2]. Compare
your answer to the empirical SQNR obtained in the previous question.
5. The quantization we have been discussing so far does not assume clipping. In this part,
assume a signed clipping scheme where x_q = c if x ≥ c and x_q = −c if x ≤ −c, where
c is some positive number. Derive an expression for the SQNR as a function of B_x and c.
Validate your SQNR expression by plotting the empirical SQNR vs. c for the 4-level and
16-level LM quantizers as c ranges from 0.5 to 1.9. Repeat this for the 4-level and 16-level
uniform quantizers as well. Use 1000 samples for each empirical SQNR value.
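As a starting point for part 2 of this problem, the sketch below runs a numerical Lloyd-Max iteration. Because Figure 1 is not reproduced here, it uses a symmetric triangular pdf f_X(x) = 1 − |x| on [−1, 1] purely as a placeholder; substitute the support and pdf of the actual distribution from Figure 1. The grid resolution and iteration count are also arbitrary choices.

```python
import numpy as np

# Placeholder pdf: symmetric triangular on [-1, 1]. Replace with the pdf from Figure 1.
lo, hi = -1.0, 1.0
pdf = lambda x: np.maximum(1.0 - np.abs(x), 0.0)

def lloyd_max(num_levels, iters=200, grid_pts=100001):
    """Alternate threshold (midpoint) and level (conditional-mean) updates."""
    x = np.linspace(lo, hi, grid_pts)
    fx = pdf(x)
    r = np.linspace(lo, hi, num_levels + 2)[1:-1].copy()   # initial levels over the support
    for _ in range(iters):
        t = np.concatenate(([lo], (r[:-1] + r[1:]) / 2, [hi]))   # thresholds: midpoints
        for q in range(num_levels):                              # levels: conditional means
            mask = (x >= t[q]) & (x <= t[q + 1])
            den = np.sum(fx[mask])
            if den > 0:
                r[q] = np.sum(x[mask] * fx[mask]) / den
    return r, t

for L in (4, 16):
    r, t = lloyd_max(L)
    print(f"{L}-level LM levels:     ", np.round(r, 4))
    print(f"{L}-level LM thresholds: ", np.round(t, 4))
```

The empirical SQNR in part 3 can then be estimated by drawing samples from the distribution, quantizing each sample to the nearest level, and computing E[X^2] / E[(X − X̂_q)^2].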
Problem 3: Profiling Deep Net Complexity
We have seen in class that an important first step in the hardware implementation of DNNs is an
estimation of the hardware complexity. For instance, it is very useful to understand the storage
and computational costs associated with every DNN. In this problem, we start by familiarizing
ourselves with DNN topologies and how to extract complexity measures from them. You are
asked to consider the following two networks: ResNet-18 and VGG-11. Please refer to
https://github.com/pytorch/vision/tree/master/torchvision/models for the exact topological
description of each of these networks, and answer the following questions for both networks. You
are encouraged to write Python scripts using the PyTorch package to solve these problems; a
starting-point profiling sketch follows the questions below.
1. Plot the total number of activations and the data reuse factor per layer as a function of layer
index. What is the total number of activations in each network?
2. Plot the total number of weights and the weight reuse factor per layer as a function of layer
index. What is the total number of weights in each network?
3. Plot the total number of dot products per layer as a function of layer index. How many dot
products are needed per inference for each network?
4. Plot the total number of multiply-accumulates (MACs) per layer as a function of layer index.
How many MACs are needed per inference for each network?
5. Assuming an 8-bit fixed-point format, determine the total storage requirements in MB for all
activations and weights.
6. If each 8-bit MAC consumes 0.5 pJ of energy, how much energy is consumed by the network
to generate one decision, i.e., what is the energy per decision E_dec?
7. If each 8-bit MAC takes 1 ns to compute, and there are N such MACs operating in parallel
with 100% utilization, what is the minimum value of N required to support a frame rate of
30 frames/s?
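One possible starting point for these questions is sketched below. It registers forward hooks on every Conv2d and Linear layer of torchvision's resnet18 and vgg11 models and records per-layer activation counts, weight counts, and MACs. The 224×224 input size, the restriction to Conv2d/Linear layers, and the `weights=None` constructor argument (recent torchvision) are assumptions; the reuse factors in parts 1 and 2 are left for you to define from these counts.

```python
import torch
import torch.nn as nn
from torchvision import models

def profile(model, input_size=(1, 3, 224, 224)):
    """Record per-layer activation counts, weight counts, and MACs with forward hooks."""
    stats = []

    def hook(module, inputs, output):
        weights = module.weight.numel()
        if isinstance(module, nn.Conv2d):
            # MACs = number of output activations x (kernel area x input channels / groups)
            kh, kw = module.kernel_size
            macs = output.numel() * kh * kw * module.in_channels // module.groups
        else:  # nn.Linear
            macs = module.in_features * module.out_features
        stats.append({"layer": module.__class__.__name__,
                      "acts": output.numel(), "weights": weights, "macs": macs})

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    with torch.no_grad():
        model(torch.randn(input_size))
    for h in handles:
        h.remove()
    return stats

for name, ctor in [("ResNet-18", models.resnet18), ("VGG-11", models.vgg11)]:
    s = profile(ctor(weights=None).eval())
    print(name,
          "| activations:", sum(d["acts"] for d in s),
          "| weights:", sum(d["weights"] for d in s),
          "| MACs:", sum(d["macs"] for d in s))
```

The per-layer plots follow by plotting each recorded field against the layer index, and the storage, energy, and throughput questions are arithmetic on the resulting totals.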
Problem 4: Getting Started with Deep Learning in Python
In this problem, we will get started with Deep Learning in Python using the PyTorch framework.
We will use the CIFAR-10 dataset, which you can download from
https://www.cs.toronto.edu/~kriz/cifar.html. On the course website, you will find a link to
download all necessary files, including a pre-trained model and a script named "inference.py"
which you can use to evaluate the model.
1. From the provided code and data, briefly describe the dataset and the network provided.
2. Run the provided code and determine the accuracy of this model on the CIFAR-10 task.
Please report this number.
3. One method to reduce the complexity of a network is pruning, which means zeroing out some
of the weights in the network. There are various strategies to prune a network. In this
problem, plot the network's classification accuracy vs. the fraction of pruned weights, where
pruning is done by removing (zeroing out) weights whose magnitude is less than a specified
threshold. (A sketch of the thresholding step follows this question.)
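For part 3, a minimal sketch of the magnitude-thresholding step is shown below. The `model`, `evaluate`, and `test_loader` names are hypothetical placeholders for whatever the provided inference.py exposes; only the thresholding logic itself is meant as a starting point.

```python
import copy
import torch

def prune_by_magnitude(model, threshold):
    """Zero out all weights with magnitude below `threshold`; return pruned copy and pruned fraction."""
    pruned = copy.deepcopy(model)
    total, zeroed = 0, 0
    with torch.no_grad():
        for name, p in pruned.named_parameters():
            if "weight" in name:
                mask = (p.abs() >= threshold).to(p.dtype)
                zeroed += int((mask == 0).sum())
                total += p.numel()
                p.mul_(mask)            # zero out the small-magnitude weights
    return pruned, zeroed / total

# Hypothetical usage, assuming inference.py provides `model`, `evaluate`, and `test_loader`:
# for thr in (0.0, 0.01, 0.02, 0.05, 0.1):
#     pruned, frac = prune_by_magnitude(model, thr)
#     print(f"threshold={thr}: pruned {frac:.1%} of weights, "
#           f"accuracy={evaluate(pruned, test_loader):.2%}")
```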
Problem 5: Theoretically Optimal Linear Predictor for Time Series
The data generation model is given by:
x_n = 0.1 g_n + 0.5 g_{n−1} − 0.5 g_{n−2} + 0.1 g_{n−3}    (1)
where g_n are i.i.d. N(0, 1) random variables. We wish to predict x_n from its previous samples
x_{n−1}, x_{n−2}, and x_{n−3}, i.e., we wish to find a function x̂_n = f(x_{n−1}, x_{n−2}, x_{n−3})
such that the cost function E[e_n^2] = E[(x̂_n − x_n)^2] is minimized. This cost function is called
the mean squared error (MSE). Furthermore, for simplicity, we wish to restrict the predictor f to
be a linear function of its arguments as shown below:

x̂_n = w_1 x_{n−1} + w_2 x_{n−2} + w_3 x_{n−3}    (2)
1. Simulate the data generation model (1) and the inference model (2) with w_1 = 0.4, w_2 = −0.1,
and w_3 = 0.02, and empirically estimate the MSE using 1000 samples. (See the sketch following
this problem's questions.)
2. Obtain the optimal predictor coefficients.
3. Find the minimum MSE of your optimal predictor.
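A minimal sketch for parts 1 and 2 follows. It simulates the data model (1), estimates the MSE of the fixed coefficients given in part 1, and then estimates the optimal coefficients by solving the normal (Wiener-Hopf) equations with empirical correlations; the random seed is an arbitrary choice, and the correlations can equally be computed in closed form from model (1).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
g = rng.standard_normal(N + 3)
# Data generation model (1): x_n = 0.1 g_n + 0.5 g_{n-1} - 0.5 g_{n-2} + 0.1 g_{n-3}
x = 0.1 * g[3:] + 0.5 * g[2:-1] - 0.5 * g[1:-2] + 0.1 * g[:-3]

# Part 1: empirical MSE of the fixed predictor (2) with the given coefficients.
w = np.array([0.4, -0.1, 0.02])
X = np.column_stack([x[2:-1], x[1:-2], x[:-3]])   # columns: x_{n-1}, x_{n-2}, x_{n-3}
target = x[3:]
print("empirical MSE (given w):", np.mean((X @ w - target) ** 2))

# Part 2: optimal coefficients from the normal equations R w = p,
# with R and p estimated empirically from the simulated data.
R = (X.T @ X) / len(target)
p = (X.T @ target) / len(target)
w_opt = np.linalg.solve(R, p)
print("optimal w:", w_opt)
print("empirical MSE (optimal w):", np.mean((X @ w_opt - target) ** 2))
```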
Problem 6: Computationally Efficient Predictor *
For this problem, please refer to the previous problem and answer the following questions:
1. What makes this solution hard to compute (hint: consider an N-tap predictor where N is
large)?
2. Can you suggest a computationally friendly algorithm to obtain your solution?