Description
1 Two-layer neural networks
Ex 1.
Suppose x ∈ R^2. We consider two-layer neural networks (n.n.) of the form (see fig. 1):

f(x) = b2 + W2 σ(b1 + W1 · x),        (1)

where b1, b2 ∈ R^2 are 'bias' vectors, W1, W2 ∈ M2×2(R) are 2 × 2 matrices and the activation function σ is the ReLU function (i.e. σ(x) = max(x, 0), applied component-wise). We denote by s = f(x) the score predicted by the model, with s = (s1, s2), where s1 is the score for class 1 and s2 is the score for class 2.
Figure 1: Illustration of a two-layer neural network using the ReLU activation function.
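As a quick illustration of the forward pass (1), a minimal numpy sketch is given below; the parameter values are arbitrary placeholders chosen for illustration (they are not a solution to question a)).

import numpy as np

def relu(z):
    # ReLU applied component-wise
    return np.maximum(z, 0)

def forward(x, W1, b1, W2, b2):
    # network of the form (1): f(x) = b2 + W2 sigma(b1 + W1 x)
    return b2 + W2 @ relu(b1 + W1 @ x)

# Arbitrary placeholder parameters (illustration only).
W1 = np.array([[1.0, 0.0], [0.0, 1.0]])
b1 = np.zeros(2)
W2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
b2 = np.zeros(2)

s = forward(np.array([1.0, 0.0]), W1, b1, W2, b2)
print(s)   # s = (s1, s2), the two class scores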
a) Consider the points given in figure 2-left, where each color corresponds to a different class:
class 1: x1 = (1, 0) and x2 = (−1, 0),
class 2: x3 = (0, 1) and x4 = (0, −1).
Find some parameters b1, b2, W1 and W2 such that the scores s satisfy:
s1 > s2 for x1 and x2,   s1 < s2 for x3 and x4.
Figure 2: Data points to classify.
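A small helper, sketched below, can be used to check whether a candidate set of parameters for question a) gives scores satisfying the required inequalities; it only verifies a guess and does not produce one (the function name check_candidate is ours).

import numpy as np

def check_candidate(W1, b1, W2, b2):
    # forward pass of (1), repeated here so the snippet is self-contained
    scores = lambda x: b2 + W2 @ np.maximum(b1 + W1 @ x, 0)
    class1 = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]    # x1, x2
    class2 = [np.array([0.0, 1.0]), np.array([0.0, -1.0])]    # x3, x4
    ok1 = all(scores(x)[0] > scores(x)[1] for x in class1)    # s1 > s2 on class 1
    ok2 = all(scores(x)[0] < scores(x)[1] for x in class2)    # s1 < s2 on class 2
    return ok1 and ok2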
b) Consider now the dataset given in figure 2-right (see the code below to load the data).
Train a two-layer neural network of the form (1) to classify the points. Provide the
accuracy of the model (percentage of correctly predicted labels).
##################################
##          Exercise 1          ##
##################################
import numpy as np
import pandas as pd

# Load the data: two features (x1, x2) and one class label per point.
df = pd.read_csv('data_HW2_ex1.csv')
X = np.column_stack((df['x1'].values, df['x2'].values))
y = df['class'].values
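One possible way to train a network of the form (1) on this dataset is sketched below, continuing from the arrays X and y loaded above and assuming PyTorch is installed; the optimizer, learning rate and number of epochs are illustrative choices, not requirements of the exercise.

import torch
import torch.nn as nn

# Network of the form (1): 2 inputs -> 2 hidden ReLU units -> 2 class scores.
model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.long)
y_t = y_t - y_t.min()        # map labels to {0, 1} whether they start at 0 or 1

for epoch in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()

# Accuracy: percentage of correctly predicted labels.
pred = model(X_t).argmax(dim=1)
print(f"accuracy = {100 * (pred == y_t).float().mean().item():.1f}%")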
Ex 2.
The goal of this exercise is to show that two-layer neural networks with ReLU activation can approximate any continuous function. To simplify, we restrict our attention to the one-dimensional case:

g : [0, 1] −→ R (continuous).

We claim that for any ε > 0, there exists a two-layer n.n. f such that:

max_{x∈[0,1]} |g(x) − f(x)| < ε.        (2)

In contrast to Ex 1, f will be taken with a large hidden layer, i.e. z ∈ R^m with m ≫ 1 (see figure 3-left). To prove this result, we are going to show that f can be used to perform piece-wise linear interpolation (see figure 3-right).
a) Denote y0 = g(0) and y1 = g(1). Find a two-layer n.n. such that f(0) = y0 and f(1) = y1.
b) Consider now three points: y0 = g(0), y1 = g(1/2), y2 = g(1). Find f such that:
f(0) = y0, f(1/2) = y1 and f(1) = y2.
c) Generalize: write a program that takes as input {(xi, yi)}0≤i≤N with xi < xi+1 and returns a two-layer n.n. such that f(xi) = yi for all i = 0, . . . , N (a sketch of one possible construction is given after figure 3).
Extra) Prove (2).
Hint: use that g is uniformly continuous on [0, 1].
Figure 3: Left: two-layer neural network used to approximate a continuous function. The hidden layer (i.e. z = (z1, . . . , zm)) is in general quite large. Right: to approximate the continuous function g, we interpolate some of its values (xi, yi) by a piece-wise linear function.
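A minimal numpy sketch of this piece-wise linear construction is given below, assuming knots x0 < x1 < · · · < xN in [0, 1]: writing f(x) = y0 + Σ ai ReLU(x − xi) with suitably chosen coefficients ai is exactly a two-layer network with one hidden unit per knot. The helper name build_interpolant is ours, not prescribed by the exercise.

import numpy as np

def build_interpolant(xs, ys):
    # xs: knots x0 < x1 < ... < xN, ys: prescribed values y0, ..., yN.
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    slopes = np.diff(ys) / np.diff(xs)                    # slope on each segment [xi, xi+1]
    a = np.concatenate(([slopes[0]], np.diff(slopes)))    # change of slope at each knot
    def f(x):
        # f(x) = y0 + sum_i a_i * ReLU(x - x_i): hidden layer z_i = ReLU(x - x_i)
        z = np.maximum(np.asarray(x)[..., None] - xs[:-1], 0.0)
        return ys[0] + z @ a
    return f

# Quick check: f reproduces the prescribed values at every knot.
xs = [0.0, 0.25, 0.5, 1.0]
ys = [1.0, -0.5, 0.3, 2.0]
f = build_interpolant(xs, ys)
print(f(np.array(xs)))    # approximately [ 1.  -0.5  0.3  2. ]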
2 Convolution
Ex 3.
Using convolutional layers, max pooling and ReLU activation functions, build a classifier for the Fashion-MNIST database (see a sketch example in figure 4). The accuracy of your network on the test set will be your score on this exercise (+5 points for the group with the highest accuracy).
Figure 4: Schematic representation of a neural network for image classification: input image, convolution + ReLU blocks, pooling, flatten + fully connected layer, output scores.
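A minimal sketch of one such classifier is given below, assuming PyTorch and torchvision are available; the number of channels, kernel sizes, batch size and number of epochs are illustrative starting points rather than recommended settings.

import torch
import torch.nn as nn
from torchvision import datasets, transforms

# conv -> ReLU -> conv -> ReLU -> max pooling -> flatten -> fully connected scores.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(32 * 14 * 14, 10),           # 10 Fashion-MNIST classes
)

train_set = datasets.FashionMNIST('.', train=True, download=True, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):                     # a few epochs as a starting point
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()

# Accuracy on the test set (the score for this exercise).
test_set = datasets.FashionMNIST('.', train=False, download=True, transform=transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
print(f"test accuracy = {100 * correct / total:.1f}%")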