Description
Problem 1: Perceptrons and Logistic Regression
In this problem, we’ll build a logistic regression classifier and train it on separable and non-separable
data. Since it will be specialized to binary classification, I’ve named the class logisticClassify2.
We’ll start by building two binary classification problems, one separable and the other not:
import numpy as np
import mltools as ml
from mltools.transforms import rescale   # rescaling helper provided with mltools

iris = np.genfromtxt("data/iris.txt", delimiter=None)
X, Y = iris[:,0:2], iris[:,-1]   # get first two features & target
X, Y = ml.shuffleData(X, Y)      # reorder randomly (important later)
X, _ = rescale(X)                # works much better on rescaled data
XA, YA = X[Y<2,:], Y[Y<2]        # get class 0 vs 1
XB, YB = X[Y>0,:], Y[Y>0]        # get class 1 vs 2
For this problem, we are focused on the learning algorithm rather than performance, so we will
not bother creating training and validation splits; just use all your data for training.
Note: The code uses numpy’s random permutation to iterate over the data in a random order; this
should avoid issues due to the default ordering of the data (by class). Similarly, rescaling and
centering the data may help speed up convergence as well.
(a) Show the two classes in a scatter plot (one for each data set) and verify that one data set is
linearly separable while the other is not.
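As a point of reference, here is a minimal plotting sketch for part (a), assuming matplotlib and the XA/YA, XB/YB arrays built above; plot styling and labels are up to you:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
for c in np.unique(YA):                      # one scatter per class in data set A
    ax[0].scatter(XA[YA==c,0], XA[YA==c,1], label="class {:d}".format(int(c)))
ax[0].set_title("Data set A"); ax[0].legend()
for c in np.unique(YB):                      # one scatter per class in data set B
    ax[1].scatter(XB[YB==c,0], XB[YB==c,1], label="class {:d}".format(int(c)))
ax[1].set_title("Data set B"); ax[1].legend()
plt.show()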
(b) Write (fill in) the function plotBoundary(…) in logisticClassify2.py to compute the
points on the decision boundary. This will plot the data & boundary quickly, which is useful
for visualizing the model during training. To demo your function, plot the decision boundary
corresponding to the classifier
$\mathrm{sign}(\,0.5 + 1\,x_1 - 0.25\,x_2\,)$
along with the A data, and again with the B data. (These fixed parameters will look like an
OK classifier on one data set, but a poor classifier on the other.) You can create a blank
learner and set the weights by:
import mltools as ml
from logisticClassify2 import *
learner = logisticClassify2()    # create "blank" learner
learner.classes = np.unique(YA)  # define class labels using YA or YB
wts = np.array([theta0,theta1,theta2]); # TODO: fill in values
learner.theta = wts; # set the learner’s parameters
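One possible sketch of plotBoundary for this two-feature linear classifier, assuming self.theta = [θ0, θ1, θ2] and that numpy and pyplot are imported in logisticClassify2.py; the skeleton’s exact signature and plotting style may differ:

def plotBoundary(self, X, Y):
    """Plot data (X,Y) and the linear boundary theta0 + theta1*x1 + theta2*x2 = 0 (sketch)."""
    if len(self.theta) != 3:
        raise ValueError("plotBoundary only supports two-feature data")
    x1 = np.array([X[:,0].min(), X[:,0].max()])              # endpoints spanning the data in x1
    x2 = -(self.theta[0] + self.theta[1]*x1)/self.theta[2]   # solve theta . x = 0 for x2
    pos = (Y == self.classes[1])                              # mask for the positive class
    plt.plot(X[~pos,0], X[~pos,1], 'b.', X[pos,0], X[pos,1], 'r.', x1, x2, 'k-')
    plt.draw()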
(c) Complete the logisticClassify2.predict function to make predictions for your linear
classifier. Note that, in my code, the two classes are stored in the variable self.classes,
with the first entry being the negative class (or class 0), and the second entry being the
positive class. Again, verify that your function works by computing & reporting the error
rate of the classifier in the previous part on both data sets A and B. (The error rate on data
set A should be ≈ 0.0505, and higher on set B.)
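A minimal sketch of predict under the same assumptions (linear response thresholded at zero; self.classes[0] is the negative class):

def predict(self, X):
    """Return the predicted class for each row of X (sketch)."""
    r = self.theta[0] + X.dot(self.theta[1:])                  # linear response per data point
    return np.where(r > 0, self.classes[1], self.classes[0])   # positive response -> positive class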
(d) Verify that your predict code matches your boundary plot by using plotClassify2D with
your manually constructed learner on the two data sets. This will call “predict” on a dense
grid of points, and you should find that the resulting decision boundary matches the one you
computed analytically.
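For example, assuming the usual (learner, X, Y) calling convention of mltools’ plotClassify2D:

ml.plotClassify2D(learner, XA, YA)   # colors the plane by learner.predict over a dense grid
plt.show()
ml.plotClassify2D(learner, XB, YB)
plt.show()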
(e) In my provided code, I first transform the classes in the data Y into Y01, with canonical
labels for the two classes: class 0 (negative) and class 1 (positive). In our notation, let
$z = x^{(j)} \cdot \theta^T$
be the linear response of the perceptron, and let σ be the standard logistic function,
$\sigma(z) = \big(1 + \exp(-z)\big)^{-1}.$
The logistic negative log-likelihood loss for a single data point j is then
$J_j(\theta) = -y^{(j)} \log \sigma\big(x^{(j)} \theta^T\big) - \big(1 - y^{(j)}\big) \log\big(1 - \sigma(x^{(j)} \theta^T)\big)$
where $y^{(j)}$ is either 0 or 1. Derive the gradient of the negative log-likelihood $J_j$ for logistic
regression, and give it in your report. (You will need this in your gradient descent code for
the next part.)
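For reference, applying the chain rule with the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ and the definitions above yields the standard form, which your derivation should reproduce:
$\nabla_\theta J_j = \big(\sigma(x^{(j)} \theta^T) - y^{(j)}\big)\, x^{(j)}.$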
(f) Complete your train(…) function to perform stochastic gradient descent on the logistic
loss function. This will require that you fill in:
(1) computing the surrogate loss function at each epoch ($J = \frac{1}{m}\sum_j J_j$, from the previous part);
(2) computing the prediction and gradient associated with each data point $x^{(j)}, y^{(j)}$;
(3) a stopping criterion: usually either stopEpochs epochs, or that J has not changed by more
than stopTol since the last epoch (here, an epoch means one pass through all the data).
A minimal sketch of one possible SGD loop is given after the note below.
Note on plotting: The code generates plots as the algorithm runs, so you can see its behavior
over time; this is done with pyplot.draw(). Run your code either interactively or as a script
to see these plots update over time; unfortunately this does not work easily in Jupyter (you will only
see a plot at the end, which is difficult to use for diagnostics).
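The sketch below is one way to structure that loop, assuming the argument names initStep, stopTol, and stopEpochs mentioned above; the provided skeleton’s exact signature, initialization, and plotting hooks may differ:

def train(self, X, Y, initStep=1.0, stopTol=1e-4, stopEpochs=5000):
    """Train by stochastic gradient descent on the logistic negative log-likelihood (sketch)."""
    m, n = X.shape
    X1 = np.hstack((np.ones((m,1)), X))                 # prepend a constant feature for theta0
    Y01 = (Y == self.classes[1]).astype(float)          # canonical 0/1 labels
    if getattr(self, "theta", None) is None or len(self.theta) != n+1:
        self.theta = np.random.randn(n+1)               # initialize parameters if not set
    Jsur, epoch = [], 0
    while True:
        epoch += 1
        step = initStep / epoch                         # decaying step size
        for j in np.random.permutation(m):              # visit data points in random order
            sj = 1.0/(1.0 + np.exp(-X1[j].dot(self.theta)))   # sigma of the linear response
            self.theta -= step * (sj - Y01[j]) * X1[j]  # SGD step with the gradient from part (e)
        s = 1.0/(1.0 + np.exp(-X1.dot(self.theta)))     # surrogate loss over all data this epoch
        eps = 1e-12                                     # guard against log(0)
        Jsur.append(np.mean(-Y01*np.log(s+eps) - (1-Y01)*np.log(1-s+eps)))
        if epoch >= stopEpochs or (epoch > 1 and abs(Jsur[-1]-Jsur[-2]) < stopTol):
            return Jsur                                 # loss history, useful for convergence plots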
(g) Run your logistic regression classifier on both data sets (A and B). Describe your parameter
choices (step size, etc.) and show a plot of the convergence of the surrogate loss and
error rate (e.g., their values as a function of epoch during gradient descent), and a plot
showing the final converged classifier with the data (using e.g. plotClassify2D). In your
report, please also include a listing of any functions that you wrote (at minimum, train(),
but possibly a few small helper functions as well).
(h) Extra Credit (15pt): Add an L2 regularization term ($+\alpha \sum_i \theta_i^2$) to your surrogate loss
function, and update the gradient and your code to reflect this addition. Try re-running your
learner with some regularization (e.g. α = 2) and see how different the resulting parameters
are. Find a value of α that gives noticeably different results & explain them.
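As a hedged sketch of the required change (whether to also regularize the bias term θ0, and whether to scale the penalty by 1/m, are design choices you should state in your report): the regularizer contributes
$\frac{\partial}{\partial \theta}\Big(\alpha \sum_i \theta_i^2\Big) = 2\alpha\,\theta,$
so one common choice is to add $2\alpha\,\theta$ to each per-point gradient in your SGD update.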