## Description

1. Binary linear classifiers. Assume there are two possible labels, $y = 1$ or $y = -1$, associated with two features $x_1$ and $x_2$. We consider several different linear classifiers $\hat{y} = \operatorname{sign}\{x^T w\}$, where $x$ is derived from $x_1$ and $x_2$ and $w$ are the classifier weights. Define the decision boundary of the classifier as the set $\{x_1, x_2\}$ for which $x^T w = 0$.

Let $x_2$ be the vertical axis in your sketches and depict the interval $0 \le x_1, x_2 \le 1$.
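The decision rule above can be sketched in a few lines of NumPy. The function name `predict`, the weights `w_example`, and the two sample points are illustrative assumptions, not part of the assignment; mapping a score of exactly zero to $+1$ is an arbitrary convention chosen here.

```python
import numpy as np

def predict(features, w):
    """Return labels in {-1, +1} for rows of feature vectors x^T and weights w."""
    scores = np.asarray(features, dtype=float) @ np.asarray(w, dtype=float)
    # Points exactly on the boundary (score 0) are assigned +1 by convention.
    return np.where(scores >= 0, 1, -1)

# Hypothetical weights and points, chosen only to exercise the rule.
w_example = np.array([1.0, -1.0])
points = np.array([[0.8, 0.2],
                   [0.2, 0.8]])
labels = predict(points, w_example)
```

Each row of `points` plays the role of $x^T$, so the same function works for any of the feature vectors used below once the rows are built accordingly.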

a) Classifier 1. Let $x^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$ and assume $w = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$.

i. Sketch the decision boundary in the x1-x2 plane.

ii. Does the decision boundary represent a subspace in $\mathbb{R}^2$? Why or why not? If it represents a subspace, then find an orthonormal basis for the subspace.
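When a boundary of the form $w_1 x_1 + w_2 x_2 = 0$ passes through the origin, one way to obtain a unit vector along it is to rotate $w$ by 90 degrees and normalize. The sketch below assumes $w \neq 0$; the helper name `line_basis` is hypothetical.

```python
import math

def line_basis(w):
    """Unit vector spanning {(x1, x2) : w1*x1 + w2*x2 = 0}, assuming w != 0."""
    w1, w2 = w
    direction = (-w2, w1)  # w rotated by 90 degrees, hence orthogonal to w
    norm = math.hypot(direction[0], direction[1])
    return (direction[0] / norm, direction[1] / norm)

w = (5.0, -2.0)  # weights of Classifier 1
u = line_basis(w)

# Sanity checks: u lies on the boundary and has unit length.
residual = w[0] * u[0] + w[1] * u[1]
length = math.hypot(u[0], u[1])
```

The vector `-u` would serve equally well, since an orthonormal basis for a line is only determined up to sign.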

b) Classifier 2. Let $x^T = \begin{bmatrix} x_1 & x_2 & 1 \end{bmatrix}$ and assume $w = \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix}$.

i. Sketch the decision boundary in the x1-x2 plane.

ii. Does the decision boundary represent a subspace in $\mathbb{R}^2$? Why or why not? If it represents a subspace, then find an orthonormal basis for the subspace.
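With a constant feature appended, the boundary becomes $w_1 x_1 + w_2 x_2 + w_3 = 0$. A minimal sketch, assuming $w_2 \neq 0$, solves for $x_2$ to get points for a sketch and tests whether the zero vector lies on the boundary, which any subspace must contain. The helper names `boundary_x2` and `on_boundary` are hypothetical.

```python
def boundary_x2(x1, w):
    """Solve w1*x1 + w2*x2 + w3 = 0 for x2, assuming w2 != 0."""
    w1, w2, w3 = w
    return -(w1 * x1 + w3) / w2

def on_boundary(x1, x2, w, tol=1e-12):
    """True when (x1, x2) satisfies the boundary equation up to tol."""
    w1, w2, w3 = w
    return abs(w1 * x1 + w2 * x2 + w3) <= tol

w = (5.0, -2.0, 1.0)  # weights of Classifier 2

# Two points that fix the line inside the region 0 <= x1, x2 <= 1.
endpoints = [(x1, boundary_x2(x1, w)) for x1 in (0.0, 0.2)]

# A subspace must contain the origin, so this check is a quick necessary test.
origin_on_boundary = on_boundary(0.0, 0.0, w)
```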

c) Classifier 3. Let $x^T = \begin{bmatrix} x_1^2 & x_2 & 1 \end{bmatrix}$ and assume $w = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$.

i. Sketch the decision boundary in the x1-x2 plane.

ii. Does the decision boundary represent a subspace in $\mathbb{R}^2$? Why or why not? If it represents a subspace, then find an orthonormal basis for the subspace.
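For a feature vector containing $x_1^2$, the boundary $w_1 x_1^2 + w_2 x_2 + w_3 = 0$ is a curve in the $x_1$-$x_2$ plane. The sketch below, with hypothetical helper names `score` and `curve_x2` and assuming $w_2 \neq 0$, evaluates the curve and tests closure under scalar multiplication, a property every subspace must have.

```python
W3 = (1.0, -2.0, 1.0)  # weights of Classifier 3

def score(x1, x2, w=W3):
    """Evaluate x^T w for the feature vector x^T = [x1^2, x2, 1]."""
    return w[0] * x1**2 + w[1] * x2 + w[2]

def curve_x2(x1, w=W3):
    """Solve w1*x1^2 + w2*x2 + w3 = 0 for x2, assuming w2 != 0."""
    return -(w[0] * x1**2 + w[2]) / w[1]

# Points for sketching the curve over 0 <= x1 <= 1.
samples = [(x1, curve_x2(x1)) for x1 in (0.0, 0.5, 1.0)]

# Closure test: take a point on the boundary and double it.
p = (1.0, curve_x2(1.0))
scaled = (2.0 * p[0], 2.0 * p[1])
p_on_boundary = abs(score(*p)) < 1e-12
scaled_on_boundary = abs(score(*scaled)) < 1e-12
```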

2. Linear Classifier. Download the script and the data file classifier data.mat. This code trains linear classifiers using least squares. The provided script steps through the problems below.

Make sure to ‘publish’ your results and store them as a PDF file for submission.

a) Classifier 1. Let $x^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$. Briefly comment on the fit of the classifier to the decision boundary apparent in the evaluation data. Also identify the percent error based on the ratio of misclassified evaluation data points to the total number of evaluation data points.
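A minimal sketch of least-squares training and the percent-error calculation follows. It does not use the provided script or data file: the synthetic data, the true boundary $x_2 = x_1$, and the function names `train_least_squares` and `percent_error` are all assumptions made for illustration.

```python
import numpy as np

def train_least_squares(X, y):
    """Weights w minimizing ||X w - y||^2, with one feature vector x^T per row of X."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def percent_error(X, y, w):
    """Percentage of rows whose predicted label sign(x^T w) differs from y."""
    y_hat = np.where(X @ w >= 0, 1.0, -1.0)
    return 100.0 * float(np.mean(y_hat != y))

# Synthetic stand-in data (assumption): labels follow the line x2 = x1.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))
y_train = np.where(X_train[:, 1] >= X_train[:, 0], 1.0, -1.0)
X_eval = rng.uniform(-1, 1, size=(2000, 2))
y_eval = np.where(X_eval[:, 1] >= X_eval[:, 0], 1.0, -1.0)

w = train_least_squares(X_train, y_train)
err = percent_error(X_eval, y_eval, w)
```

With real data, `X_train`, `y_train`, `X_eval`, and `y_eval` would be replaced by the arrays loaded from the data file.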

b) Classifier 2. Consider squaring the original features, and also using them for classification, so that $x^T = \begin{bmatrix} x_1^2 & x_2^2 & x_1 & x_2 & 1 \end{bmatrix}$. This will allow for a curved decision boundary. Briefly comment on the fit of the classifier to the decision boundary apparent in the evaluation data. Also identify the percent error based on the ratio of misclassified evaluation data points to the total number of evaluation data points.
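The expanded feature vector can be built column by column. In the sketch below, the function name `quadratic_features` and the weight vector `circle_w` are illustrative assumptions; `circle_w` is chosen so that $x^T w = x_1^2 + x_2^2 - 1$, which shows how these features permit a curved (here circular) boundary.

```python
import numpy as np

def quadratic_features(x1, x2):
    """Rows [x1^2, x2^2, x1, x2, 1] built from arrays of raw features."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([x1**2, x2**2, x1, x2, np.ones_like(x1)])

X = quadratic_features([0.5, 2.0], [1.0, -1.0])

# Hypothetical weights whose boundary is the unit circle x1^2 + x2^2 = 1.
circle_w = np.array([1.0, 1.0, 0.0, 0.0, -1.0])
scores = X @ circle_w
```

The classifier is still linear in the weights, so the same least-squares training applies unchanged to the expanded matrix.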

c) Shortcoming of training using least squares as a loss function. Training a classifier using the squared error as a loss function can fail when correctly labeled data points lie far from the decision boundary. A new dataset consisting of the first dataset, plus 1000 (correctly labeled) data points at $x_1 = 0$, $x_2 = 3$ is created. What happens to the decision boundary when these new data points are included in training? What happens to the error rate if you move the 1000 data points to $x_1 = 0$, $x_2 = 10$? Why does this happen?
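The effect can be explored on synthetic data before running the provided script. The sketch below is an assumption-laden stand-in: the data are drawn uniformly, the true boundary is taken to be $x_2 = x_1$, and the helper names are hypothetical. The squared error penalizes any point whose score $x^T w$ is far from its target of $\pm 1$, even when the sign is already correct, so a large cluster of distant points can pull the weights away from the fit that separates the bulk of the data.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(X, y):
    """Least-squares weights for targets y in {-1, +1}."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def error_rate(X, y, w):
    """Fraction of rows whose predicted label sign(x^T w) differs from y."""
    return float(np.mean(np.where(X @ w >= 0, 1.0, -1.0) != y))

# Synthetic stand-in data (assumption): labels follow the line x2 = x1.
X_train = rng.uniform(-1, 1, size=(200, 2))
y_train = np.where(X_train[:, 1] >= X_train[:, 0], 1.0, -1.0)
X_eval = rng.uniform(-1, 1, size=(2000, 2))
y_eval = np.where(X_eval[:, 1] >= X_eval[:, 0], 1.0, -1.0)

def with_far_points(x2_far, count=1000):
    """Append `count` correctly labeled (+1) points at (0, x2_far)."""
    X_far = np.tile([0.0, x2_far], (count, 1))
    y_far = np.ones(count)
    return np.vstack([X_train, X_far]), np.concatenate([y_train, y_far])

err_base = error_rate(X_eval, y_eval, fit(X_train, y_train))
err_far3 = error_rate(X_eval, y_eval, fit(*with_far_points(3.0)))
err_far10 = error_rate(X_eval, y_eval, fit(*with_far_points(10.0)))
```

Comparing `err_base`, `err_far3`, and `err_far10` on this synthetic data gives a feel for the trend; the numbers for the assignment's dataset may differ.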

3. Overfitting. Download the dataset overfitting data.mat. You may find it helpful to adapt the code from the previous problem. The dataset has 50 data points for training, and 10,000 data points to be used for evaluation of the classifier. Each data point consists of a two-dimensional feature vector $x$ and a label $y \in \{-1, 1\}$. The feature vector is a “noisy” version of the true underlying feature, which blurs the boundary between classes.

a) Plot the training data using a scatter plot. Indicate the points with $y = -1$ using one color, and the points with $y = 1$ using another.

b) Plot the evaluation data using a scatter plot. Indicate the points labeled $-1$ using one color, and the points labeled $+1$ using another.

c) Classifier 1. As before, $x = \begin{bmatrix} x_1 & x_2 \end{bmatrix}^T$, and $y = \operatorname{sign}(x^T w)$.

i. Train the classifier using least squares to find the classifier weights $w$. Apply the classifier to the evaluation data, and plot the data points using a scatter plot with different colors for different predicted labels.

ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?

d) Classifier 2. Let $x = \begin{bmatrix} x_1^2 & x_2^2 & x_1 & x_2 & 1 \end{bmatrix}^T$, and $y = \operatorname{sign}(x^T w)$.

i. Train the classifier using least squares to find the classifier weights $w$. Apply the classifier to the evaluation data, and plot the data points using a scatter plot with different colors for different predicted labels.

ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?

e) Classifier 3. Let $x = \begin{bmatrix} x_1^6 & x_2^6 & x_1^5 & x_2^5 & \cdots & x_1 & x_2 & 1 \end{bmatrix}^T$, and $y = \operatorname{sign}(x^T w)$.

i. Train the classifier using least squares to find the classifier weights $w$. Apply the classifier to the evaluation data, and plot the data points using a scatter plot with different colors for different predicted labels.

ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?
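The sixth-order feature vector of Classifier 3 has 13 entries: one pair of powers for each degree from 6 down to 1, plus the constant. A minimal sketch of how it might be assembled is shown below; the function name `polynomial_features` and its `degree` parameter are assumptions for illustration.

```python
import numpy as np

def polynomial_features(x1, x2, degree=6):
    """Rows [x1^d, x2^d, x1^(d-1), x2^(d-1), ..., x1, x2, 1] for d = degree."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    columns = []
    for p in range(degree, 0, -1):  # powers degree, degree-1, ..., 1
        columns.append(x1**p)
        columns.append(x2**p)
    columns.append(np.ones_like(x1))  # constant feature
    return np.column_stack(columns)

X = polynomial_features([2.0], [-1.0])
```

Setting `degree=2` reproduces the feature ordering of Classifier 2, so one helper can serve both parts.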

f) Of the three classifiers, which one performs worst, and why?
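One way to compare the classifiers is to record, for each feature set, both the training residual and the evaluation error rate. The sketch below uses synthetic data with a hypothetical curved true boundary and Gaussian feature noise, and its degree-1 features include a constant column, unlike Classifier 1 above. Because the feature sets are nested, the least-squares training residual cannot increase as features are added, so a small training residual alone says little about performance on the evaluation data.

```python
import numpy as np

def polynomial_features(X, degree):
    """Columns [x1^d, x2^d, ..., x1, x2, 1] for an (n, 2) array X."""
    cols = []
    for p in range(degree, 0, -1):
        cols.extend([X[:, 0] ** p, X[:, 1] ** p])
    cols.append(np.ones(len(X)))
    return np.column_stack(cols)

rng = np.random.default_rng(2)

def sample(n, noise=0.2):
    """Synthetic data (assumption): curved true boundary, noisy observed features."""
    clean = rng.uniform(-1, 1, size=(n, 2))
    y = np.where(clean[:, 1] >= clean[:, 0] ** 2 - 0.3, 1.0, -1.0)
    return clean + noise * rng.standard_normal((n, 2)), y

X_train, y_train = sample(50)
X_eval, y_eval = sample(10_000)

results = {}  # degree -> (training squared residual, evaluation error rate)
for degree in (1, 2, 6):
    A_train = polynomial_features(X_train, degree)
    w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train_residual = float(np.sum((A_train @ w - y_train) ** 2))
    y_hat = np.where(polynomial_features(X_eval, degree) @ w >= 0, 1.0, -1.0)
    results[degree] = (train_residual, float(np.mean(y_hat != y_eval)))
```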

4. A binary linear classifier based on three features $x_1$, $x_2$, and $x_3$ is $\hat{y} = \operatorname{sign}\{x^T w\}$ where $x^T = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}$. Hence the decision boundary is the set $\{x_1, x_2, x_3\}$ for which $x^T w = 0$.

The decision boundary for a two-dimensional classifier is a line. What type of geometric object is the decision boundary in three dimensions?
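The set of $x \in \mathbb{R}^3$ with $x^T w = 0$ is the null space of the $1 \times 3$ matrix $w^T$, and its dimension can be inspected numerically. The sketch below uses the singular value decomposition for this; the weights are hypothetical, $w \neq 0$ is assumed, and `boundary_basis` is an illustrative name.

```python
import numpy as np

def boundary_basis(w):
    """Orthonormal basis (as columns) for {x in R^3 : x^T w = 0}, assuming w != 0."""
    row = np.asarray(w, dtype=float).reshape(1, -1)
    # For a 1 x 3 matrix, the first row of vt is parallel to w and the
    # remaining rows span everything orthogonal to w.
    _, _, vt = np.linalg.svd(row)
    return vt[1:].T

w = np.array([1.0, 2.0, 3.0])  # hypothetical weights
B = boundary_basis(w)
```

The number of columns of `B` is the dimension of the boundary, and `w @ B` should be numerically zero.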

5. A decision boundary for a classification problem involving features $x_1$, $x_2$, and $x_3$ is defined as $x^T w = 0$ where $x^T = \begin{bmatrix} x_1 & x_2 & x_3 & 1 \end{bmatrix}$. Find $w$ so that the decision boundary is parallel to the $x_1$-$x_2$ plane and includes the point $(x_1, x_2, x_3) = (0, 0, 1)$.
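A candidate $w$ can be checked against both requirements: the normal vector $(w_1, w_2, w_3)$ of the plane must point along the $x_3$ axis, and the given point must satisfy $x^T w = 0$. The helper names and the candidate below are illustrative assumptions; any nonzero scalar multiple of a valid $w$ describes the same boundary.

```python
def plane_value(point, w):
    """Evaluate x^T w for x^T = [x1, x2, x3, 1]."""
    x1, x2, x3 = point
    return w[0] * x1 + w[1] * x2 + w[2] * x3 + w[3]

def is_parallel_to_x1x2_plane(w):
    """True when the normal (w1, w2, w3) points along the x3 axis."""
    return w[0] == 0 and w[1] == 0 and w[2] != 0

# Candidate weights (an assumption to be verified, not given in the problem).
w_candidate = (0.0, 0.0, 1.0, -1.0)

parallel = is_parallel_to_x1x2_plane(w_candidate)
contains_point = plane_value((0.0, 0.0, 1.0), w_candidate) == 0
# Any other point with x3 = 1 should also lie on the boundary.
contains_shifted = plane_value((5.0, -3.0, 1.0), w_candidate) == 0
```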