CSE 417T: Homework 3

Problems:

1. (50 points) Check out the files logistic_reg.m and find_test_error.m from the SVN repository set up for this assignment. The files are just function headers that need to be filled in. find_test_error should encode a function that, given as inputs a weight vector w, a data matrix X, and a vector of true labels y (in the formats defined in the header), returns the classification error of w on the data, assuming that the classifier applies a threshold at 0 to the dot product of w and a feature vector x (augmented with a 1 in the first position to allow for a constant or bias term). logistic_reg should encode a gradient descent algorithm for learning a logistic regression model. It should return a weight vector w and the training set error Ein (the negative log-likelihood function as defined in class, not the classification error). Use a learning rate η = 10^-5 and automatically terminate the algorithm if the magnitude of each term in the gradient is below 10^-3 at any step.
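The two functions described above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB the assignment asks for, and it rests on several assumptions: the argument order, the `max_its` and `w_init` parameters, and the helper names are hypothetical (the real formats are defined in the provided headers), and Ein is taken to be the LFD textbook form, Ein(w) = (1/N) Σ ln(1 + exp(−y_n w·x_n)) with labels in −1/+1, which may differ from the definition given in class.

```python
import math

def augment(x):
    """Prepend the constant 1 so that w[0] acts as the bias term."""
    return [1.0] + list(x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    """Numerically stable 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1p_exp(z):
    """Numerically stable ln(1 + e^z)."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def find_test_error(w, X, y):
    """Fraction of rows misclassified when thresholding w . [1, x] at 0.

    Labels are assumed to be -1/+1; a score of exactly 0 is treated as -1.
    """
    wrong = 0
    for x, label in zip(X, y):
        predicted = 1 if dot(w, augment(x)) > 0 else -1
        if predicted != label:
            wrong += 1
    return wrong / len(y)

def logistic_reg(X, y, w_init, max_its, eta=1e-5, tol=1e-3):
    """Batch gradient descent on Ein; returns (w, e_in, iterations used)."""
    rows = [augment(x) for x in X]
    n = len(rows)
    w = list(w_init)
    t = 0
    while t < max_its:
        # Gradient of Ein: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n))
        grad = [0.0] * len(w)
        for x, label in zip(rows, y):
            coeff = -label * sigmoid(-label * dot(w, x))
            for j, xj in enumerate(x):
                grad[j] += coeff * xj / n
        # Stop once every gradient component is below the tolerance.
        if all(abs(g) < tol for g in grad):
            break
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
        t += 1
    e_in = sum(log1p_exp(-label * dot(w, x)) for x, label in zip(rows, y)) / n
    return w, e_in, t
```

Returning the iteration count alongside w and Ein is an addition made here for convenience; it is not part of the interface described in the assignment.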

• Implement the functions in the two files. Remember to check in the final version of your code for these two files.

• Read more about the “Cleveland” dataset we’ll be using here: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

• Learn a logistic regression model on the data in cleveland.train (be careful: the classes are 0/1, so you should convert them to −1/+1 so that everything we’ve done in class is still valid). Apply the model to classify the data in cleveland.test, using a probability of 0.5 as the threshold. In your writeup, report Ein as well as the classification error on both the training and test data when using three different bounds on the maximum number of iterations: ten thousand, one hundred thousand, and one million. What can you say about the generalization properties of the model?

• Now train and test a logistic regression model using the built-in MATLAB function glmfit (learn about and use the “binomial” option, and check the label format). Compare the results with the best ones you achieved, and also compare the time taken to achieve them.

• Now scale the features by subtracting the mean and dividing by the standard deviation for each feature before calling the learning algorithm (you may find the MATLAB function zscore useful). Experiment with the learning rate η (you may want to start by trying different orders of magnitude), this time using a tolerance (how close to zero each element of the gradient must be for the algorithm to terminate) of 10^-6. Report the number of iterations until the algorithm terminates, and also the final Ein.
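The two preprocessing steps mentioned in the bullets above, converting 0/1 labels to −1/+1 and standardizing each feature, can be sketched as below. This is an illustrative plain-Python version with hypothetical function names, not the MATLAB code the assignment expects. It uses the sample standard deviation (N−1 denominator), which matches the default behavior of MATLAB's zscore.

```python
import statistics

def to_plus_minus_one(labels):
    """Map 0/1 class labels to -1/+1."""
    return [2 * label - 1 for label in labels]

def zscore_columns(X):
    """Standardize each column of X to mean 0 and sample standard deviation 1.

    Returns the scaled rows together with the per-column means and standard
    deviations. A constant column has standard deviation 0 and would raise
    ZeroDivisionError; this sketch does not handle that case.
    """
    columns = list(zip(*X))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]  # N-1 denominator
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds
```

For example, `zscore_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])` yields column means of 2 and 20 and standard deviations of 1 and 10. Returning the means and standard deviations makes it possible to apply the same scaling to other data; whether the test set should be scaled with the training statistics is not specified in the assignment.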

2. (15 points) LFD Problem 3.4

3. (10 points) LFD Problem 3.19

4. (10 points) LFD Problem 4.8

5. (15 points) LFD Problem 4.25, parts (a) through (c) only
