CS/ECE/ME532 Assignment 7 Wikipedia Wisconsin PageRank


1. Wikipedia Wisconsin PageRank. A dataset was created by looking at the links
between Wikipedia pages with the word ‘Wisconsin’ in the title. The corresponding
graph contains 5482 nodes (Wikipedia pages) and 41,306 edges (links between the
pages).

Two files contain the data. The edges are stored in a file with two columns of integers,
where the first column indicates the from node and the second column indicates the to
node. A second file contains the titles of the Wikipedia pages and their corresponding integer node indices.

Use the starter script to load the edges file and create an adjacency matrix A, where
$A_{i,j} = 1$ if there is an edge from node j to node i, and zero elsewhere.

a) Write code to:
i. ensure there are no traps by adding 0.001 to each entry of A,
ii. normalize A, and
iii. use an eigendecomposition to rank the importance of the Wikipedia pages.
(A sketch of these steps appears after part (c).)

b) What is the title of the page ranked 1st (i.e., the most important page)?
c) What is the title of the page ranked 3rd?
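The following is a minimal sketch of the steps in part (a), written in NumPy. It assumes the starter script has loaded the edge list into an integer array named edges (first column: from node, second column: to node, zero-indexed) and the page titles into a list named titles indexed by node number; these names are assumptions for illustration, not the actual starter-script variables.

    import numpy as np

    n = 5482                                 # number of Wikipedia pages (nodes)
    A = np.zeros((n, n))
    A[edges[:, 1], edges[:, 0]] = 1          # A[i, j] = 1 for an edge from node j to node i

    # i.  remove traps by adding a small value to every entry
    A = A + 0.001

    # ii. normalize each column so the matrix is column-stochastic
    A = A / A.sum(axis=0)

    # iii. rank pages using the eigenvector of the largest eigenvalue
    evals, evecs = np.linalg.eig(A)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    v = v / v.sum()                          # importance scores, defined up to scale
    ranking = np.argsort(-v)                 # node indices, most to least important
    print(titles[ranking[0]])                # part (b): page ranked 1st
    print(titles[ranking[2]])                # part (c): page ranked 3rd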

2. Gradient Descent and Logistic Regression.
For a binary linear classifier, the predicted class is given as $\hat{y}_i = \mathrm{sign}(x_i^T w)$. Training
in machine learning involves finding a w that does a good job on labeled data, and
often involves solving an optimization of the form:

$$\min_w \sum_i \ell_i(w) + \lambda r(w),$$
where $\ell_i(w)$ is the loss on the ith training example, and $r(w)$ is a regularizer. In ridge
regression, the loss function is the squared error and the regularizer is the squared 2-norm of
w, and training amounts to solving the following minimization:

$$\min_w \sum_{i=1}^{n} (x_i^T w - y_i)^2 + \lambda \|w\|_2^2.$$
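For reference, this ridge objective has the well-known closed-form minimizer

$$w^\star = (X^T X + \lambda I)^{-1} X^T y,$$

where X stacks the $x_i^T$ as rows and y collects the labels; it can serve as a sanity check for the squared-error classifier trained in part (e) below.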

For some classification tasks, the squared error can be a poor loss function, since easy-to-classify data points can have a large associated loss. This happens when a data
point is correctly classified but far from the decision boundary (i.e., the absolute value
of $x_i^T w$ is large).

In practice, an often more appropriate loss function is the logistic loss, given by

$$\ell_i(w) = \log\left(1 + e^{-y_i x_i^T w}\right).$$
a) For a binary linear classifier, explain (mathematically) why the logistic loss function does not suffer from the same problem as the squared error loss on easy-to-classify points.

b) Compute an expression for the gradient (with respect to w) of the $\ell_2$-regularized
logistic loss:

$$\min_w \sum_{i=1}^{n} \log\left(1 + e^{-y_i x_i^T w}\right) + \lambda \|w\|_2^2$$

c) Use the expression for the gradient that you derived to implement gradient descent
and train a classifier on the provided dataset. For simplicity, you may assume
λ = 1. (A sketch of such an implementation appears after part (f).)

d) Plot the data points (indicating their class with different colors) and plot the
decision boundary. What is the error rate of your classifier on the training data?

e) Train a classifier using the squared error loss, and plot the decision boundary.
How does it compare with the decision boundary obtained when training with the logistic loss?

f) Add 1000 easy-to-classify data points to the training set: more specifically, 1000
points with $y_i = -1$ and $x = [10, 0]^T$. Re-train your classifiers and comment on
the performance when trained with the logistic loss and with the squared error loss.
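
Below is a minimal sketch of the gradient-descent training in parts (b) and (c), assuming the provided data has been loaded into a NumPy array X (one row per example) and a label vector y with entries in {-1, +1}; the variable names, step size, and iteration count are illustrative assumptions rather than values specified by the assignment.

    import numpy as np

    def grad(w, X, y, lam=1.0):
        # gradient of sum_i log(1 + exp(-y_i x_i^T w)) + lam * ||w||_2^2
        margins = y * (X @ w)                  # y_i x_i^T w for every example
        s = -y / (1.0 + np.exp(margins))       # per-example factor multiplying x_i
        return X.T @ s + 2.0 * lam * w

    def train_logistic(X, y, lam=1.0, step=1e-3, iters=5000):
        # plain gradient descent with a fixed step size
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            w = w - step * grad(w, X, y, lam)
        return w

    # training error rate for part (d)
    # w = train_logistic(X, y)
    # err = np.mean(np.sign(X @ w) != y)

For part (e), a squared-error classifier can be trained the same way by swapping in the gradient of the ridge objective (or by using its closed-form minimizer noted earlier), and its decision boundary plotted alongside the logistic one.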