Description
Introduction: Recommendation systems and “suggestions” have become an essential part of many web applications. Examples include recommended items in online stores, suggested pages or people on social networking sites, and movies or songs on streaming websites. What has enabled these systems to spread to so many different applications is the rapidly growing body of collective user data together with the use of Collaborative Filtering. “Collaborative Filtering” refers to methods of predicting a user’s opinion of an entity using other users’ opinions. These methods are based on the notion that there exist other users whose behavior is similar to that of the target user, and that finding them and observing their actions reveals information we can use to predict the target user’s behavior.
We can view this as a matrix problem by putting all users on the rows and all items on the columns of a matrix. The entry (i, j) of the matrix is then the rating user i has given to item j. The problem now becomes estimating the empty entries of the matrix, so as to predict which items a user will most probably like beyond the ones they have already rated.
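For concreteness, here is a minimal sketch of building this matrix in Python/NumPy. It assumes the ratings are available as (user id, movie id, rating, timestamp) rows, as in the MovieLens 100k “u.data” file; the file name and the convention that IDs start at 1 are assumptions to check against the actual download.

    import numpy as np

    # Assumed layout of the MovieLens 100k ratings file: one whitespace-separated
    # "user_id  movie_id  rating  timestamp" row per known rating, IDs starting at 1.
    data = np.loadtxt("u.data", dtype=int)
    users, movies, ratings = data[:, 0], data[:, 1], data[:, 2]

    m, n = users.max(), movies.max()
    R = np.zeros((m, n))                 # unrated (i, j) entries stay 0 (missing)
    R[users - 1, movies - 1] = ratings   # entry (i, j) = rating of user i for movie j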
In this project we are going to use a popular method for collaborative filtering called “Alternating Least Squares” (ALS) to build a recommendation system on the MovieLens dataset (http://grouplens.org/datasets/movielens/), which contains movies and user ratings.
1) Download the dataset with 100k ratings and create a matrix named R containing user ratings with
users on rows and movies on columns. Note that a lot of the values in the matrix will be missing and
only a small fraction of the matrix entries will contain known data points. We want to run a matrix
factorization job to get matrices U and V such that $R_{m\times n} \approx U_{m\times k} V_{k\times n}$ on the known data points. For this purpose, we calculate matrices U and V such that the squared error is minimized:

$$\min_{U,V} \sum_{\text{known } (i,j)} \left(R_{ij} - (UV)_{ij}\right)^{2}$$
This can be implemented by putting 0s where the data is missing and creating a weight matrix to calculate the squared error. Assume that the weight matrix $W_{m\times n}$ contains 1 in entries where we have known data points and 0 in entries where the data is missing. Then, if R contains 0 in the missing entries, we can formulate the above problem as:

$$\min_{U,V} \sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij}\left(R_{ij} - (UV)_{ij}\right)^{2}$$
You are free to choose your programming language, and there are implementations of the weighted least-squares factorization in packages for various programming languages, e.g. the wnmfrule function in the Matrix Factorization Toolbox (https://sites.google.com/site/nmftool/) for MATLAB. Choose k equal to 10, 50, and 100. What is the total least squared error in each case?
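As one possible implementation (a sketch, not the required approach): instead of calling wnmfrule, the weighted objective above can be minimized directly with alternating least squares in Python/NumPy. The function name als_factorize, the random initialization, the fixed iteration count, and the small eps ridge used for numerical stability are all illustrative assumptions; note that, unlike wnmfrule, this plain ALS does not enforce non-negativity.

    import numpy as np

    def als_factorize(R, W, k, n_iters=50, eps=1e-8):
        """Approximately minimize sum_ij W_ij * (R_ij - (U V)_ij)^2."""
        m, n = R.shape
        rng = np.random.default_rng(0)
        U, V = rng.random((m, k)), rng.random((k, n))
        for _ in range(n_iters):
            # Fix V and solve a small weighted least-squares problem per user row.
            for i in range(m):
                VW = V * W[i]                      # columns of V scaled by row i's weights
                U[i] = np.linalg.solve(VW @ V.T + eps * np.eye(k), VW @ R[i])
            # Fix U and solve per movie column.
            for j in range(n):
                UW = U * W[:, j][:, None]          # rows of U scaled by column j's weights
                V[:, j] = np.linalg.solve(UW.T @ U + eps * np.eye(k), UW.T @ R[:, j])
        return U, V

    W = (R > 0).astype(float)          # 1 where a rating is known, 0 where it is missing
    for k in (10, 50, 100):
        U, V = als_factorize(R, W, k)
        total_error = np.sum(W * (R - U @ V) ** 2)   # total least squared error
        print(k, total_error)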
2) Now test the recommendation system you designed using 10-fold cross-validation; that is, randomly split the data, take 90% of the known ratings for training, and hold out the other 10% as unknown for testing. Note that in our model this can be achieved by putting zero weight on the data we want to treat as unknown. You should not remove entire rows or columns of the matrix, because that would defeat the idea behind this method. One way to avoid this problem is to take random points for testing, so that the data points we remove are dispersed throughout the matrix. However, we want the 10% of testing data in each test not to overlap with that of the other 9 tests.
Assume that we assign indices 1 to N to the N known data points. Create a random ordering of the numbers 1 to N and split this list into 10 parts. Each time, take one of these 10 parts for testing and train on the rest.
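A minimal sketch of this split, reusing the R matrix built earlier (the variable names are arbitrary):

    import numpy as np

    known_i, known_j = np.nonzero(R)                 # row/column indices of the N known ratings
    N = known_i.size
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(N), 10)   # 10 disjoint parts of a random ordering

    # For one fold: give its entries zero weight so they are "unknown" during training.
    test = folds[0]
    W_train = (R > 0).astype(float)
    W_train[known_i[test], known_j[test]] = 0.0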
Train your system as in part 1 and find the average absolute error over the testing data (prediction error $|R_{\text{predicted}} - R_{\text{actual}}|$). What is the average error? (Average among the 10 tests for each system, and within each test among all testing entries.)
What are the highest and lowest values of the average error among the 10 tests? (Average over the testing entries in each test and find which of the 10 tests has the highest or lowest average error.)
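Continuing the sketch above, one way to compute these quantities (it assumes the als_factorize sketch from part 1 and the fold indices from the previous snippet; k = 10 is an arbitrary choice):

    fold_errors = []
    for test in folds:
        W_train = (R > 0).astype(float)
        W_train[known_i[test], known_j[test]] = 0.0      # hide the test entries
        U, V = als_factorize(R, W_train, k=10)
        predicted = (U @ V)[known_i[test], known_j[test]]
        actual = R[known_i[test], known_j[test]]
        fold_errors.append(np.mean(np.abs(predicted - actual)))   # |R_predicted - R_actual|

    print(np.mean(fold_errors))                 # average over the 10 tests
    print(max(fold_errors), min(fold_errors))   # highest and lowest per-test average error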
3) Now assume that if a user has rated a movie 3 or lower, we conclude they did not like the movie, and if a user has rated a movie 4 or higher, they liked it. Based on the estimated values of the R matrix from parts 1 and 2, we can apply a threshold to the estimated entries to predict whether or not the user will like the movie.
In your tests with the method in part 2, find out the number of entries where your system
predicted the user would like the movie. Out of these, for what percentage did the user actually like the
movie? (Precision) Now find the entries in the testing data where the user did actually like the movie.
Out of these entries, for what percentage did your algorithm predict the user would like the movie?
(Recall) Plot the ROC curve (precision over recall) for different values of the threshold for the designed
recommendation system.
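A sketch of the threshold sweep for one test fold, reusing predicted and actual from the part 2 loop above (“liked” means an actual rating of 4 or higher; the set of candidate thresholds is an arbitrary choice):

    import numpy as np

    liked = actual >= 4                                   # ground truth: the user liked the movie
    precisions, recalls = [], []
    for t in np.linspace(predicted.min(), predicted.max(), 50):
        recommended = predicted >= t                      # the system predicts "will like"
        if recommended.any() and liked.any():
            precisions.append(np.mean(liked[recommended]))   # liked among the recommended
            recalls.append(np.mean(recommended[liked]))      # recommended among the liked
    # Plotting precisions against recalls for the swept thresholds gives the requested curve.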
4) Try changing the weight matrix and use the rating values as weights, instead of 1, for the known data points. Turn R into a 0-1 matrix where $R_{ij} = 1$ for known ratings and $R_{ij} = 0$ for unknown entries. Run the matrix factorization again with the same values of k as in part 1. Is there a change in the total squared error?
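Relative to part 1, this swap is a two-line change (a sketch, reusing R and the als_factorize sketch from above):

    W_ratings = R.copy()              # known ratings (1-5) now act as weights, 0 stays for missing
    R_binary = (R > 0).astype(float)  # 0-1 target matrix: 1 for known ratings, 0 for unknown
    for k in (10, 50, 100):
        U, V = als_factorize(R_binary, W_ratings, k)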
Now, in order to avoid singular solutions, we modify the cost function by adding a regularization term weighted by $\lambda$:

$$\min_{U,V} \sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij}\left(R_{ij} - (UV)_{ij}\right)^{2} + \lambda\left(\sum_{i=1}^{m}\sum_{l=1}^{k} u_{il}^{2} + \sum_{l=1}^{k}\sum_{j=1}^{n} v_{lj}^{2}\right)$$
This will make the matrix calculations more stable. You can find out how to implement the regularized version of ALS in a set of slides at the reference link (http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares) or in the reference paper [1]. Run your code for $\lambda = 0.01, 0.1, 1$ and compare the results with the previous parts. Following the same evaluation method, plot the ROC curve.
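A sketch of the regularized update under these assumptions: it follows the same per-row and per-column normal equations as the earlier ALS sketch, with lam * I added to the k x k system, which corresponds to the plain Frobenius-norm penalty in the cost above (not the rating-count-weighted variant used in some ALS-WR references). The function name and iteration count are illustrative.

    import numpy as np

    def als_regularized(R, W, k, lam, n_iters=50):
        """Weighted ALS with an added lambda * (||U||_F^2 + ||V||_F^2) penalty."""
        m, n = R.shape
        rng = np.random.default_rng(0)
        U, V = rng.random((m, k)), rng.random((k, n))
        I = np.eye(k)
        for _ in range(n_iters):
            for i in range(m):
                VW = V * W[i]
                U[i] = np.linalg.solve(VW @ V.T + lam * I, VW @ R[i])
            for j in range(n):
                UW = U * W[:, j][:, None]
                V[:, j] = np.linalg.solve(UW.T @ U + lam * I, UW.T @ R[:, j])
        return U, V

    # W is whichever weight matrix the part under test uses (0-1 or rating-valued).
    for lam in (0.01, 0.1, 1.0):
        U, V = als_regularized(R, W, k=10, lam=lam)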
5) Create a recommendation system in the following manner:
Convert R to a 0-1 matrix where the entries are 1 for available data points and 0 for missing data points.
Use the ratings as weights and run your algorithm to find the top L movies recommended to each user. Using the cross-validation method described earlier, find the average precision of your algorithm for L = 5. While calculating precision, ignore the picked entries for which you do not have the actual rating.
Find your algorithm’s hit rate (the fraction of the test movies liked by the users that are suggested by your system) and false-alarm rate (the fraction of the test movies not actually liked by the user that are suggested by your algorithm). Start from L = 1 and, while increasing L, measure the hit rate and false-alarm rate for the different values of L. Plot these values as points in a two-dimensional space with hit rate on the y axis and false-alarm rate on the x axis. Note that you do not need to run the factorization every time you change L; only the number of results you pick for recommendation will be different.
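A sketch of this evaluation for one cross-validation fold under the assumptions above (it reuses als_factorize and the fold indices known_i, known_j, test from the earlier sketches; “liked” again means an actual rating of 4 or higher, and the range of L values is arbitrary):

    import numpy as np

    R_binary = (R > 0).astype(float)                 # 0-1 target matrix
    W_train = R.copy()                               # ratings as weights ...
    W_train[known_i[test], known_j[test]] = 0.0      # ... with the held-out fold hidden
    U, V = als_factorize(R_binary, W_train, k=10)
    scores = U @ V                                   # one factorization, reused for every L

    test_mask = np.zeros_like(R_binary, dtype=bool)
    test_mask[known_i[test], known_j[test]] = True

    hit_rates, false_alarm_rates = [], []
    for L in range(1, 21):
        top_L = np.argsort(-scores, axis=1)[:, :L]   # each user's L highest-scoring movies
        recommended = np.zeros_like(test_mask)
        np.put_along_axis(recommended, top_L, True, axis=1)

        liked = test_mask & (R >= 4)                 # held-out entries the user actually liked
        disliked = test_mask & (R > 0) & (R <= 3)    # held-out entries the user did not like
        hit_rates.append(np.mean(recommended[liked]))
        false_alarm_rates.append(np.mean(recommended[disliked]))

        if L == 5:
            known_picks = recommended & test_mask    # ignore picks without a held-out actual rating
            precision_at_5 = np.mean(R[known_picks] >= 4)
    # Plotting false_alarm_rates (x axis) against hit_rates (y axis) gives the requested curve.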
[1] Li, Yifeng, and Alioune Ngom. “The non-negative matrix factorization toolbox for biological data
mining.” Source code for biology and medicine 8.1 (2013): 1.
Submission: This project is due on Friday, March 4th at 11:59 pm. Late submissions are subject to a 10% penalty for each day past the deadline. Please submit a zip file containing your report and your code, with a readme file on how to run your code, to ee239as.winter2016@gmail.com. The zip file should be named “Project3_UID1_UID2_…_UIDn.zip”, where UIDx are the student ID numbers of the team members. If your zip file is too large to email, you can send it to the Dropbox account associated with this email address. If you have any questions, you can send an email to the same address.