CS 498: Computational Advertising Homework 3


2 Question 1 (6 points)
Scroll to Section 9.8, Exercises in the reference chapter linked on the first page. In this
question, you are required to solve Problem (3). In parts (a) and (b), you must show the
precise steps to arrive at your answers. They carry 2 points each. Part (c) is also worth 2
points.
1 https://www.cs.cornell.edu/home/kleinber/networks-book/networks-book-ch09.pdf
3 Question 2 (4 points)
Scroll to Section 9.8, Exercises in the reference chapter linked on the first page. In this
question, you are required to solve Problem (4). In part (b), you must show the precise
steps to arrive at your answer. Each part carries 2 points.
4 Programming Question (10 points)
The objective of this programming assignment is to design a movie recommender system.
The main goal of such a system is to recommend relevant movies to a user based on the
available data. The data includes information about the movies and the ratings that a user
has given to a subset of the movies. We have some metadata for each movie, such as its title,
a brief overview, and its tagline. We also have the ratings that a user has given
to some of the movies. Based on the movie metadata and the ratings, we need to
recommend new movies to a user.
To recommend new movies to a user, we need to determine how relevant a movie is
to that user. The relevance of a movie to a user is captured by the rating that the user
gives to the movie. To recommend movies to a user, we therefore need to estimate
the rating that the user would have given to a movie if she had already seen it. So
our recommendation problem becomes the problem of estimating unknown user-movie ratings.
Let the set of users be $U$ ($u \in U$) and the set of movies be
$M$ ($m \in M$). We want to estimate $\hat{r}_{um}$, i.e., the rating given by $u$ to $m$. Note that for the
$(u, m)$ pairs already present in the data, $\hat{r}_{um} = r_{um}$.
For the $(u, m)$ pairs not present in the data, we will use the formulas below:
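$$\hat{r}_{um} = b_{um} + \frac{\sum_{j \in R(u)} s_{mj} \, (r_{uj} - b_{uj})}{\sum_{j \in R(u)} s_{mj}} \tag{1}$$

$$b_{um} = \mu + b_u + b_m \tag{2}$$

$$b_m = \frac{\sum_{u \in R(m)} (r_{um} - \mu)}{|R(m)|} \tag{3}$$

$$b_u = \frac{\sum_{m \in R(u)} (r_{um} - \mu - b_m)}{|R(u)|} \tag{4}$$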
$b_{um}$ is the baseline rating predictor. $\mu$ is the global mean calculated across all the ratings
available in the dataset. $R(m)$ is the set of users that have rated the movie $m$ in the
available dataset. $R(u)$ is the set of movies that have been rated by the user $u$. $s_{mj}$ is
the similarity value between two movies $m$ and $j$.
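The baseline terms in Equations (2) to (4) can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function names `compute_baselines` and `baseline`, the `(user, movie) -> rating` dictionary layout, and the zero-bias fallback for unseen users or movies are all assumptions made for the example.

```python
from collections import defaultdict


def compute_baselines(ratings):
    """ratings: dict mapping (user, movie) -> rating (assumed layout).

    Returns (mu, b_u, b_m) following Equations (3) and (4).
    """
    # Global mean over every rating in the dataset.
    mu = sum(ratings.values()) / len(ratings)

    # Equation (3): movie bias = mean deviation of the movie's ratings from mu.
    by_movie = defaultdict(list)
    for (u, m), r in ratings.items():
        by_movie[m].append(r - mu)
    b_m = {m: sum(devs) / len(devs) for m, devs in by_movie.items()}

    # Equation (4): user bias = mean residual after removing mu and the movie bias.
    by_user = defaultdict(list)
    for (u, m), r in ratings.items():
        by_user[u].append(r - mu - b_m[m])
    b_u = {u: sum(devs) / len(devs) for u, devs in by_user.items()}

    return mu, b_u, b_m


def baseline(mu, b_u, b_m, u, m):
    # Equation (2). Unseen users or movies get a zero bias (an assumption,
    # the assignment text does not specify this case).
    return mu + b_u.get(u, 0.0) + b_m.get(m, 0.0)
```

Note that the movie biases are computed first, because Equation (4) subtracts $b_m$ inside the sum.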
We need the similarity value $s_{mj}$ in Equation (1). In this assignment, we will use movie
metadata content to calculate $s_{mj}$. We will provide the movie metadata
as a document, i.e., a collection of words. $s_{mj}$ is the cosine similarity between the metadata
documents of the two movies. In order to calculate cosine similarity, we first need to convert
the metadata documents to vector form. The common way of doing this is to transform
each document into a tf-idf vector.
Term Frequency, also known as tf, measures the number of times a term (word) occurs
in a document. Since documents differ in length, a term may
appear many more times in a long document than in a short one. Thus, the term frequency is
normalized by the document length.
$$\mathrm{tf}(t, m) = \frac{\text{Number of times term } t \text{ appears in the metadata of movie } m}{\text{Total number of terms in the metadata of movie } m}$$
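As a small illustration of the tf definition, the sketch below computes normalized term frequencies for one metadata document. The function name `term_frequency` and the use of whitespace splitting as the tokenizer are assumptions for the example; the assignment does not prescribe a tokenization scheme.

```python
from collections import Counter


def term_frequency(tokens):
    """tokens: list of terms from one movie's metadata document.

    Returns a dict term -> count / document length.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical metadata string, tokenized by whitespace (an assumption).
tf = term_frequency("a space odyssey in space".split())
```

Here `tf["space"]` is 2/5 because the term occurs twice in a five-term document.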
Inverse Document Frequency, also known as idf, measures how important a term is:
a term that appears in the metadata of only a few movies receives a higher weight.
$$\mathrm{idf}(t) = \log_e\left(\frac{\text{Total number of movies in the dataset}}{\text{Number of movies whose metadata contains term } t}\right)$$
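The idf definition can be sketched the same way. The function name `inverse_document_frequency` and the `movie_id -> list of tokens` layout are assumptions for illustration only.

```python
import math


def inverse_document_frequency(docs):
    """docs: dict mapping movie id -> list of terms (assumed layout).

    Returns a dict term -> ln(total movies / movies containing the term).
    """
    n_movies = len(docs)
    doc_freq = {}
    for tokens in docs.values():
        # Count each term at most once per movie.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_movies / df) for term, df in doc_freq.items()}


# Hypothetical two-movie corpus.
idf = inverse_document_frequency({1: ["space", "war"], 2: ["space", "love"]})
```

A term that occurs in every movie, such as `"space"` here, gets an idf of 0 and therefore contributes nothing to the tf-idf vector.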
Now let the set of unique words across all movie metadata documents be
$V = (V_1, V_2, \ldots, V_{|V|})$. We convert the metadata document for movie $m$ into a
$|V|$-dimensional vector $d^m$:

$$d^m = (d^m_1, d^m_2, \ldots, d^m_{|V|})$$

$$d^m_i = \mathrm{tf}(V_i, m) \cdot \mathrm{idf}(V_i)$$
Now we will calculate the similarity as
$$s_{mj} = \mathrm{cosine}(d^m, d^j)$$
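Putting the tf-idf vector and the cosine similarity together, one possible sketch is shown below. It stores each vector sparsely as a dictionary (terms absent from a document have weight 0, so they can be omitted). The names `tfidf_vectors` and `cosine`, the data layout, and the choice to return 0 when either vector has zero norm are assumptions, not part of the assignment.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict movie id -> list of terms. Returns movie id -> {term: tf * idf}."""
    n_movies = len(docs)
    # Document frequency: number of movies whose metadata contains each term.
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    vectors = {}
    for movie, tokens in docs.items():
        counts = Counter(tokens)
        vectors[movie] = {
            t: (c / len(tokens)) * math.log(n_movies / doc_freq[t])
            for t, c in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: similarity is 0 when a vector is all zeros
    return dot / (norm_a * norm_b)


# Hypothetical three-movie corpus.
vectors = tfidf_vectors({
    "m1": ["space", "war", "hero"],
    "m2": ["space", "war", "love"],
    "m3": ["love", "story", "paris"],
})
s_12 = cosine(vectors["m1"], vectors["m2"])
s_13 = cosine(vectors["m1"], vectors["m3"])
```

In this toy corpus `m1` and `m2` share two terms, so `s_12` is positive, while `m1` and `m3` share none, so `s_13` is 0.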
In this MP, given a $(u, m)$ pair as input, you need to output the estimated rating
$\hat{r}_{um}$. Round $\hat{r}_{um}$ to 1 decimal place.
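The final prediction step in Equation (1) might be sketched as follows. The function name `predict_rating` and its callable parameters `baseline(u, m)` and `similarity(m, j)` are hypothetical, and falling back to the baseline when the similarity sum is zero is an assumption, since the assignment text does not cover that case.

```python
def predict_rating(u, m, ratings, baseline, similarity):
    """Estimate the rating of movie m by user u, rounded to 1 decimal place.

    ratings:    dict (user, movie) -> rating (assumed layout)
    baseline:   callable (u, m) -> b_um, Equation (2)
    similarity: callable (m, j) -> s_mj
    """
    # Known pairs are returned as-is.
    if (u, m) in ratings:
        return round(ratings[(u, m)], 1)

    # R(u): movies already rated by user u.
    rated = [j for (user, j) in ratings if user == u]
    numerator = sum(
        similarity(m, j) * (ratings[(u, j)] - baseline(u, j)) for j in rated
    )
    denominator = sum(similarity(m, j) for j in rated)

    b_um = baseline(u, m)
    if denominator == 0:
        return round(b_um, 1)  # assumption: use the baseline alone
    return round(b_um + numerator / denominator, 1)


# Hypothetical toy inputs: a constant baseline and a hand-written similarity table.
toy_ratings = {("u", "a"): 5.0, ("u", "b"): 3.0}
toy_sim = {("c", "a"): 0.8, ("c", "b"): 0.2}
estimate = predict_rating(
    "u", "c", toy_ratings,
    baseline=lambda user, movie: 3.5,
    similarity=lambda m, j: toy_sim.get((m, j), 0.0),
)
```

With these toy inputs the weighted residual is 0.8 · 1.5 + 0.2 · (−0.5) = 1.1 over a similarity sum of 1.0, so the estimate is 3.5 + 1.1 = 4.6.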