Name: CS6350 Big data Management Analytics and Management Homework 1

Description

In this homework, you will use Hadoop/MapReduce to analyze social network data.
Q1: Write a MapReduce program in Hadoop that implements a simple “Mutual/Common
friend list of two friends”. The key idea is that if two people are friends, then they have a lot of
mutual/common friends. This program will find the mutual/common friend list for them.
For example:
Alice’s friends are Bob, Sam, Sara, Nancy
Bob’s friends are Alice, Sam, Clara, Nancy
Sara’s friends are Alice, Sam, Clara, Nancy
As Alice and Bob are friends, their mutual friend list is [Sam, Nancy].
As Sara and Bob are not friends, their mutual friend list is empty. (In this case you may
exclude them from your output.)
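The example above can be traced with a small plain-Python simulation of the map and reduce steps. This is only a sketch of one common approach, not Hadoop code and not necessarily the expected solution; the function names (`map_friend_list`, `reduce_pair`, `run`) are made up for illustration, and names stand in for user IDs.

```python
from collections import defaultdict

# Friend lists from the example above (names stand in for user IDs).
FRIENDS = {
    "Alice": ["Bob", "Sam", "Sara", "Nancy"],
    "Bob": ["Alice", "Sam", "Clara", "Nancy"],
    "Sara": ["Alice", "Sam", "Clara", "Nancy"],
}


def map_friend_list(user, friends):
    """Map step: emit one (pair, friend list) record per friend of `user`."""
    for friend in friends:
        # Sort the pair so (Alice, Bob) and (Bob, Alice) share one key.
        yield tuple(sorted((user, friend))), friends


def reduce_pair(pair, friend_lists):
    """Reduce step: intersect the two friend lists received for a pair."""
    if len(friend_lists) != 2:
        return None  # only one side of the pair has a friend list in the input
    return sorted(set(friend_lists[0]) & set(friend_lists[1]))


def run(friends_by_user):
    """Simulate the shuffle: group mapper output by key, then reduce each key."""
    grouped = defaultdict(list)
    for user, friends in friends_by_user.items():
        for key, value in map_friend_list(user, friends):
            grouped[key].append(value)
    results = {}
    for pair, lists in grouped.items():
        common = reduce_pair(pair, lists)
        if common:  # pairs with an empty mutual list are left out
            results[pair] = common
    return results


if __name__ == "__main__":
    for pair, common in sorted(run(FRIENDS).items()):
        print(pair, common)
```

Because Sara never appears in Bob's list (and vice versa), no record is ever emitted for that pair, so it drops out of the output without any special handling.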
Input:
1. mutual.txt
The input contains the adjacency list and has multiple lines in the following format:
<User> <Friends>
Hence, each line represents a particular user’s friend list, with the friend IDs separated by commas.
2. userdata.txt
The userdata.txt file contains dummy data consisting of the following columns:
column1 : userid
column2 : firstname
column3 : lastname
column4 : address
column5 : city
column6 : state
column7 : zipcode
column8 : country
column9 : username
column10 : date of birth
In mutual.txt, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated list of unique IDs corresponding to the friends of the user with the unique ID <User>.
Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is
also a friend of A. The data provided is consistent with that rule, as there is an explicit entry
for each side of each edge. So, when you form a pair for users A and B, always use either (A, B) or (B, A),
but not both.
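A minimal sketch of parsing one adjacency-list line and building a single canonical key per friendship is shown below. It assumes the user ID and the friend list are separated by whitespace (a tab or a space); the helper names `parse_line` and `pair_key` are hypothetical.

```python
def parse_line(line):
    """Split one adjacency-list line into (user, [friends])."""
    # Assumption: the user ID is separated from the friend list by whitespace.
    parts = line.strip().split(None, 1)
    user = int(parts[0])
    if len(parts) == 1:
        return user, []  # a user with no friends listed
    friends = [int(f) for f in parts[1].split(",") if f.strip()]
    return user, friends


def pair_key(a, b):
    """Order the two IDs so that (A, B) and (B, A) map to the same key."""
    return (a, b) if a < b else (b, a)


if __name__ == "__main__":
    user, friends = parse_line("5\t2,9,14")
    print([pair_key(user, f) for f in friends])
```

Ordering the IDs numerically is one simple way to guarantee that only one of (A, B) and (B, A) is ever used as a key.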
Output: The output should contain one line per pair of friends in the following format:
<User_A>, <User_B> <Mutual/Common Friend List>
where <User_A> & <User_B> are unique IDs corresponding to users A and B (A and B are
friends). <Mutual/Common Friend List> is a comma-separated list of unique IDs
corresponding to the mutual friends of users A and B.
Please generate/print the Mutual/Common Friend list for the following users:
(0, 1), (20, 28193), (1, 29826), (6222, 19272), (28041, 28056)
Q2.
Please use an in-memory join at the Mapper to answer the following question.
Given any two users who are friends as input, output the list of first names of their mutual friends and the number
of unique states those mutual friends live in. Note that userdata.txt will be used to get the
extra user information and should be cached/replicated at each mapper.
Output format:
<User_A>, <User_B> <List of mutual friends [name1, name2, … nameN], Number of unique states>
Sample output:
1234 4312 [John, Jane, Ted], 2
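The in-memory (map-side) join can be sketched in plain Python as follows. The idea is to load userdata.txt into a dictionary once and then look up each mutual friend in it. This sketch assumes userdata.txt is comma-separated with the ten columns in the order listed earlier and that no field contains a comma; the function names and the sample rows in the test data are hypothetical.

```python
def load_user_table(lines):
    """Build {user_id: (first_name, state)} from userdata.txt-style rows."""
    table = {}
    for line in lines:
        cols = line.strip().split(",")
        if len(cols) < 10:
            continue  # skip malformed rows
        # column1 = userid, column2 = firstname, column6 = state
        table[int(cols[0])] = (cols[1], cols[5])
    return table


def join_mutual_friends(mutual_ids, user_table):
    """Look up each mutual friend and collect first names and distinct states."""
    known = [i for i in mutual_ids if i in user_table]
    names = [user_table[i][0] for i in known]
    states = {user_table[i][1] for i in known}
    return names, len(states)


def format_record(user_a, user_b, names, n_states):
    """Render one line in the style of the sample output."""
    return f"{user_a} {user_b} [{', '.join(names)}], {n_states}"
```

In a Hadoop job the table would typically be built once per mapper before any records are processed, rather than once per input line.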
Q3.
Please use an in-memory join at the Reducer to answer the following question.
For each user, print the user ID and the average age of that user’s direct friends.
Output format:
<User ID> <Average age of direct friends>
Sample output:
1234 60
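The reducer-side computation can be sketched as below: given one user's friend IDs and an in-memory table of birth dates, compute each friend's age and average them. The date format is not specified in the assignment, so this sketch assumes M/D/YYYY; the helper names and the fixed reference date are also assumptions made for illustration.

```python
from datetime import date


def age_on(dob_text, today):
    """Age in whole years. Assumes the date of birth is written M/D/YYYY."""
    month, day, year = (int(p) for p in dob_text.split("/"))
    had_birthday = (today.month, today.day) >= (month, day)
    return today.year - year - (0 if had_birthday else 1)


def average_friend_age(friend_ids, dob_by_user, today):
    """Average age of the friends that appear in the user table."""
    ages = [age_on(dob_by_user[f], today) for f in friend_ids if f in dob_by_user]
    if not ages:
        return None  # no friend with a known date of birth
    return sum(ages) / len(ages)


if __name__ == "__main__":
    dobs = {2: "1/1/1960", 3: "12/31/1959"}  # hypothetical rows
    print(1234, average_friend_age([2, 3], dobs, date(2020, 6, 15)))
```

Passing the reference date in explicitly keeps the result reproducible; a real job might use the current date instead.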
Q4.
Please use a Combiner to answer the following question.
Find the friend pair(s) whose number of common friends is the maximum among all pairs. Note that
you need to use the same dataset as in Q1.
Output format:
<User_A>, <User_B>
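One way to see why a combiner fits here: taking a maximum is associative and commutative, so each mapper's output can be reduced locally to its best pair(s) before the final reduce. The sketch below simulates this in plain Python; `combine_max` and the sample counts are hypothetical, and it assumes the number of common friends per pair has already been computed.

```python
def combine_max(records):
    """Keep only the pair(s) with the largest common-friend count.

    The same function can serve as combiner and reducer, because the
    maximum of local maxima equals the global maximum.
    """
    best, best_pairs = -1, []
    for pair, count in records:
        if count > best:
            best, best_pairs = count, [pair]
        elif count == best:
            best_pairs.append(pair)  # keep ties
    return [(pair, best) for pair in best_pairs]


if __name__ == "__main__":
    # Hypothetical (pair, number of common friends) records from two mappers.
    split_1 = [((0, 1), 3), ((2, 3), 5)]
    split_2 = [((4, 5), 5), ((6, 7), 1)]
    local = combine_max(split_1) + combine_max(split_2)
    print(combine_max(local))
```

With the combiner in place, each mapper forwards only its local winners, which reduces the amount of data shuffled to the reducer.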
Q5.
Write a program that constructs an inverted index in the following way.
The map function parses each line in the input file, userdata.txt, and emits a sequence of <word, line number> pairs. The reduce function accepts all pairs for a given word, sorts the
corresponding line numbers, and emits a <word, list(line numbers)> pair. The set of all the
output pairs forms a simple inverted index.
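The map and reduce functions described above can be sketched in plain Python as follows. The assignment does not say how a line is split into words, so this sketch makes several assumptions: words are runs of letters, digits, or underscores, matching is case-insensitive, line numbers start at 1, and repeated line numbers for the same word are collapsed. The function names are hypothetical.

```python
import re
from collections import defaultdict


def map_line(line_number, line):
    """Map step: emit (word, line number) for every word on the line."""
    for word in re.findall(r"\w+", line.lower()):
        yield word, line_number


def reduce_word(word, line_numbers):
    """Reduce step: sort (and de-duplicate) the line numbers for one word."""
    return word, sorted(set(line_numbers))


def build_index(lines):
    """Simulate the shuffle by grouping mapper output by word, then reduce."""
    grouped = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word, n in map_line(number, line):
            grouped[word].append(n)
    return dict(reduce_word(w, ns) for w, ns in grouped.items())


if __name__ == "__main__":
    sample = ["1,John,Doe", "2,Jane,Doe"]  # hypothetical rows
    for word, numbers in sorted(build_index(sample).items()):
        print(word, numbers)
```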
What to submit
(i) Submit the source code via the eLearning website. (ii) Submit the output file for each
question.
