Description
The task is to write a MapReduce program that performs an equijoin.
The code must be written in Java (use Java 1.7.x) using the Hadoop framework (use Hadoop 2.6.x
or higher).
The program takes two arguments: the first is the HDFS location of the input file on which
the equijoin is to be performed, and the second is the HDFS location where
the output is to be stored.
Format of the Input File: –
Table1Name, Table1JoinColumn, Table1OtherColumn1, Table1OtherColumn2, ...
Table2Name, Table2JoinColumn, Table2OtherColumn1, Table2OtherColumn2, ...
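As a rough illustration of the input format above, the following sketch splits one line into the table name, the join column, and the full row. This is roughly the work a mapper would do before emitting the join column as its key. The class name JoinRecord and the choice to skip lines with fewer than two fields are assumptions made for illustration, not part of the assignment.

```java
/** Minimal sketch (hypothetical helper): one parsed line of the input file. */
final class JoinRecord {
    final String table;   // first field, e.g. "R" or "S"
    final String joinKey; // second field, kept as text at this stage
    final String line;    // the whole row, trimmed, for appending to the output

    private JoinRecord(String table, String joinKey, String line) {
        this.table = table;
        this.joinKey = joinKey;
        this.line = line;
    }

    /**
     * Parses "TableName, JoinColumn, Other1, Other2, ...".
     * Assumption: lines with fewer than two fields are skipped (null is returned).
     */
    static JoinRecord parse(String rawLine) {
        String line = rawLine.trim();
        // Limit 3: only the first two fields need to be isolated.
        String[] fields = line.split(",", 3);
        if (fields.length < 2) {
            return null;
        }
        return new JoinRecord(fields[0].trim(), fields[1].trim(), line);
    }
}
```

For instance, JoinRecord.parse("R, 2, Don, Larson, Newark, 555-3221") yields table "R" and joinKey "2", with the full row kept in line.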
Format of the Output File: –
If a Table1JoinColumn value is equal to a Table2JoinColumn value, append the two lines side by
side in the output. Rows whose join column value has no match in the other table are
omitted from the output file. Do not include the same pair of rows twice (no
duplicate joins in the output file).
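The output rule above can be sketched in plain Java, without Hadoop, to make the intended behaviour concrete. Grouping rows by join column stands in for the shuffle phase, and the pairing loop stands in for a reducer. The class name EquiJoinSketch is hypothetical, and the sketch makes three assumptions: there are exactly two table names in the file, the table that appears first in the input is written first in each joined line, and join keys are compared as trimmed text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal single-process sketch of the equijoin rule (not a Hadoop job). */
final class EquiJoinSketch {
    static List<String> join(List<String> lines) {
        String firstTable = null; // assumption: the first table seen goes on the left
        Map<String, List<String>> left = new LinkedHashMap<>();
        Map<String, List<String>> right = new LinkedHashMap<>();

        // "Map + shuffle": group every row by its join column, per table.
        for (String raw : lines) {
            String line = raw.trim();
            String[] fields = line.split(",", 3);
            if (fields.length < 2) {
                continue; // assumption: malformed lines are ignored
            }
            String table = fields[0].trim();
            String key = fields[1].trim();
            if (firstTable == null) {
                firstTable = table;
            }
            Map<String, List<String>> side = table.equals(firstTable) ? left : right;
            List<String> rows = side.get(key);
            if (rows == null) {
                rows = new ArrayList<>();
                side.put(key, rows);
            }
            rows.add(line);
        }

        // "Reduce": pair rows that share a key, each pair once, left table first.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : left.entrySet()) {
            List<String> matches = right.get(entry.getKey());
            if (matches == null) {
                continue; // no match in the other table: row is dropped
            }
            for (String l : entry.getValue()) {
                for (String r : matches) {
                    out.add(l + ", " + r);
                }
            }
        }
        return out;
    }
}
```

Because each pair is emitted in one fixed orientation, the reversed duplicate of a join never appears. In an actual Hadoop job the order of output lines would depend on the framework and is not guaranteed to match this sketch.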
Note: –
Table1JoinColumn and Table2JoinColumn will both be Integer, Real, Double, or Float;
in other words, numeric.
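Since the join columns are numeric, one option is to compare them as numbers rather than as text, so that "2" and "2.0" are treated as the same key. Whether such values should match is not stated in this handout, so the sketch below is only one possible reading. The class name JoinKeys is hypothetical.

```java
/** Minimal sketch (hypothetical helper): turn the join column text into a comparable number. */
final class JoinKeys {
    /**
     * Parses the join column as a double so that "2" and "2.0" compare equal.
     * Throws NumberFormatException if the text is not numeric.
     */
    static double normalize(String rawKey) {
        double value = Double.parseDouble(rawKey.trim());
        // Fold -0.0 into 0.0 so both zeros behave as the same key when boxed.
        return value == 0.0 ? 0.0 : value;
    }
}
```

A double cannot represent every very large integer exactly; if such keys are expected, java.math.BigDecimal with compareTo would be a safer basis for the comparison.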
Example Input: –
R, 2, Don, Larson, Newark, 555-3221
S, 1, 33000, 10000, part1
S, 2, 18000, 2000, part1
R, 3, Sal, Maglite, Nutley, 555-6905
S, 3, 24000, 5000, part1
S, 4, 22000, 7000, part1
R, 4, Bob, Turley, Passaic, 555-8908
Example Output: –
R, 2, Don, Larson, Newark, 555-3221, S, 2, 18000, 2000, part1
S, 1, 33000, 10000, part1
S, 2, 18000, 2000, part1, R, 2, Don, Larson, Newark, 555-3221
R, 3, Sal, Maglite, Nutley, 555-6905, S, 3, 24000, 5000, part1
S, 3, 24000, 5000, part1, R, 3, Sal, Maglite, Nutley, 555-6905
S, 4, 22000, 7000, part1, R, 4, Bob, Turley, Passaic, 555-8908
R, 4, Bob, Turley, Passaic, 555-8908, S, 4, 22000, 7000, part1
Lines marked in red should not be present in the output file; they are shown only for
clarity.
Submission Instructions: –
Upload a zip file named Assignment4.zip containing three files: equijoin.java,
equijoin.jar, and ReadMe.txt. I will use equijoin.jar to run the assignment. ReadMe.txt should
briefly describe the approach you used, that is, how your mapper, reducer, and
driver work.
This is how I am going to run your submission: –
sudo -u
Example: –
sudo -u hduser /usr/local/hadoop/bin/hadoop jar equijoin.jar equijoin
hdfs://localhost:54310/input/sample.txt hdfs://localhost:54310/output
Instructions for Assignment: –
Please follow these instructions closely; otherwise, marks will be deducted.
1. Please make sure you follow the submission instructions carefully and do not miss any
files.
2. Please run the jar before submitting and make sure there are no compilation
or runtime errors.
3. Make sure your jar can be run from an arbitrary location.
4. If you have any doubts about the assignment, PLEASE USE the Discussion Boards. Individual
emails will not be entertained.
5. It is each individual’s responsibility to clarify his/her doubts, so read and use
the Discussion Board extensively.