Description
In this programming assignment, you will write several variants of a MapReduce program to produce the outputs described below. A collection of textbooks is available on the departmental server, where you will run your programs and generate the outputs.
The tasks are:
- (30 points) Design and execute a MapReduce program that computes the frequency of every word in the book collection, retaining only words whose frequency is greater than 5. Submit the following:
  - Your commented code for the Mapper and Reducer.
  - The first 50 words and their frequencies from your program’s output file.
  - The last 50 words and their frequencies from your program’s output file.
- (30 points) Design and execute a MapReduce program that computes the frequency of every 2-gram (pair of adjacent words) in the book collection, retaining only 2-grams whose frequency is greater than 5. Submit the following:
  - Your commented code for the MapReduce program that counts the 2-grams across all the books.
  - The first 50 2-grams and their frequencies from your program’s output file.
  - The last 50 2-grams and their frequencies from your program’s output file.
- (30 points) Design and execute a MapReduce program that finds the 100 most frequent words in the book collection. You may need two rounds of Map and Reduce (for example, one round to count the words and a second to select the top 100). Submit the following:
  - Your commented code for the Mappers and Reducers.
  - The list of the 100 most frequent words and their frequencies in the book collection.
- (10 points) All parts of your submission must be clearly labeled and documented. The grader must be able to:
  - See each code segment with a clear label.
  - See a labeled document listing the sequence of commands you used to execute the programs.
  - See the labeled outputs of the programs.
  - Find all code, documents, and outputs combined in a single PDF file. The grader will not unzip multiple separate files.
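For the word-frequency task, the map/shuffle/reduce flow can be sketched in memory as below. This is a minimal, framework-independent Python sketch, not the Hadoop (or any other) API: `mapper`, `shuffle`, `reducer`, and `run_word_count` are hypothetical names, and the tokenization rule (lowercase letters with optional inner apostrophes) is an assumption you would adapt to your own definition of a "word".

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: a word is a lowercase run of letters, optionally with inner apostrophes ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")
MIN_COUNT = 5  # keep only words whose total frequency is strictly greater than 5


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every token on one input line."""
    for word in TOKEN_RE.findall(line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for the framework's shuffle: group all values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: total the counts, then apply the > 5 filter."""
    total = sum(counts)
    if total > MIN_COUNT:
        yield word, total


def run_word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    # Visiting keys in sorted order mimics the key-sorted reducer output file.
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]
```

The frequency filter belongs in the reducer, not in a combiner: a combiner only sees one split, where a word may appear 5 or fewer times even though its overall total exceeds 5. A combiner, if used, should only sum partial counts.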
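The 2-gram task changes only the map phase: instead of single words, the mapper emits each pair of adjacent words as one key. The sketch below is a minimal in-memory simulation under stated assumptions. The function names and the space-joined key format (`"first second"`) are hypothetical, the tokenization rule is assumed, and 2-grams are formed only within a single input line, so pairs that span a line break are not counted.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: same tokenization as a plain word count (lowercase, inner apostrophes allowed).
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")
MIN_COUNT = 5  # keep only 2-grams whose total frequency is strictly greater than 5


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit ("w1 w2", 1) for each pair of adjacent words on one line."""
    words = TOKEN_RE.findall(line.lower())
    for first, second in zip(words, words[1:]):
        yield f"{first} {second}", 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for the framework's shuffle: group all values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(bigram: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: total the counts, then apply the > 5 filter."""
    total = sum(counts)
    if total > MIN_COUNT:
        yield bigram, total


def run_bigram_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]
```

If 2-grams should also cross line boundaries, one option is to treat a larger unit (a paragraph or a whole file) as the map input record so that `zip(words, words[1:])` sees the full word sequence.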
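The two-round approach to the top-100 task can be sketched as follows: round 1 is an ordinary word count, and round 2 maps every `(word, count)` record to one shared key so that a single reducer sees all counts and keeps the largest 100. This is a minimal in-memory Python sketch, not any framework's API; `run_round`, the shared key `"all"`, and the alphabetical tie-breaking rule are illustrative assumptions.

```python
import heapq
import re
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator, List, Tuple

TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")  # assumed tokenization rule
TOP_K = 100  # how many of the most frequent words to report

Pair = Tuple[Any, Any]


def run_round(records: Iterable[Any],
              mapper: Callable[[Any], Iterator[Pair]],
              reducer: Callable[[Any, List[Any]], Iterator[Pair]]) -> List[Pair]:
    """Simulate one MapReduce round: map every record, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Round 1: ordinary word count (no frequency threshold here).
def count_mapper(line: str) -> Iterator[Pair]:
    for word in TOKEN_RE.findall(line.lower()):
        yield word, 1


def count_reducer(word: str, ones: List[int]) -> Iterator[Pair]:
    yield word, sum(ones)


# Round 2: send every (word, count) to one hypothetical key, "all".
def top_mapper(record: Pair) -> Iterator[Pair]:
    word, count = record
    yield "all", (count, word)


def top_reducer(_key: str, items: List[Tuple[int, str]]) -> Iterator[Pair]:
    # Highest count first; ties broken alphabetically so the output is deterministic.
    for count, word in heapq.nsmallest(TOP_K, items, key=lambda cw: (-cw[0], cw[1])):
        yield word, count


def top_words(lines: Iterable[str]) -> List[Pair]:
    counts = run_round(lines, count_mapper, count_reducer)
    return run_round(counts, top_mapper, top_reducer)
```

Routing everything to one key makes the single round-2 reducer a bottleneck on a large collection. A common refinement is for each round-2 mapper to keep only its local top 100 before emitting, so the reducer receives at most 100 records per mapper; with a consistent tie-breaking rule, the global top 100 is always contained in the union of the local ones.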