Description


In this assignment, you will practice sentiment analysis with textual data.

You are provided with a dataset, “MovieReview-Sample.csv”, which contains 2,000 movie reviews, each with a sentiment label. Label “0” is Negative and label “1” is Positive.

Question 1: Performance Comparisons

You are asked to use the four approaches taught in Lab 2 to perform sentiment analysis on the dataset: 1) Using Bing Liu’s Lexicon; 2) Using the LM dictionary; 3) Using TextBlob; and 4) Using Vader (either from NLTK or from Vader directly).
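As a rough illustration of how a lexicon-based approach turns a review into a 0/1 prediction, here is a minimal sketch. The two word sets are tiny hypothetical stand-ins, not Bing Liu’s Lexicon or the LM dictionary, and the function names and the zero threshold are assumptions; the same thresholding step can be applied to a polarity score from any of the four tools.

```python
import re

# Tiny stand-in word lists (hypothetical). In the assignment these would be
# replaced by the full positive/negative word lists from the chosen lexicon.
POSITIVE_WORDS = {"good", "great", "excellent", "enjoyable", "wonderful"}
NEGATIVE_WORDS = {"bad", "boring", "terrible", "awful", "dull"}


def lexicon_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (#positive words - #negative words) found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos_count = sum(1 for token in tokens if token in positive)
    neg_count = sum(1 for token in tokens if token in negative)
    return pos_count - neg_count


def score_to_label(score, threshold=0):
    """Map a numeric sentiment score to the dataset's labels: 1 or 0.

    Scores at or below the threshold are labeled negative (an assumption;
    ties could also be handled differently).
    """
    return 1 if score > threshold else 0


if __name__ == "__main__":
    review = "A great, enjoyable film with a boring ending."
    print(score_to_label(lexicon_score(review)))
```

How ties (a score of exactly zero) are labeled is a design choice worth mentioning in the report, since it can shift precision and recall between the two classes.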

Please report the following:

Report the precision, recall, and F-measure achieved by each tool. Calculate them by comparing your predictions against the gold standard (labels 0 and 1). Present the results in a comparison table and highlight the highest performance.

(Hint: report precision, not accuracy. This means you need to calculate the precision for the positive class, the precision for the negative class, and then the average of the two.)

Provide your analysis of the performances. If you were in charge of identifying the appropriate software to perform sentiment analysis for movie reviews, which one would you choose? Give 1-2 reasons.
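The per-class and averaged metrics described in the hint above could be computed along the following lines. This is a minimal sketch in plain Python; the function names are illustrative, and the unweighted (macro) average of the two classes is assumed to be what the hint means by "average".

```python
def per_class_metrics(gold, pred, label):
    """Precision, recall, and F1 for one class, treating `label` as the target."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def macro_metrics(gold, pred, labels=(0, 1)):
    """Unweighted average of precision, recall, and F1 over the given classes."""
    per_class = [per_class_metrics(gold, pred, label) for label in labels]
    count = len(labels)
    return tuple(sum(metric[i] for metric in per_class) / count for i in range(3))


if __name__ == "__main__":
    gold = [1, 1, 0, 0]
    pred = [1, 0, 0, 0]
    print("positive:", per_class_metrics(gold, pred, 1))
    print("negative:", per_class_metrics(gold, pred, 0))
    print("average: ", macro_metrics(gold, pred))
```

Running the same two functions on each tool's predictions gives one row of the comparison table per tool.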

Question 2: Ensemble

You are going to use an ensemble method to improve on the performance of the individual tools. Can you think of a way to ensemble three of the methods/tools to improve the performance?

(Hint 1: you may choose the 3 best-performing algorithms to ensemble; there is no need to include inferior algorithms from the previous step. Hint 2: the simplest form of ensemble is a majority vote, or a weighted majority vote based on the algorithms' performance.) Report your performance improvement (as a percentage) over any single model.
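The two voting schemes mentioned in Hint 2 might look like the sketch below. It assumes each tool's output is already a list of 0/1 labels in the same review order; the function names, the choice of weights (for example, each tool's F-measure from Question 1), and the rule that weighted ties go to the negative class are all assumptions for illustration.

```python
def majority_vote(predictions):
    """Combine several 0/1 prediction lists: label 1 wins with more than half the votes."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def weighted_vote(predictions, weights):
    """Weighted majority vote; each tool's vote counts in proportion to its weight.

    A review is labeled 1 only if the positive votes carry more than half of
    the total weight (ties fall to 0, an arbitrary choice).
    """
    total_weight = sum(weights)
    combined = []
    for votes in zip(*predictions):
        positive_weight = sum(w for vote, w in zip(votes, weights) if vote == 1)
        combined.append(1 if 2 * positive_weight > total_weight else 0)
    return combined


def relative_improvement(single_score, ensemble_score):
    """Improvement of the ensemble over a single model, in percent."""
    return (ensemble_score - single_score) / single_score * 100


if __name__ == "__main__":
    tool_a = [1, 0, 1, 0]
    tool_b = [1, 1, 0, 0]
    tool_c = [0, 1, 1, 0]
    print(majority_vote([tool_a, tool_b, tool_c]))
    print(weighted_vote([tool_a, tool_b, tool_c], weights=[3, 1, 1]))
```

With three tools a plain majority vote can never tie, which is one practical reason to ensemble an odd number of models.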

Bonus: I also provide the original full dataset, “Movie_review_Polarity_CSV.zip”. Note that this file contains pos.csv and neg.csv. You may run your algorithm on the full dataset and see whether the performance on the sample dataset holds.
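For the bonus, the two files need to be merged into one labeled dataset before the tools can be evaluated on them. The sketch below is one possible way to do that, under the assumption that each CSV holds one review per row with the text in the first column and no header row; the actual layout of pos.csv and neg.csv is not described here, so check it and adjust `text_column` (or skip a header) as needed.

```python
import csv
import random


def load_reviews(path, label, text_column=0):
    """Read one CSV file and return a list of (review_text, label) pairs.

    Assumes one review per row with the text in `text_column` and no header
    row; a header, if present, would have to be skipped by the caller.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.reader(handle):
            if len(record) > text_column and record[text_column].strip():
                rows.append((record[text_column], label))
    return rows


def load_full_dataset(pos_path="pos.csv", neg_path="neg.csv", seed=0):
    """Merge positive (label 1) and negative (label 0) reviews and shuffle them."""
    data = load_reviews(pos_path, 1) + load_reviews(neg_path, 0)
    random.Random(seed).shuffle(data)  # fixed seed keeps runs reproducible
    return data
```

The resulting list uses the same 0/1 labels as the sample dataset, so the same evaluation code can be reused unchanged.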

Submission:

Word Report
Python program. Please make sure your Python program runs successfully.

Other instructions:

DO NOT submit your dataset. Only submit the Word report and the Python program.
Do not use an absolute path to read your input data (it won’t run on your TA’s computer).
Name all your files FirstName_LastName.xxx. This will make our grading easier.
Do not zip your file. Submit two files directly.
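One way to follow the instruction about avoiding absolute paths is to locate the data file relative to the script itself, so the program runs from any folder on any machine. This is a minimal sketch; the helper name `resolve_data_file` is hypothetical, and it assumes the CSV sits in the same folder as the Python program.

```python
from pathlib import Path


def resolve_data_file(filename, script_file):
    """Return the path of `filename` located next to the given script file."""
    return Path(script_file).resolve().parent / filename


# Typical use inside the submitted program:
#   data_path = resolve_data_file("MovieReview-Sample.csv", __file__)
#   with open(data_path, newline="", encoding="utf-8") as handle:
#       ...
```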

Thank you!

Assignment 2: Sentiment Analysis
