Description
In a society where freedom and democracy reign, people have the right to choose their ruling class.
An election is a formal way of accepting or rejecting a political proposition by voting. People,
particularly those living in third-world countries, worry about their votes and question whether the
election is trustworthy. However, statistical tools can help us detect signs of attempted
fraud.
In this assignment, you will implement a Python program that analyzes the results of the USA
presidential election held in 2012 and interprets whether it is fraudulent or not. It was an election
with 4 major parties (Democratic, Republican, Libertarian, and Green) and 30 nominees, most
of whom were write-in candidates. Democratic nominee Obama B., Republican nominee Romney
M., Libertarian nominee Johnson G., and Green nominee Stein J. participated in the election,
which resulted in the victory of the Democrats. You are provided with the election results in a file
named ElectionUSA2012.csv (see footnote 1). This file records the number of votes state by state. There
are eight different pieces of information in one row: state name, total votes, electoral votes, and
the number of votes for Obama, Romney, Johnson, Stein, and others. Each row represents a state in
the USA. To summarize, there are 204 election results (excluding the votes for “others”) that you
care about to reveal fraud, if any.
You will look closely at the least significant digits of the vote counts (the ones and tens places in this
assignment), which are essentially noise and do not affect who wins. The idea is that, in any
real election, we expect the ones and tens digits to be uniformly distributed; namely, 10% of the
digits are 0, 10% of the digits are 1, and so forth. If the distribution is not uniform, then it is likely
that the numbers were made up by someone rather than collected from ballot boxes. To accomplish this assignment,
there are some steps you need to carry out.
1 Obtained from http://www.fec.gov/pubrec/fe2012/federalelections2012.shtml
BBM 103: Introduction to Programming Laboratory I
Step 1: Read election data
The content of the ElectionUSA2012.csv file is explained in the previous section. The pieces of information in a row are separated by a delimiter (,). To read the file, write a function called retrieveData
that takes two inputs: the filename and a list consisting
of the nominees’ names. It returns a one-dimensional list that contains the vote counts from
every row in succession, one nominee after another in the order given. Save the output in a file named retrievedData.txt.
>>> retrieveData("ElectionUSA2012.csv", ["Obama", "Romney", "Johnson", "Stein"])
[795696, 122640, 1025232, 394409, 7854285, …, 20928, 4406, 7665, 0]
Note 1: The arguments of the function (the filename and the list of nominees’ names)
should be dynamic and take their values from the command-line arguments. To invoke retrieveData
with the parameter values illustrated above, call your program as
shown below (see footnote 2):
$ python Assignment4.py ElectionUSA2012.csv Obama,Romney,Johnson,Stein
Note 2: A change in the order of the nominees’ names (the second command-line argument) should change
the output.
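A minimal sketch of this step is given below. It assumes, hypothetically, that the first row of the CSV file is a header whose column titles contain the nominees' names (for example "Obama") and that the vote counts are plain integers; the real header layout is not shown in this document, so the column lookup may need adjusting. The out_path parameter and the main helper are additions made for illustration only.

```python
import csv


def retrieveData(filename, nominees, out_path="retrievedData.txt"):
    """Return the vote counts as one flat list, nominee by nominee."""
    with open(filename, newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]

    votes = []
    for name in nominees:
        # Assumption: exactly one header cell contains the nominee's name.
        col = next(i for i, title in enumerate(header)
                   if name.lower() in title.lower())
        # Skip empty rows; assume every remaining cell is a plain integer.
        votes.extend(int(row[col]) for row in body if row)

    # Save the flat list as text, as the step asks.
    with open(out_path, "w") as out:
        out.write(str(votes))
    return votes


def main(argv):
    """Hypothetical entry point: argv[1] is the file, argv[2] the nominee list."""
    return retrieveData(argv[1], argv[2].split(","))
```

Calling main(sys.argv) at the bottom of the script (after importing sys) would connect the function to the command line shown in Note 1. Because the columns are looked up by name and read in the order of the nominees list, swapping two nominees swaps their blocks in the output, which is the behaviour Note 2 describes.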
Step 2: Bar plot of vote counts
Once you obtain all vote counts, plot a bar figure to visualize the vote distribution of
the two nominees who dominated the election (Obama and Romney for the 2012 USA election) as a
function of state name. To do that, write a function DispBarPlot that takes no input and returns
nothing. The figure should look exactly the same as Fig. 1. In the figure, the x-axis represents
the states, whereas the y-axis represents the vote counts each nominee received. Vote counts should be
represented with blue and red bars. Do not forget to create a legend box, without which the plot
cannot be interpreted. Save your first plot in a file named ComparativeVotes.pdf.
Note: Your implementation should output the same plot every time you run it, regardless of the order
of the nominees’ names. Likewise, your plot should not be affected by a change in the order of
the columns in the header of the file provided.
Step 3: Bar plot for vote comparison
In order to reveal the margins between the votes given to each nominee, you are expected
to visualize the comparative vote percentages of all nominees. Write a function compareVoteonBar
that creates a figure window containing a bar plot that looks exactly the same as the one
in Fig. 2. In this plot, vote percentages should be given as x-labels and nominees’ names
should be provided in a legend box. As you will notice, there are no vote percentages in the
election data file. Therefore, you need to compute the vote percentages first and visualize them in
descending order (as in Fig. 2). Save your plot in a file named CompVotePercs.pdf. Consider the
note given in the previous step.
2 Check your interpreter version by typing python --version and be sure it is 3.x
Figure 1: Comparative demonstration with bar plot
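The percentage computation needed before plotting could be sketched as follows. The helper name vote_percentages is hypothetical. The sketch assumes that the flat vote list is ordered nominee by nominee with the same number of rows for each, and that percentages are taken relative to the combined votes of the listed nominees; the text does not say whether votes for "others" belong in the denominator.

```python
def vote_percentages(votes, nominees):
    """Return (nominee, percentage) pairs sorted from highest to lowest share."""
    per_nominee = len(votes) // len(nominees)  # rows (states) per nominee
    totals = {
        name: sum(votes[i * per_nominee:(i + 1) * per_nominee])
        for i, name in enumerate(nominees)
    }
    # Assumption: only the listed nominees contribute to the total.
    grand_total = sum(totals.values())
    shares = [(name, 100.0 * total / grand_total)
              for name, total in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)
```

Sorting by share rather than by input position keeps the result independent of the order in which the nominees are given, in line with the note from the previous step.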
Step 4: Obtain histogram
As opposed to the previous step, now you care about the ones and tens digits of the vote counts and
their frequencies. To do that, write a function named obtainHistogram that takes a list as input
and produces a list of 10 numbers as output. Each element of the output list represents the
frequency with which the corresponding digit (0 to 9) appears in the ones or tens place of the input numbers. Note that the tens digit is 0 for numbers less
than 10.
>>> obtainHistogram([7, 24, 25, 180, 249, 326, 446, 446, 512, 552, 612, 618, 618, 714, 780, 839, 846, 890, 949, 951])
[0.1, 0.15, 0.15, 0.025, 0.175, 0.075, 0.1, 0.025, 0.1, 0.1]
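One possible way to write this function is sketched below. It assumes that all inputs are non-negative integers and that the list is not empty (an empty list would cause a division by zero).

```python
def obtainHistogram(numbers):
    """Return the relative frequency of each digit 0-9 in the ones and tens places."""
    counts = [0] * 10
    for n in numbers:
        counts[n % 10] += 1          # ones digit
        counts[(n // 10) % 10] += 1  # tens digit (0 for numbers below 10)
    total = 2 * len(numbers)         # two digits are counted per number
    return [c / total for c in counts]
```

For the 20 numbers in the example above, 40 digits are counted in total; the digit 0, for instance, occurs 4 times (the ones digits of 180, 780, and 890, plus the tens digit of 7), which gives 4/40 = 0.1.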
Step 5: Histogram plot
To complete this step, you need the frequency list calculated by obtainHistogram. Create a
function named plotHistogram that takes a histogram and plots the frequencies of the ones and tens
digits for the 2012 USA election data. Your histogram plot should look exactly the same as
the one in Fig. 3. In this figure you see two plot lines with different colors. The solid red
line is the frequency distribution of the digits ranging from 0 to 9, whereas the dashed green line
is the ideal line for a uniform distribution. The x- and y-axes of the plot represent the digit value and
the corresponding frequency, respectively. Do not forget the legend box. Save your figure as
Histogram.pdf.
As seen, the USA election data is rather different from the expected ideal line. However, by looking
only at Fig. 3, we cannot deduce whether the election is fraudulent. We will resort to more principled
statistical methods.
Fall 2016
Figure 2: Comparative vote percentages of nominees
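A rough sketch of the plotting function for this step is shown below. It assumes that matplotlib is the plotting library in use, which this document does not state. The axis labels, the legend texts, and the out_path parameter are placeholders chosen for illustration; they would have to be adapted to match Fig. 3.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def plotHistogram(histogram, out_path="Histogram.pdf"):
    """Plot digit frequencies (solid red) against the uniform ideal (dashed green)."""
    digits = list(range(10))
    ideal = [0.1] * 10  # uniform distribution: each digit has frequency 0.1
    fig, ax = plt.subplots()
    ax.plot(digits, histogram, color="red", linestyle="-",
            label="2012 USA election")  # placeholder legend text
    ax.plot(digits, ideal, color="green", linestyle="--",
            label="Ideal")              # placeholder legend text
    ax.set_xlabel("Digit")
    ax.set_ylabel("Frequency")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)  # free the figure once it has been written
```

Selecting the non-interactive Agg backend lets the script save the PDF on a machine where no plot window can be opened.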
Step 6: Histogram plot of smaller size samples
This is a repetition of the previous step, but with randomly generated samples of different sizes.
Write a function plotHistogramWithSample that takes no input. Create 5 lists of different sizes (10, 50, 100,
1000, and 10000) of random numbers ranging from 0 to 100. For each size, perform
the previous step and create a histogram plot of the generated random numbers. These plots
should be in different colors to distinguish them (as shown in Fig. 4). Once you have created them, you
will see that the more samples you use, the closer the plot is to the ideal line. Save the figures as
HistogramofSample1.pdf, HistogramofSample2.pdf, …, HistogramofSample5.pdf.
It is not surprising that the histogram plots show different frequencies, as they are
created from random numbers. Likewise, you obtain a different histogram plot every time you
run the program.
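The data side of this step, without the plotting, might look like the sketch below. The names digit_histogram and sample_histograms are hypothetical; digit_histogram is only a compact stand-in for obtainHistogram so that the example is self-contained. The sketch reads "0 to 100" as the integers 0 to 99, an assumption made so that both digit places are uniformly distributed.

```python
import random


def digit_histogram(numbers):
    """Relative frequency of digits 0-9 in the ones and tens places."""
    counts = [0] * 10
    for n in numbers:
        counts[n % 10] += 1
        counts[(n // 10) % 10] += 1
    return [c / (2 * len(numbers)) for c in counts]


def sample_histograms(sizes=(10, 50, 100, 1000, 10000), rng=random):
    """Return one digit histogram per sample size, keyed by size."""
    histograms = {}
    for size in sizes:
        # Assumption: random integers in 0..99 (100 itself is excluded).
        sample = [rng.randrange(100) for _ in range(size)]
        histograms[size] = digit_histogram(sample)
    return histograms
```

Passing rng=random.Random(0) makes a run repeatable while testing; with the default argument the histograms differ on every run, as the text describes.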
Step 7: Uniformity calculation
As you have seen, the closeness of the two plot lines increases with the number of
samples used. But we also need a computational way to verify this closeness. As the plot lines
are created from lists, we need to calculate the difference/closeness of two given lists. One
common way to do this is the mean squared error (MSE). Write a function calculateMSE that takes
two lists and calculates their closeness. An illustration of the MSE calculation is given below; note that, in this illustration, the squared differences are summed without being divided by the number of elements:
>>> calculateMSE([4, 7, 2, 3], [5, 2, 9, 6])
84 → (4-5)^2 + (7-2)^2 + (2-9)^2 + (3-6)^2
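A short sketch that reproduces the illustration above is given here. It follows the illustration literally, so it returns the sum of the squared differences; dividing that sum by the number of elements would give a mean in the strict sense (21 for the example). Which convention reproduces the value quoted in Step 8 depends on the data and is not checked here.

```python
def calculateMSE(first, second):
    """Sum of squared element-wise differences, as in the illustration."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    return sum((a - b) ** 2 for a, b in zip(first, second))
```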
Figure 3: Histogram plot
Step 8: MSE calculation of USA election
Once you have completed the previous step, you can calculate the MSE value of the USA election
data. To do that, write a function that takes a histogram (remember, it is obtained from obtainHistogram) and returns the mean squared error of that histogram with respect to the uniform distribution
represented by the dashed green line (ideal line) in Fig. 3. When you invoke the calculateMSE function
with the histogram data as input, it should output the MSE value 0.0023644752018454436,
or approximately 0.002, if it works correctly.
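The comparison against the ideal line can be sketched as below. The function name mse_against_uniform is hypothetical, since the text does not name this function. The sketch uses the same summing convention as the Step 7 illustration, which is an assumption.

```python
def mse_against_uniform(histogram):
    """Squared-error distance between a histogram and the uniform distribution."""
    # For a 10-element histogram, every entry of the ideal line is 0.1.
    uniform = [1 / len(histogram)] * len(histogram)
    # Assumption: squared differences are summed, as in the Step 7 illustration.
    return sum((h - u) ** 2 for h, u in zip(histogram, uniform))
```

As a small check, the example histogram from Step 4 differs from 0.1 by 0.05 twice, by 0.075 three times, and by 0.025 once, so the result for it is 2(0.0025) + 3(0.005625) + 0.000625 = 0.0225.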
Step 9: Comparison of MSEs
This step is closely related to the following step. To accomplish the next step, you need to
compare the MSE value of the USA election to those of 10000 groups of random numbers, each with the same size as
the USA election data (204 numbers). Write a function named compareMSEs that takes as its argument
the MSE value of the USA election histogram calculated in the previous step, then go to the next step.
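A self-contained sketch of this comparison is given below. The keyword parameters groups, group_size, and rng are additions for illustration (the text asks for a single argument), the two underscore-prefixed helpers are compact stand-ins for the functions from earlier steps, and the random numbers are assumed to be integers from 0 to 99.

```python
import random


def _digit_histogram(numbers):
    """Relative frequency of digits 0-9 in the ones and tens places."""
    counts = [0] * 10
    for n in numbers:
        counts[n % 10] += 1
        counts[(n // 10) % 10] += 1
    return [c / (2 * len(numbers)) for c in counts]


def _mse_against_uniform(histogram):
    """Summed squared distance to the uniform line (Step 7 convention)."""
    return sum((h - 0.1) ** 2 for h in histogram)


def compareMSEs(election_mse, groups=10000, group_size=204, rng=random):
    """Return (larger_or_equal, smaller) counts of random MSEs versus election_mse."""
    larger_or_equal = smaller = 0
    for _ in range(groups):
        # Assumption: each random number is an integer in 0..99.
        sample = [rng.randrange(100) for _ in range(group_size)]
        mse = _mse_against_uniform(_digit_histogram(sample))
        if mse >= election_mse:
            larger_or_equal += 1
        else:
            smaller += 1
    return larger_or_equal, smaller
```

The two counts always add up to the number of groups, and they correspond to the two "number of MSE of random samples" lines of the required output.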
Step 10: Interpreting results
Once the MSEs are calculated, it is time to interpret the results in this final step. Here, the null hypothesis is that our observed sample (the USA election data) is not fraudulent. To claim that the
election data is fraudulent, we must reject the null hypothesis (see footnote 3). For this, we need the p-value of the
USA election data, which represents the rejection level. Here, you should pay attention to
the percentage of the MSEs obtained in the previous step that the USA election MSE is greater than. To
calculate the p-value of the USA election data, divide the number of times that the MSE
of the USA election data is greater than those obtained in the previous step by 10000, which is the
number of groups. If the MSE value of the USA election is smaller than 5% of the random MSEs (5% is a
common value for the significance level α; here it corresponds to 500 random MSEs), you can conclude that
the null hypothesis is rejected and confidently claim that the election results are fraudulent, and
vice versa.
3 To get an insight into hypothesis testing, read https://onlinecourses.science.psu.edu/statprogram/node/138
Figure 4: Histogram plot of random samples
You should display the results both on the console and in a file named myAnswer.txt. The content of
your program’s output should exactly match the following formatting, including capitalization and spacing (except where a placeholder is replaced by your answers).
Formatting:
MSE value of 2012 USA election is
The number of MSE of random samples which are larger than or equal to USA election MSE is
The number of MSE of random samples which are smaller than USA election MSE is
2012 USA election rejection level p is
Finding: We reject the null hypothesis at the p= level or
Finding: There is no statistical evidence to reject null hypothesis
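The report could be assembled as in the sketch below, which is only one reading of this step. The function names build_report and write_report are hypothetical. The sketch assumes that each answer is appended after a single space at the end of its line, that p is the share of random MSEs smaller than the election MSE (as the text describes), and that the null hypothesis is rejected when at most 5% of the random MSEs are larger than or equal to the election MSE. All three points should be checked against the assignment's own expected output.

```python
def build_report(election_mse, larger_or_equal, smaller, alpha=0.05):
    """Return the report lines for the given MSE and comparison counts."""
    groups = larger_or_equal + smaller
    # p as described in the text: share of random MSEs that the election MSE exceeds.
    p = smaller / groups
    lines = [
        f"MSE value of 2012 USA election is {election_mse}",
        "The number of MSE of random samples which are larger than or equal to "
        f"USA election MSE is {larger_or_equal}",
        "The number of MSE of random samples which are smaller than "
        f"USA election MSE is {smaller}",
        f"2012 USA election rejection level p is {p}",
    ]
    # Assumed decision rule: reject when at most alpha of the random MSEs
    # are larger than or equal to the election MSE.
    if larger_or_equal <= alpha * groups:
        lines.append(f"Finding: We reject the null hypothesis at the p= {p} level")
    else:
        lines.append("Finding: There is no statistical evidence to reject null hypothesis")
    return lines


def write_report(lines, path="myAnswer.txt"):
    """Print the report to the console and save it to a file."""
    text = "\n".join(lines)
    print(text)
    with open(path, "w") as out:
        out.write(text + "\n")
```

Keeping the line building separate from the output makes it easy to compare the generated lines with the required formatting before anything is written to disk.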
Notes specific to this assignment
• Avoid redundancy, never repeat yourself!
• The structure of your implementation should be dynamic; do not define static expressions. As your work will be graded on a completely different dataset, I advise you
to test your implementation on a different dataset with the same structure. For
this reason, the 2008 presidential election results of the USA (ElectionUSA2008.csv) are
also provided to you.
• Do not define static paths, so that your program runs properly on any PC.
• As plots cannot be displayed there, do not run your work on your own ‘DEV space’.
• Feel free to employ any built-in function.
• You do not need to guard against extreme cases (e.g. a null value) that might be entered on the command
line.
• Be sure your submitted work exactly matches the hierarchy detailed below, as a
submission with a score of 0 will not be considered for evaluation.
• Should you have a question, do not hesitate to ask, but first consider the office hours of the TA
in charge of this assignment (Selim YILMAZ).
Notes
• Do not miss the deadline.
• Save all your work until the assignment is graded.
• The assignment must be original, individual work. Duplicate or very similar assignments
will both be considered cheating.
• You can ask your questions via Piazza (https://piazza.com/hacettepe.edu.tr/fall2016/bbm101) and
you are expected to be aware of everything discussed on Piazza.
• You will submit your work via https://submit.cs.hacettepe.edu.tr/index.php with the file hierarchy shown below:
This file hierarchy must be zipped before being submitted (only .zip files are supported
by the system, not .rar).
→
→ assignment4.py
→ retrievedData.txt
→ ComparativeVotes.pdf
→ CompVotePercs.pdf
→ Histogram.pdf
→ HistogramofSample1.pdf
→ HistogramofSample2.pdf
→ HistogramofSample3.pdf
→ HistogramofSample4.pdf
→ HistogramofSample5.pdf
→ myAnswer.txt