## Description

In a society where freedom and democracy reign, people have right to return their ruling class.

Election is a formal way of accepting or rejecting a political proposition by voting. People,

particularly living in third-world countries, worry about their votes and question whether the

election is trustful. However, thanks to the statistical tools we can easily find out any fraud

attempt.

In this assignment, you will implement a Python program that analyzes the results of the USA

presidential election held in 2012 and interprets whether it is fraudulent or not. It was election

with 4 major parties (Democratic, Republican, Libertarian, and Green) and 30 nominees most

of whom were write-in candidates. Democrat nominee Obama B., republican nominee Romney

M., libertarian nominee Jonhson G., and green nominee Stein J. participated in the election

resulting with the victory of the democrats. You are provided with election results in a file

named ElectionUSA2012.csv1. This file records the number of votes state by state. There

are eight di↵erent information in one row: State name, total votes, electoral votes, total vote,

# of votes for Obama, Romney, Johnson, Stein, and others. Each row represents a state in

the USA. To summarize, there are 204 election results (exclude the votes for “others”) you

care about to reveal fraudulence, if any.

You will look closely at the least significant digits of votes (ones and tens place in this

assignment) which are essentially noise and do not a↵ect who wins. The idea is that, in any

real election, we expect the ones and tens to be uniformly distributed; namely 10% of the

digits is 0, 10% of the digits is 1, and so forth. If distribution is not uniform, then it is likely

1obtained from http://www.fec.gov/pubrec/fe2012/federalelections2012.shtml

1

BBM 103: Introduction to Programming Laboratory I

made-up by someone rather than collected from ballot-boxes. To accomplish this assignment,

there are some steps you need to carry out.

Step 1: Read election data

The content of ElectionUSA2012.csv file is explained in previous section. Each information in a row is separated with a delimiter (,). To read the file write a function called retrieveData

that takes two inputs one of which represents the filename and the other is a list consisting

of the nominees’ names. It returns a one-dimensional list that contains the vote counts from

every row in a successive manner. Save the output under a file named retrievedData.txt

>>> retrieveData(“ElectionUSA2012.csv”, [“Obama”,“Romney”,“Johnson”,“Stein”])

[795696,122640,1025232,394409,7854285, ,….,20928,4406,7665,0]

Note 1: The arguments of the function defined (filename and a list of nominees’ names)

should be dynamic and take their values from command-line arguments. To invoke retrieveData

with the parameter values of those illustrated above, it is necessary to call your program as

shown below2:

$ python Assignment4.py ElectionUSA2012.csv Obama,Romney,Johnson,Stein

Note 2: A change in the order of nominees’ names (2nd system argument) should change

the output.

Step 2: Bar plot of vote counts

Once you obtain all vote counts, plot a bar figure in order to visualize the vote distribution of

two nominees who dominated the election (Obama and Romney for 2012 USA election) as a

function of state name. To do that, write a function DispBarPlot that takes no input and returns

none. The figure should look exactly same as given in Fig. 1. In the figure, x-axis represents

the states whereas y-axis represents vote counts each nominee took. Vote counts should be

represented with blue and red bars. Do not forget to create legend box without which nothing

would be interpreted. Save your first plot in a file named ComparativeVotes.pdf.

Note: Your implementation should output same plot every time you run with varying order

in nominees’ names. Likewise, your plot should not be a↵ected from a change in the order of

the header of the file provided.

Step 3: Bar plot for vote comparison

In order to reveal the margins between the votes given to each nominee, you are expected

to visualize the comparative vote percentages of all nominees. Write a function compareVoteonBar

that creates a figure window containing bar plot that should look exactly same as provided

in Fig 2. In this plot, vote percentages should be given as x-labels and nominees’ names

should be provided in a legend box. As all you realize, there is no vote percentage present in

election data file. Therefore, you need to obtain vote percentages first and visualize them in

2check your interpreter version by typing python version and be sure it is 3.x

2

BBM 103: Introduction to Programming Laboratory I

Figure 1: Comparative demonstration with bar plot

a descending order (as in Fig 2). Save your plot in a file named CompVotePercs.pdf. Consider the

note given in the previous step.

Step 4: Obtain histogram

As opposed to the previous step, now you care about ones and tens digits of votes to get the

frequencies of them. To do that, write a function named obtainHistogram that takes a list as input

and produces an output as a list of 10 numbers. Each element of output list represents the

frequency of digit appeared in ones and tens place in input. Note that it is 0 for numbers less

than 9.

>>> obtainHistogram([7, 24, 25, 180, 249, 326, 446, 446, 512, 552, 612, 618, 618, 714, 780, 839, 846, 890, 949, 951])

[0.1, 0.15, 0.15, 0.025, 0.175, 0.075, 0.1, 0.025, 0.1, 0.1]

Step 5: Histogram plot

To complete this step, you need to get frequency list calculated in obtainHistogram. Create a

function named plotHistogram that takes a histogram and plots the frequencies of ones and tens

digits for the 2012 USA election data. Your histogram plot should look exactly same as

provided in Fig. 3. In this figure you see two plot lines with di↵erent colors. The red straight

line is frequency distribution of the numbers ranging from 0 to 9 whereas green dashed line

is the ideal line for uniform distribution. x and y-axis of the plot represent digit value and

corresponding frequency, respectively. Do not forget the legend box. Save your figure as

Histogram.pdf.

As seen, the USA election data is rather di↵erent from expected ideal line. However, looking

only in Fig. 3, we cannot deduce if it is fraudulent election. We will appeal more principle

3

Fall 2016

BBM 103: Introduction to Programming Laboratory I

Figure 2: Comparative vote percentages of nominees

statistical ways.

Step 6: Histogram plot of smaller size samples

This is the repetition of the previous step but with smaller samples randomly generated.

Write a function plotHistogramWithSample taking no input. Create 5 di↵erent-sized (10, 50, 100,

1000, and 10000) lists of random numbers ranging from 0 to 100. For each size, perform

previous step and create histogram plot of the generated random numbers. These plots

should be in di↵erent colors to distinguish them (as shown in Fig. 4). Once you created, you

realize the more sample you use the closer to the ideal line it is. Save each of the figure named

HistogramofSample1.pdf, HistogramofSample2.pdf, …, HistogramofSample5.pdf.

It is not surprising that you obtain histogram plots that show di↵erent frequency, as it is

created with random numbers. Besides, you obtain di↵erent histogram plot every time you

run.

Step 7: Uniformity calculation

As you all realize the closeness of two plot lines increases with an increase in the number of

samples used. But we need computational way to also verify such closeness. As plot lines

are created by list, here we need to calculate the di↵erence/closeness of given two lists. One

common way for calculation is mean squared error (MSE). Write a function calculateMSE taking

two lists to calculate the closeness of them. An illustration of MSE calculation is given below:

>>> calculateMSE([4, 7, 2, 3], [5, 2, 9, 6])

84 ! (4-5)2+(7-2)2+(2-9)2+(3-6)2

4

Fall 2016

BBM 103: Introduction to Programming Laboratory I

Figure 3: Histogram plot

Step 8: MSE calculation of USA election

Once you completed the previous step, now you can calculate MSE values of USA election

data. To do that, write a function that takes a histogram (remember, it is obtained from obtainHistogram) and returns the mean squared error of that histogram with the uniform distribution

represented by green dashed line (ideal line) in Fig. 3. When you invoke calculateMSE function

with an input of histogram data, it should output the MSE value of 0.0023644752018454436,

or approximately 0.002, if it works correctly.

Step 9: Comparison of MSEs

This step is closely related to the following step, to accomplish the next step you need to

compare MSE values of USA to those of 10000 groups of random numbers with same size as

USA election data (204 numbers). Write a function named compareMSEs that takes an argument

of MSE value of USA election histogram calculated in previous step then go to next step.

Step 10: Interpreting results

Once calculated MSEs, it is the turn to interpret the results in this final step. Here, nullhypothesis is our observed sample (the USA election data) is not fraudulent. To prove

election data is fraudulent, we must reject the null hypothesis3. Here, we need p-value of

USA election data which represents the rejection level. Here, you should pay attention to

the %of MSEs that USA election result is greater than those obtained in previous step. To

calculate p-value of USA election data you should divide the number of times that MSE

of USA election data is greater than those obtained in previous step to 10000 which is the

number of groups. If MSE value of USA election is smaller that % 5 of random MSEs (a

3to get an insight on hypothesis testing, read https://onlinecourses.science.psu.edu/statprogram/node/138

5

BBM 103: Introduction to Programming Laboratory I

Figure 4: Histogram plot of random samples

common value of significance level -↵) (which is 500 random MSEs), you can conclude that

null-hypothesis is rejected and confidently claim that the election results are fraudulent, and

vice versa.

You should display the results both on console and in a file named myAnswer.txt. The content of

your program’s outputs should exactly match the following formatting, including capitalization and spacing (except where is replaced by your answers).

Formatting:

MSE value of 2012 USA election is

The number of MSE of random samples which are larger than or equal to USA election MSE is

The number of MSE of random samples which are smaller than USA election MSE is

2012 USA election rejection level p is

Finding: We reject the null hypothesis at the p= level or

Finding: There is no statistical evidence to reject null hypothesis

6

Fall 2016

BBM 103: Introduction to Programming Laboratory I

Notes specified to this assignment

• Avoid redundancy, never repeat yourself!

• The structure of your implementation should be dynamic, do not define static expressions. As your grades will be evaluated on a completely di↵erent dataset, I advise you

to test your implementation over a di↵erent dataset but having same structure. For

this reason, 2008 presidential election results of USA —ElectionUSA2008.csv—are

also provided to you.

• Do not define static path in order that it runs properly on any PC.

• As you cannot display any plot, do not run your work on your own ‘DEV space’.

• Feel free to employ any built-in function.

• Do not attempt to avoid extreme cases (null value i.e.) possibly written in the command

line.

• Be sure your submitted work exactly matches the hierarchy detailed below, as the

submission with the score of 0 will not be considered for evaluation.

• Should you have a question do not hesitate to ask, but first consider oce hours of TA

in charge of this assignment(Selim YILMAZ).

Notes

• Do not miss the deadline.

• Save all your work until the assignment is graded.

• The assignment must be original, individual work. Duplicate or very similar assignments

are both going to be considered as cheating.

• You can ask your questions via Piazza (https://piazza.com/hacettepe.edu.tr/fall2016/bbm101) and

you are supposed to be aware of everything discussed in Piazza.

• You will submit your work from https://submit.cs.hacettepe.edu.tr/index.php with the file hierarchy as below:

7

BBM 103: Introduction to Programming Laboratory I

This file hierarchy must be zipped before submitted (Not .rar, only .zip files are supported

by the system)

!

! assignment4.py

! retrievedData.txt

! ComparativeVotes.pdf

! CompVotePercs.pdf

! Histogram.pdf

! HistogramofSample1.pdf

! HistogramofSample2.pdf

! HistogramofSample3.pdf

! HistogramofSample4.pdf

! HistogramofSample5.pdf

! myAnswer.txt

8