Description
Part 1) 10 points
Use the primes (UsingR) dataset. Use the diff function to compute the
differences between successive primes. Show the frequencies of these
differences. Show the barplot of these differences.
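Hint: the diff, table, and barplot steps chain together naturally. The sketch below is only an illustration on a small made-up vector (the primes below 30, typed in by hand), not the UsingR dataset itself; the variable names are arbitrary.

```r
# Made-up stand-in for the dataset: the ten primes below 30.
p <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

# diff() returns x[i + 1] - x[i] for each adjacent pair.
gaps <- diff(p)

# table() counts how often each distinct gap occurs.
gap_freq <- table(gaps)
print(gap_freq)

# A table can be passed straight to barplot().
barplot(gap_freq, xlab = "Gap between successive primes", ylab = "Frequency")
```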
Part 2) 10 points
Use the coins (UsingR) dataset. Do not use explicit loops for any
calculations. Do not hard code the denominations in the solution. The
solution should work for any denominations.
a) How many coins are there of each denomination?
b) What is the total value of the coins for each denomination?
c) What is the total value of all the coins?
d) Show the barplot for the number of coins by year.
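Hint: table and tapply group by whatever values are present, so no loop and no hard-coded denominations are needed. The sketch below uses a made-up data frame, coins_demo, and assumes columns named value and year; check the actual column names with str() before adapting it.

```r
# Made-up data; the column names value and year are an assumption.
coins_demo <- data.frame(
  value = c(0.25, 0.10, 0.10, 0.05, 0.25, 0.01),
  year  = c(1995, 1995, 2001, 2001, 2001, 1988)
)

# Number of coins of each denomination present in the data.
counts <- table(coins_demo$value)

# Total value per denomination: sum the values within each value group.
totals <- tapply(coins_demo$value, coins_demo$value, sum)

# Total value of all coins.
grand_total <- sum(coins_demo$value)

print(counts)
print(totals)
print(grand_total)

# Number of coins by year.
barplot(table(coins_demo$year), xlab = "Year", ylab = "Number of coins")
```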
Part 3) 10 points
Use the south (UsingR) dataset.
a) Show the stem plot of the data. What do you interpret from this plot?
b) Show the five number summary of the data. Calculate the lower and
upper ends of the outlier ranges. What are the outliers in the data?
c) Show the horizontal boxplot of the data along with the appropriate labels
on the plot.
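Hint: one common convention places the outlier fences 1.5 times the interquartile range beyond the lower and upper hinges. The sketch below applies that convention to a made-up vector x, using the hinges from fivenum; quantile() can give slightly different quartiles, so state which one you use.

```r
# Made-up data with one obviously extreme value.
x <- c(12, 15, 16, 18, 19, 21, 22, 24, 25, 60)

# Stem-and-leaf display, printed to the console.
stem(x)

# Five number summary: min, lower hinge, median, upper hinge, max.
f <- fivenum(x)
print(f)

# Fences at 1.5 * IQR beyond the hinges (a convention, assumed here).
iqr   <- f[4] - f[2]
lower <- f[2] - 1.5 * iqr
upper <- f[4] + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- x[x < lower | x > upper]
print(c(lower = lower, upper = upper))
print(outliers)

# Horizontal boxplot with labels.
boxplot(x, horizontal = TRUE, xlab = "Value", main = "Boxplot of made-up data")
```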
Part 4) 10 points
Use the pi2000 (UsingR) dataset.
a) How many times does each of the digits 0 to 9 occur in this dataset?
b) Show the percentages of their frequencies.
c) Show the histogram of the data.
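Hint: a frequency table converts to percentages with prop.table. The sketch below uses only the first 20 digits of pi, typed in by hand, as a stand-in for the dataset. Wrapping the data in factor with levels 0:9 is a design choice so that a digit with zero occurrences still appears in the table.

```r
# First 20 digits of pi, entered by hand as a small stand-in.
digits <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4)

# Count each digit 0 to 9, keeping digits that never occur.
counts <- table(factor(digits, levels = 0:9))
print(counts)

# Percentages of the frequencies.
pct <- 100 * prop.table(counts)
print(round(pct, 1))

# Histogram with one bin centred on each digit.
hist(digits, breaks = seq(-0.5, 9.5, by = 1), xlab = "Digit", main = "Digit frequencies")
```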
Part 5) 15 points
Suppose that football (NFL), basketball (NBA), and hockey (NHL)
games are being shown at the same time. Consider the two-way
summarized data shown below, giving the preferences of men and
women for which sport they wish to watch.
a) Using cbind, create the matrix for the above data.
b) Set the row names for the data.
c) Set the column names for the data.
d) Now, add the dimension variables Gender and Sport to the data.
e) Show the marginal distributions for the Gender and the Sport.
f) Show the result of adding margins to the data.
g) Show the proportional data separately for Gender and Sport. Interpret
the results.
h) Using appropriate colors, show the mosaic plot for the data. Also show
the barplots for Gender and Sport separately, with the bars side by side. Add
a legend to each plot.
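Hint: the steps in this part map onto a handful of base R functions. The sketch below shows the mechanics only. The counts in it are PLACEHOLDERS invented for illustration and must be replaced with the values from the assignment's table; the row labels, column labels, and colors are likewise assumptions.

```r
# PLACEHOLDER counts, one column per sport; substitute the real table values.
sports <- cbind(c(10, 20), c(30, 40), c(50, 60))
rownames(sports) <- c("Men", "Women")
colnames(sports) <- c("NFL", "NBA", "NHL")

# Name the two dimensions.
names(dimnames(sports)) <- c("Gender", "Sport")

# Marginal distributions and the table with margins added.
gender_margin <- margin.table(sports, 1)
sport_margin  <- margin.table(sports, 2)
print(addmargins(sports))

# Proportions within each gender (rows) and within each sport (columns).
by_gender <- prop.table(sports, 1)
by_sport  <- prop.table(sports, 2)
print(by_gender)
print(by_sport)

# Mosaic plot, then side-by-side barplots with legends.
mosaicplot(sports, color = c("steelblue", "orange", "darkgreen"), main = "Sport preference")
barplot(sports, beside = TRUE, legend.text = TRUE, col = c("steelblue", "salmon"))
barplot(t(sports), beside = TRUE, legend.text = TRUE,
        col = c("steelblue", "orange", "darkgreen"))
```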
Part 6) 10 points
Use the midsize (UsingR) dataset.
a) Show the pairwise plots for all the variables.
b) Provide at least 4 interpretations of the results.
Part 7) 15 points
Use the MLBattend (UsingR) dataset.
a) Extract the wins for the teams BAL, BOS, DET, LA, PHI into the
respective vectors.
b) Create a data frame of five columns using these vectors. Use the team
names for the columns.
c) Show the boxplot of the data frame.
d) Provide at least 5 interpretations of the results.
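Hint: logical subsetting extracts one team's values, and data.frame combines the resulting vectors provided they have equal length. The sketch below uses a made-up data frame, mlb_demo, with only two teams, and assumes columns named franchise and wins; confirm the real names with str() first.

```r
# Made-up data; the column names franchise and wins are an assumption.
mlb_demo <- data.frame(
  franchise = rep(c("BAL", "BOS"), each = 3),
  wins      = c(90, 85, 70, 88, 95, 78)
)

# Extract the wins for each team into its own vector.
bal <- mlb_demo$wins[mlb_demo$franchise == "BAL"]
bos <- mlb_demo$wins[mlb_demo$franchise == "BOS"]

# Combine the vectors into a data frame, one column per team.
wins_df <- data.frame(BAL = bal, BOS = bos)
print(wins_df)

# boxplot() on a data frame draws one box per column.
boxplot(wins_df, ylab = "Wins")
```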
Part 8) 20 points
Initialize the House and Senate data as shown below:
house <- read.csv('http://kalathur.com/house.csv', stringsAsFactors = FALSE)
senate <- read.csv('http://kalathur.com/senate.csv', stringsAsFactors = FALSE)
Provide the simplest R code for the following:
a) Show how many senators and house members there are by party.
b) Show the top 10 states in decreasing order by the number of house
members in each state.
c) Use a box plot of the number of house members per state to
determine which states are outliers.
d) What is the average number of years served, by party, in the house
and senate?
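Hint: table, sort, boxplot, and tapply cover most of this part. The sketch below runs on a made-up data frame, house_demo, because the column names in the CSV files are not stated here; State, Party, and Years are hypothetical names, so inspect the real files with names() and adjust.

```r
# Made-up data; State, Party, and Years are hypothetical column names.
house_demo <- data.frame(
  State = c("CA", "CA", "CA", "TX", "TX", "VT"),
  Party = c("D", "R", "D", "R", "R", "D"),
  Years = c(4, 10, 2, 6, 8, 12),
  stringsAsFactors = FALSE
)

# Members by party.
party_counts <- table(house_demo$Party)
print(party_counts)

# States ordered by number of members; head() keeps at most the top 10.
by_state <- sort(table(house_demo$State), decreasing = TRUE)
print(head(by_state, 10))

# boxplot() returns its statistics invisibly; $out holds the outlying values.
b <- boxplot(as.vector(by_state), ylab = "Members per state")
outlier_states <- names(by_state)[by_state %in% b$out]
print(outlier_states)

# Average years served by party.
avg_years <- tapply(house_demo$Years, house_demo$Party, mean)
print(avg_years)
```

With only three states the made-up data has no outliers, so outlier_states comes back empty; the same lines applied to the real data would list the state names.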
Submission:
Create a folder, CS544_HW3_lastName, and place the following files
in this folder.
Provide the R code, HW3_lastName.R, with each portion of the code
clearly identified by the corresponding question. Prepare a corresponding
Word document by pasting the output for each question
(HW3_lastName.docx).
Archive the folder (CS544_HW3_lastName.zip). Upload the zip file to
the Assignments section of Blackboard.