Description
In this lab you will learn the basics of R. R is free, and you are encouraged to
obtain a copy for your Mac, PC or Linux machine. Install it, then download and install
RStudio (a convenient front end to R that is also free).
Objectives
In this lab you will learn how to:
1. Use the empirical rule
2. Use the Chebyshev rule
3. Transform data to z values
4. Find outliers using z values
Tasks
There are many IDEs (integrated development environments) that could be used with R; however,
we will use RStudio throughout the course.
All output must be produced from an R Markdown (.Rmd) document knitted to an HTML file. Save the file and place it in the dropbox
when completed.
• Task 1
o Download from CANVAS the zipped data files, “Dataxls”
o Unzip the contents into a directory on your desktop (call it LAB2)
o Download the file “lab2.r”
o Place this file with the others in LAB2.
o Start RStudio
o Open “lab2.r” from within RStudio.
o Go to the “Session” menu within RStudio and use “Set Working Directory” to point to where the source
files are located.
o Obtain the working directory by issuing the command getwd():
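The working-directory steps in Task 1 can also be done from the console. The following is a minimal sketch only: `lab_dir` is a hypothetical stand-in (here a temporary directory) for the LAB2 folder on your desktop, so replace it with your own path.

```r
# Show the working directory R is currently using
old_wd <- getwd()
print(old_wd)

# Hypothetical stand-in for the LAB2 folder; replace with your own path
lab_dir <- tempdir()
setwd(lab_dir)

# Confirm the change, then list the files R can see in that folder
print(getwd())
print(list.files())
```

If the data files and “lab2.r” do not appear in the listing, the working directory is probably not pointing at LAB2.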
• Task 2
o Find the file “EPAGAS.xls” inside LAB2
o Open it in Excel
o Save it as type CSV (comma delimited) “*.csv”
o Use read.table(), read.csv(), the RStudio menu, or any other available method to read the data into
R. The read function is already included in the script
lab2.r, which you have opened in RStudio.
o Obtain the first six lines of the data using “head()” :
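As a rough illustration of Task 2, the sketch below reads a CSV file and prints its first six rows. Because the real EPAGAS file is not reproduced here, the code first writes a small stand-in file: the file name `EPAGAS_demo.csv`, the column name `MPG`, and the values are all made up for illustration and are not the lab data.

```r
# Hypothetical stand-in for EPAGAS.csv: one column named MPG with made-up values
csv_path <- file.path(tempdir(), "EPAGAS_demo.csv")
demo <- data.frame(MPG = c(36.3, 41.0, 36.9, 37.1, 44.9, 32.7, 37.3, 41.2))
write.csv(demo, csv_path, row.names = FALSE)

# read.csv() assumes a header row and comma-separated fields by default
epagas <- read.csv(csv_path)

# First six rows, followed by a quick check of the column types
print(head(epagas))
str(epagas)
```

With the real file in the working directory, the call would presumably reduce to `read.csv("EPAGAS.csv")`; check the column names with `names()` before using them.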
• Task 3
o Create the object mpg, the vector of miles-per-gallon values.
o If $z_i = \frac{x_i - \bar{x}}{s_x}$, then $\bar{z} = 0$ and $s_z^2 = 1$. Transform the mpg variable to z and verify these
results.
o Using z, find the values of mpg that are possible outliers.
o Using z, find the values of mpg that are defined as outliers.
o Using the lattice package, construct a dotplot with colors: red = outlier, blue = possible
outlier. (NB: read the instructions in the lab2.r file for installing the package)
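The steps in Task 3 could be sketched as follows. This is one possible approach, not necessarily the one in lab2.r. It assumes the common textbook convention that 2 < |z| ≤ 3 marks a possible outlier and |z| > 3 marks an outlier; check the course notes for the definition you are expected to use. The `mpg` values are simulated for illustration and are not the EPAGAS data.

```r
library(lattice)

# Simulated mpg values for illustration only (not the EPAGAS data)
set.seed(1)
mpg <- c(rnorm(40, mean = 37, sd = 2), 46, 30, 42.5)

# Standardize: subtract the sample mean, divide by the sample standard deviation
z <- (mpg - mean(mpg)) / sd(mpg)

# Verify that the mean of z is 0 (up to rounding error) and its variance is 1
print(round(mean(z), 10))
print(var(z))

# Assumed thresholds: 2 < |z| <= 3 is a possible outlier, |z| > 3 is an outlier
possible <- mpg[abs(z) > 2 & abs(z) <= 3]
outliers <- mpg[abs(z) > 3]
print(possible)
print(outliers)

# One colour per observation: red = outlier, blue = possible outlier, black = neither
point_col <- ifelse(abs(z) > 3, "red",
                    ifelse(abs(z) > 2, "blue", "black"))

# Dotplot of the raw mpg values, coloured by outlier class
p <- dotplot(mpg, col = point_col, pch = 19,
             xlab = "mpg", main = "Dotplot of mpg (illustrative data)")
print(p)
```

The colour vector is built in the same order as `mpg`, so each point keeps the colour of its own z value.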
• Task 4
o Make a boxplot of the mpg variable
▪ Make the box black
▪ Put a notch where the median goes
▪ Put a title on the graph.
▪ Make the plot horizontal.
o Using Chebyshev’s theorem, predict the minimum proportion of data within 2 standard deviations of
the mean of the mpg data.
o Use R to calculate the exact proportion within 2 standard deviations of the mean.
o Does Chebyshev agree with the data?
o Now use the empirical rule. What proportion of the data (according to the rule) should be
within 2 standard deviations of the mean?
o How well does it correspond to the observed proportion?
o Is the empirical rule valid in this case? Why?
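A minimal sketch of the Task 4 plot and comparisons follows, again on simulated values rather than the EPAGAS data. It uses the base-graphics `boxplot()` function, Chebyshev's lower bound of 1 − 1/k² for the proportion within k standard deviations, and the empirical rule's approximate figure of 0.95 for k = 2. The title text and the variable names are placeholders.

```r
# Simulated mpg values for illustration only (not the EPAGAS data)
set.seed(1)
mpg <- rnorm(100, mean = 37, sd = 2.4)

# Black box, notch at the median, title, horizontal orientation
boxplot(mpg, col = "black", notch = TRUE, horizontal = TRUE,
        main = "Boxplot of mpg (illustrative data)", xlab = "mpg")

# Chebyshev: at least 1 - 1/k^2 of any data set lies within k standard deviations
k <- 2
chebyshev_bound <- 1 - 1 / k^2

# Observed proportion of values within k standard deviations of the mean
z <- (mpg - mean(mpg)) / sd(mpg)
observed <- mean(abs(z) < k)

# Empirical rule: approximately 95% within 2 standard deviations
# (applies to roughly mound-shaped, symmetric data)
empirical_rule <- 0.95

print(c(chebyshev = chebyshev_bound, observed = observed,
        empirical = empirical_rule))
```

Chebyshev's figure is a lower bound that holds for any data set, so the observed proportion should never fall below it. The empirical rule is an approximation whose accuracy depends on the shape of the distribution, which is why a plot of the data is worth checking before relying on it.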