Description

Background: The Normal Distribution
Recall from your probability class that a random variable X is normally-distributed with mean µ and variance
σ² (denoted X ∼ N(µ, σ²)) if it has a probability density function, or pdf, equal to

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

In R we can simulate N(µ, σ²) random variables using the rnorm() function. For example,
rnorm(n = 5, mean = 10, sd = 3)
## [1] 8.120639 10.550930 7.493114 14.785842 10.988523
outputs 5 normally-distributed random variables with mean equal to 10 and standard deviation (this is σ)
equal to 3. If the second and third arguments are omitted, the default values are mean = 0 and sd = 1,
which is referred to as the “standard normal distribution”.
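As a quick check of the background above, the following is a minimal sketch that writes the density formula out by hand and compares it with R's built-in dnorm(), then shows that a N(10, 3²) draw is a shifted and scaled standard normal draw. The helper normal_pdf, the variable names, and the seed are illustrative assumptions, not part of the lab.

```r
# Hypothetical helper: the N(mu, sigma^2) density written out from the formula.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-5, 25, by = 0.5)
# Largest absolute gap between the hand-written density and dnorm().
max_gap <- max(abs(normal_pdf(x, mu = 10, sigma = 3) - dnorm(x, mean = 10, sd = 3)))
max_gap

set.seed(1)                      # arbitrary seed, only for reproducibility
draws <- rnorm(n = 5, mean = 10, sd = 3)
set.seed(1)
std_draws <- rnorm(5)            # defaults: mean = 0, sd = 1
# Same seed, so the first sample should equal 10 + 3 * (standard normal sample).
all.equal(draws, 10 + 3 * std_draws)
```

Note that rnorm() takes the standard deviation σ, not the variance σ², as its third argument.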
Tasks
Sample means as sample size increases
1) Generate 100 random draws from the standard normal distribution and save them in a vector named
normal100. Calculate the mean and standard deviation of normal100. In words explain why these
values aren’t exactly equal to 0 and 1.
# You’ll want to type your response here. Your response should look like:
# normal100 <-
# Of course, your answer should not be commented out.
2) The function hist() is a base R graphing function that plots a histogram of its input. Use hist() with
your vector of standard normal random variables from question (1) to produce a histogram of the
standard normal distribution. Remember that typing ?hist in your console will provide help documents
for the hist() function. If coded properly, these plots will be automatically embedded in your output
file.
3) Repeat question (1) except change the number of draws to 10, 1000, 10,000, and 100,000 storing the
results in vectors called normal10, normal1000, normal10000, normal100000.
4) We want to compare the means of our five random samples. Create a vector called sample_means
that has as its first element the mean of normal10, its second element the mean of normal100, its
third element the mean of normal1000, its fourth element the mean of normal10000, and its fifth
element the mean of normal100000. After you have created the sample_means vector, print the
contents of the vector and use the length() function to find the length of this vector (it should be
five). There are, of course, multiple ways to create this vector. Finally, explain in words the pattern we
are seeing with the means in the sample_means vector.
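The idea behind this group of tasks can be sketched on a smaller scale. The snippet below is an illustration only, not the expected answer: the sizes, the seed, and the names sizes and demo_means are assumptions chosen for the example.

```r
set.seed(4206)                         # arbitrary seed for reproducibility
sizes <- c(10, 100, 1000, 10000)       # illustrative sizes, not the lab's full list

# Draw one standard normal sample per size and record each sample's mean.
demo_means <- sapply(sizes, function(n) mean(rnorm(n)))
demo_means
length(demo_means)

# The standard deviation of a sample mean is 1 / sqrt(n) here,
# so the typical distance of the sample mean from 0 shrinks as n grows.
1 / sqrt(sizes)
```

Using sapply() over a vector of sizes is one of several ways to build such a vector; calling mean() on each saved sample and combining the results with c() works equally well.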
Sample distribution of the sample mean
5) Let’s push this a little farther. Generate 1 million random draws from a normal distribution with µ = 3
and σ² = 4 and save them in a vector named normal1mil. Calculate the mean and standard deviation
of normal1mil.
6) Find the mean of all the entries in normal1mil that are greater than 3. You may want to generate a
new vector first which identifies the elements that fit the criteria.
7) Create a matrix normal1mil_mat from the vector normal1mil that has 10,000 columns (and
therefore should have 100 rows).
8) Calculate the mean of the 1234th column.
9) Use the colMeans() function to calculate the means of each column of normal1mil_mat. Remember,
?colMeans will give you help documents about this function. Save the vector of column means with an
appropriate name as it will be used in the next task.
10) Finally, produce a histogram of the column means you calculated in task (9). What is the distribution
that this histogram approximates (i.e. what is the distribution of the sample mean in this case)?
11) Let’s push this even farther. Generate 10 million random draws from an exponential distribution
with rate parameter λ = 3 (Hint: ?rexp). Save the simulated draws in a vector named exp_10mil.
Calculate the mean and standard deviation of exp_10mil. How do these numbers compare to
E(X) = 1/3 and sd(X) = 1/3?
12) Create a matrix exp10mil_mat from the vector exp_10mil that has 10,000 columns (and therefore
should have 1,000 rows). Use the colMeans() function to calculate the means of each column of
exp10mil_mat. Show the first 10 computed means.
13) Finally, produce a histogram of the column means you calculated in task (12). What is the approximate
distribution that this histogram displays (i.e. what is the distribution of the sample mean in this case)?
Overlay the approximating normal density function on the histogram. Note: the correct code is displayed
below.
# hist(exp_means,
#      main = "Histogram of Exponential Means",
#      xlab = expression(bar(X)),
#      prob = TRUE,
#      breaks = 20)
# n <- nrow(exp10mil_mat)
# mean_exp <- 1/3
# mean_exp
# sd_exp <- 1/(3*sqrt(n))
# sd_exp
# x <- seq(0, 1, by = .0001)
# my_density <- dnorm(x, mean = mean_exp, sd = sd_exp)
# lines(x, my_density, col = "purple")
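The mechanics used in this section (logical indexing, reshaping a vector into a matrix, and column means) can be sketched on a much smaller example. This is a hedged illustration under assumed values: the rate of 2, the 50 by 200 layout, the seed, and the names draws, draws_mat, and col_means are not taken from the lab.

```r
set.seed(5206)                               # arbitrary seed for reproducibility
rate <- 2                                    # illustrative rate, not the lab's value
draws <- rexp(50 * 200, rate = rate)

# matrix() fills column by column, so each column holds 50 consecutive draws.
draws_mat <- matrix(draws, ncol = 200)
dim(draws_mat)

# One sample mean per column; each is the mean of 50 exponential draws.
col_means <- colMeans(draws_mat)
# colMeans() agrees with averaging a single column by hand.
all.equal(col_means[7], mean(draws_mat[, 7]))

# Logical indexing keeps only the draws that satisfy a condition.
mean(draws[draws > 1 / rate])

# Central limit theorem benchmark for the column means:
# mean 1 / rate and standard deviation (1 / rate) / sqrt(rows per column).
c(mean = 1 / rate, sd = (1 / rate) / sqrt(nrow(draws_mat)))
c(mean = mean(col_means), sd = sd(col_means))
```

The last two lines print the theoretical and simulated values side by side; with only 200 columns the agreement is approximate, and it should tighten as the number of columns grows.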

STAT GU4206/GR5206 Lab 1
