Description
In this lab we will learn about two types of estimate:
1. Point
2. Interval
These estimates are important and will be studied in detail as we progress through chapter
7 of MS. We will use a simulation technique called the bootstrap. This gets its name from the
idea that we derive everything from what we have at hand β that is we pull ourselves up from our
bootstraps.
Specifically this means that we take one sample from the population of interest and
assume that the sample is representative, that is, it contains information that is summative of the
population parameters. We will use the sample by resampling from it.
Sample
π¦1, π¦2, β¦ , π¦π
π¦1, π¦2, β¦ , π¦π
π¦1, π¦2, β¦ , π¦π
n
π¦1, π¦2, β¦ , π¦π
π¦Μ
1
π¦Μ
2
π¦Μ
3
π¦Μ
ππ‘ππ
POP
n
The R function sample() will be used to do the resampling. If the sample is of size n then the bootstrap
re-sampling will create samples of size n with replacement. The following is a function that will facilitate
the bootstrap procedure.
myboot<-function(iter=10000,x,fun=”mean”,alpha=0.05,…){
#Notice where the … is repeated in the code
n=length(x) #sample size
#Now sample with replacement
y=sample(x,n*iter,replace=TRUE) #A
# Make a matrix with all the resampled values
rs.mat=matrix(y,nr=n,nc=iter,byrow=TRUE)
xstat=apply(rs.mat,2,fun)
# xstat is a vector and will have iter values in it
ci=quantile(xstat,c(alpha/2,1-alpha/2)) #B
# Nice way to form a confidence interval
# A histogram follows
# The object para will contain the parameters used to make the
histogram
para=hist(xstat,freq=FALSE,las=1,main=”Histogram of Bootstrap sample
statistics”,…)
#mat will be a matrix that contains the data, this is done so that I
can use apply()
mat=matrix(x,nr=length(x),nc=1,byrow=TRUE)
#pte is the point estimate
#This uses whatever fun is
pte=apply(mat,2,fun)
abline(v=pte,lwd=3,col=”Black”)# Vertical line
segments(ci[1],0,ci[2],0,lwd=4) #Make the segment for the ci
text(ci[1],0,paste(“(“,round(ci[1],2),sep=””),col=”Red”,cex=3)
text(ci[2],0,paste(round(ci[2],2),”)”,sep=””),col=”Red”,cex=3)
# plot the point estimate 1/2 way up the density
text(pte,max(para$density)/2,round(pte,2),cex=3)
return(list(ci=ci,fun=fun,x=x))# Some output to use if necessary
}
The above code will be used to make a number of plots, point and interval estimates and useful output in
the form of a list.
Tasks
All output should be made through RMD. Make sure you knit to html and upload to CANVAS
Note: All plots you are asked to make should be recorded through
RMD.
β’ Task 1
o Make a folder LAB9
o Download the file βlab9.rβ
o Place this file with the others in LAB9.
o Start Rstudio
o Open βlab9.rβ from within Rstudio.
o Go to the βsessionβ menu within Rstudio and βset working directoryβ to where the source
files are located.
o Issue the function getwd()
β’ Task 2
o Create your own R file and record the R code you used to complete the lab.
o In the above code for myboot()there are two lines marked A and B. Using any
resources available explain what each line does
βͺ Line A
βͺ Line B
o The sample() function should be studied a little further. As used in the myboot() function,
each datum in x will be selected with equal probability. Why is this necessary?
o Issue the following lines in R and record the unique sample you get
βͺ set.seed(35) # This will give everyone the same sample
βͺ sam=round(rnorm(20,mean=10,sd=4),2)
Histogram of Bootstrap sample statistics
alpha=0.05 iter=10000
mean(x)/median(x)
Density
0.90 0.95 1.00 1.05 1.10
0
2
4
6
8
10
12
(0.95 1.07)
1.03
βͺ unique(sample(sam,20,replace=TRUE) ) # repeat this
line 5Xs
βͺ Explain what you see.
βͺ unique(sample(sam,20,replace=FALSE) ) # repeat this
line 5Xs
βͺ Explain what you see.
o Issue sample(sam,21,replace=FALSE) what happens? Why?
β’ Task 3
o Using myboot() make 95% (alpha=0.05) bootstrap intervals (iter=10000) for the
population mean π and record the plots when the following samples are used:
βͺ A) set.seed(39); sam=rnorm(25,mean=25,sd=10)
βͺ B) set.seed(30); sam=rchisq(20,df=3)
βͺ C) set.seed(40); sam=rgamma(30,shape=2,scale=3)
βͺ D) set.seed(10); sam=rbeta(20,shape1=3,shape2=4)
o In each of the above cases how close is the point estimate to the population value? HINT:
You will need to calculate the population mean.
o In each of the above cases does the interval contain the population value? (HINT: You
will need to calculate the population mean. See MS chapter 5 or use Wikipedia)
o Using myboot() make 80% (alpha=0.20) bootstrap intervals (iter=10000) for the
population variance π
and record the plots when the following samples are used:
βͺ A) set.seed(39); sam=rnorm(25,mean=25,sd=10)
βͺ B) set.seed(30); sam=rchisq(20,df=3)
βͺ C) set.seed(40); sam=rgamma(30,shape=2,scale=3)
βͺ D) set.seed(10); sam=rbeta(20,shape1=3,shape2=4)
β’ Task 4
o Adjust myboot() so that it returns as a part of the list the vector containing the statistic
(xstat) of interest along with all the other existing quantities.
o Using the adjusted myboot() function call myboot(x=sam,fun=“medianβ), where
sam=c(1,1,1,2,2,2,2,3,3,3,4,4) – make a barplot of xstat.
o What is the bootstrap interval estimate for the median? (L,U)
β’ Task 5
o Find a 95% interval estimate for the population mean/median (mean divided by median)
using each of the following samples
βͺ A) set.seed(39); sam=rnorm(25,mean=25,sd=10)
βͺ B) set.seed(30); sam=rchisq(20,df=3)
βͺ C) set.seed(40); sam=rgamma(30,shape=2,scale=3)
βͺ D) set.seed(10); sam=rbeta(20,shape1=3,shape2=4)
o Do the same except make 70% intervals.
β’ Task 6
o Issue the command ?distributions in R
o Choose four distributions that we havenβt used this far and make one sample of size 20
from each.
o Make 80% bootstrap intervals for the mean and variance using each sample. Use
iter=10000.
o Record the graphical output.
β’ Task 7
o Use myboot()and create bootstrap intervals using two statistics that you find
interesting. Use set.seed(68); sam=rnorm(20,mean=10,sd=4) as the
sample. (You might try some functions like quantile, IQR, sd, var, mean,
median)
o Theory demands that if π βΌ π(π, π),π‘βππ π (πΜ
β π§πΌ
2
π
βπ
β€ π β€ πΜ
+ π§πΌ
2
π
βπ
) = 1 β πΌ .
This defines π¦Μ
β π§πΌ
2
π
βπ
β€ π β€ π¦Μ
+ π§πΌ
2
π
βπ
to be the (1 β πΌ)100% confidence interval,
where π§πΌ
2
=qnorm(1-alpha/2,mean=0,sd=1), that is the 1 β πΌ/2 standard
normal quantile.
o Calculate the 95% confidence interval using the above theory using the sample
set.seed(68); sam=rnorm(20,mean=10,sd=4)
o Use myboot() with the same sample to find a 95% bootstrap interval using the same
sample. (Decide on the options you need).
o How do they compare?
β’ Task 8
o Add myboot2() to your package (make sure it is well documented)
o Use whatever data you have installed in your package to create a bootstrap ci
obj=ILAS2019::myboot2(x=ddt$DDT)
################### LAB FINISHES HERE ###############################
β’ Task 9 Extra for experts!
o Rewrite myboot() so that it creates interesting plots