MATH 4753 Laboratory 7: Sampling Distributions

In this lab we will investigate the idea of a sampling distribution. Most of the sampling will
be done from a normal population.

The procedure is as follows:
1. Sample from a Normal distribution using rnorm().
2. Create a statistic (i.e., a function of the data).
3. Store the statistic.
4. Repeat the procedure for a designated number of iterations.
5. When finished, create a histogram of the statistic.
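
The five steps above can be sketched in a few lines of R. This is only an illustration: sim_statistic() is a hypothetical helper, not a function from lab7.R, and the sample mean is used as a stand-in statistic.

```r
# Hypothetical helper (not part of lab7.R): simulate the sampling
# distribution of a statistic computed from Normal samples.
sim_statistic <- function(n = 10, iter = 1000, mu = 10, sigma = 4, stat = mean) {
  w <- numeric(iter)                      # storage for the statistic (step 3)
  for (i in seq_len(iter)) {              # repeat iter times (step 4)
    y <- rnorm(n, mean = mu, sd = sigma)  # sample from the Normal population (step 1)
    w[i] <- stat(y)                       # compute the statistic (step 2)
  }
  w
}

set.seed(1)
ybar <- sim_statistic(n = 10, iter = 1000, mu = 10, sigma = 4)

# Histogram of the stored statistic (step 5)
hist(ybar, main = "Sampling distribution of the sample mean",
     xlab = "Sample mean")
```

Passing a different function as stat (for example var or median) gives the sampling distribution of that statistic instead.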

The method is to take a ready-made R script, adapt it, and re-run it for the problems given
below. This process is instructive: it should help you not only to carry out statistical
calculations but also to understand the basis of much distributional theory.

The lab is in two parts:
1. One population sampling
2. Two population sampling

Objectives
In this lab you will learn how to:
1. Create a sample from one population.
2. Create statistics.
3. Create sampling distributions and appropriate graphs.
4. Sample from two different populations and create sampling distributions for statistics made from
both samples.
5. Add data and its documentation to your R package.

Tasks

Use RMD and knit into HTML. Place both completed documents on CANVAS before the due date, together
with the .fbr or .mov movie file (see Task 3).
Warning:
If you want the functions to produce plots suitable for RMD, they must not call windows(), or any
other function that opens a new graphics window, before the plot function.

Note: All plots you are asked to make should be created in Rmd
using R chunks.

• Task 1
o Make a folder LAB7
o Download the file “lab7.r”
o Place this file with the others in LAB7.
o Start RStudio.
o Open “lab7.r” from within RStudio.
o Go to the “Session” menu within RStudio and “Set Working Directory” to where the source
files are located.
o Call getwd() to confirm the working directory.

• Task 2
o Make a new file for your code in RStudio editor, call it “mylab7.R” and place in it all the
code you need to answer the tasks of this lab (copy and paste from lab7.R).
o Use the hash # symbol and write your own comments in the code file explaining what the
code does.

o The first statistic we will make is the chi-square statistic. It is given by the formula
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, where $s^2$ is the sample variance, $\sigma^2$ is the
population variance, the population is Normal, $Y \sim N(\mu, \sigma^2)$, and $n$ is the
sample size.

o The function you will use is called mychisim().
[Figure: repeated samples $y_1, y_2, \ldots, y_n$, each of size $n$, drawn from the population $Y \sim N(\mu, \sigma)$]

o Make four plots according to the following options (the function will require you to click
into the graph to complete its operation) – you may need to adjust ymax.
▪ 𝑛1 = 10, 𝑖𝑡𝑒𝑟 = 1000, 𝜇1 = 10, 𝜎1 = 4
▪ 𝑛1 = 20, 𝑖𝑡𝑒𝑟 = 1000, 𝜇1 = 10, 𝜎1 = 4
▪ 𝑛1 = 100, 𝑖𝑡𝑒𝑟 = 1000, 𝜇1 = 10, 𝜎1 = 4
▪ 𝑛1 = 200, 𝑖𝑡𝑒𝑟 = 1000, 𝜇1 = 10, 𝜎1 = 4
o The function returns a list of statistics; the statistic we are interested in is the
$\chi^2$ value for each iteration. These values are in the vector called 𝑤. Invoke the
function with 𝑛1 = 10, 𝑖𝑡𝑒𝑟 = 1500, 𝜇1 = 20, 𝜎1 = 10 and place the output into an object
called chisq.
Make a histogram of chisq$w.
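
To see what a function like mychisim() is doing, here is a minimal stand-alone sketch. It is not the course's mychisim(): the name chisq_sim_sketch() and its argument names are assumptions made for illustration, and the real function in lab7.R may take different arguments and return more components.

```r
# Hypothetical stand-in for mychisim(); argument names are assumptions.
chisq_sim_sketch <- function(n1 = 10, iter = 1000, mu1 = 10, sigma1 = 4) {
  # Each column of y holds one simulated sample of size n1
  y <- matrix(rnorm(n1 * iter, mean = mu1, sd = sigma1), nrow = n1)
  s2 <- apply(y, 2, var)            # sample variance of each sample
  w <- (n1 - 1) * s2 / sigma1^2     # chi-square statistic for each iteration
  list(w = w, df = n1 - 1)
}

set.seed(2)
chisq_demo <- chisq_sim_sketch(n1 = 10, iter = 1500, mu1 = 20, sigma1 = 10)

# Density-scaled histogram with the theoretical chi-square density overlaid
hist(chisq_demo$w, freq = FALSE, breaks = 40,
     main = "Simulated chi-square statistic", xlab = "w")
curve(dchisq(x, df = chisq_demo$df), add = TRUE, lwd = 2)
```

With a Normal population the statistic follows a chi-square distribution with n − 1 degrees of freedom, so the overlaid curve should track the histogram closely.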

• Task 3
o The myTsim() function is available for you to use (see the mytsim.R file on CANVAS).
o The statistic it creates is $T = \frac{\bar{y} - \mu}{s/\sqrt{n}}$; it is computed using
the functions mean() and sd().

o Once you have made the function make some simulations as before (make sure you
have all the code ready to repeat at the end) – that is:
▪ A) Make four plots according to the following options (the function will
require you to click into the graph to complete its operation) – you may need
to adjust ymax.
• 𝒏𝟏 = 𝟏𝟎,𝒊𝒕𝒆𝒓 = 𝟏𝟎𝟎𝟎,𝝁𝟏 = 𝟏𝟎, 𝝈𝟏 = 𝟒
• 𝒏𝟏 = 𝟐𝟎,𝒊𝒕𝒆𝒓 = 𝟏𝟎𝟎𝟎,𝝁𝟏 = 𝟏𝟎, 𝝈𝟏 = 𝟒
• 𝒏𝟏 = 𝟏𝟎𝟎,𝒊𝒕𝒆𝒓 = 𝟏𝟎𝟎𝟎, 𝝁𝟏 = 𝟏𝟎, 𝝈𝟏 = 𝟒
• 𝒏𝟏 = 𝟐𝟎𝟎,𝒊𝒕𝒆𝒓 = 𝟏𝟎𝟎𝟎, 𝝁𝟏 = 𝟏𝟎, 𝝈𝟏 = 𝟒

▪ B) The function returns a list of statistics, the statistic we are interested in is
the 𝑻 value for each iteration. These values are in the vector called 𝒘.
Invoke the function with 𝒏𝟏 = 𝟏𝟎,𝒊𝒕𝒆𝒓 = 𝟏𝟓𝟎𝟎,𝝁𝟏 = 𝟐𝟎, 𝝈𝟏 = 𝟏𝟎 and place
the output into an object called T.

Make a histogram of T$w.
o Record all plots here.
o Now start up the BBFLASHBACK recorder and record yourself re-making the plots from A) and
B) by re-running your code; give a brief commentary as you record.
Place the .fbr file into the CANVAS Lab 7 dropbox.
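
The T statistic in this task can be simulated with a short sketch like the one below. This is not the course's myTsim(); t_sim_sketch() and its argument names are hypothetical and only show how mean() and sd() combine into the statistic.

```r
# Hypothetical stand-in for myTsim(); argument names are assumptions.
t_sim_sketch <- function(n1 = 10, iter = 1000, mu1 = 10, sigma1 = 4) {
  w <- replicate(iter, {
    y <- rnorm(n1, mean = mu1, sd = sigma1)
    (mean(y) - mu1) / (sd(y) / sqrt(n1))   # T = (ybar - mu) / (s / sqrt(n))
  })
  list(w = w, df = n1 - 1)
}

set.seed(3)
t_demo <- t_sim_sketch(n1 = 10, iter = 1500, mu1 = 20, sigma1 = 10)

# Density-scaled histogram with the Student's t density overlaid
hist(t_demo$w, freq = FALSE, breaks = 40,
     main = "Simulated T statistic", xlab = "w")
curve(dt(x, df = t_demo$df), add = TRUE, lwd = 2)
```

The sketch stores its output in t_demo rather than T, because T is also R's built-in shorthand for TRUE.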

• Task 4
o You will now make simulations from two populations and use the samples to make a
statistic.
o The first statistic is the two-sample chi-square statistic. The function is called mychisim2().
o The statistic is $\chi^2 = \frac{(n_1+n_2-2)S_p^2}{\sigma^2}$, where we assume that both
populations have the same variance $\sigma^2$, and
$S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$, where $S_i^2$ is the sample variance
from population $i$, $n_i$ is the sample size, and $S_p^2$ is the pooled sample variance.

o Use mychisim2() to sample from two normal populations with the following parameters:
▪ 𝑛1 = 10, 𝑛2 = 10, 𝜇1 = 5, 𝜇2 = 10, 𝜎1 = 𝜎2 = 4, 𝑖𝑡𝑒𝑟 = 1000
▪ 𝑛1 = 20, 𝑛2 = 10, 𝜇1 = 3, 𝜇2 = 5, 𝜎1 = 𝜎2 = 10,𝑖𝑡𝑒𝑟 = 1000
▪ 𝑛1 = 50, 𝑛2 = 50, 𝜇1 = 5, 𝜇2 = 10, 𝜎1 = 𝜎2 = 4, 𝑖𝑡𝑒𝑟 = 10000
▪ 𝑛1 = 80, 𝑛2 = 50, 𝜇1 = 3, 𝜇2 = 5, 𝜎1 = 𝜎2 = 10,𝑖𝑡𝑒𝑟 = 10000
o Use default values in the function with 𝑖𝑡𝑒𝑟 = 10000 and use the output to make a
histogram as before.
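
The pooled chi-square statistic defined in this task can be sketched as follows. The function chisq2_sim_sketch() and its argument names are hypothetical; it is not the course's mychisim2(), whose defaults and return value may differ.

```r
# Hypothetical stand-in for mychisim2(); argument names are assumptions.
chisq2_sim_sketch <- function(n1 = 10, n2 = 10, mu1 = 5, mu2 = 10,
                              sigma = 4, iter = 1000) {
  w <- replicate(iter, {
    y1 <- rnorm(n1, mean = mu1, sd = sigma)
    y2 <- rnorm(n2, mean = mu2, sd = sigma)
    # Pooled sample variance
    sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
    (n1 + n2 - 2) * sp2 / sigma^2          # two-sample chi-square statistic
  })
  list(w = w, df = n1 + n2 - 2)
}

set.seed(4)
chisq2_demo <- chisq2_sim_sketch(n1 = 10, n2 = 10, mu1 = 5, mu2 = 10,
                                 sigma = 4, iter = 1000)

hist(chisq2_demo$w, freq = FALSE, breaks = 40,
     main = "Simulated two-sample chi-square statistic", xlab = "w")
curve(dchisq(x, df = chisq2_demo$df), add = TRUE, lwd = 2)
```

Note that the population means do not enter the statistic; only the common variance and the two sample sizes matter.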

• Task 5
o Alter the function myTsim2() to place the legend where you click with the mouse.
o From the table taken from the book (MS page 278) and reproduced below, write down the
Student’s T statistic that the function calculates and explain the notation.

o Copy and paste from the code the part that calculates the statistic.
o Use myTsim2() to sample from two normal populations with the following parameters:
▪ 𝑛1 = 10, 𝑛2 = 10, 𝜇1 = 5, 𝜇2 = 10, 𝜎1 = 𝜎2 = 4, 𝑖𝑡𝑒𝑟 = 1000
▪ 𝑛1 = 20, 𝑛2 = 10, 𝜇1 = 3, 𝜇2 = 5, 𝜎1 = 𝜎2 = 10,𝑖𝑡𝑒𝑟 = 1000
▪ 𝑛1 = 50, 𝑛2 = 50, 𝜇1 = 5, 𝜇2 = 10, 𝜎1 = 𝜎2 = 4, 𝑖𝑡𝑒𝑟 = 10000
▪ 𝑛1 = 80, 𝑛2 = 50, 𝜇1 = 3, 𝜇2 = 5, 𝜎1 = 𝜎2 = 10,𝑖𝑡𝑒𝑟 = 10000
o Use default values in the function with 𝑖𝑡𝑒𝑟 = 10000 and use the output to make a
histogram as before.
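
As a point of comparison for this task, the sketch below simulates the standard pooled two-sample T statistic, $T = \frac{(\bar{y}_1-\bar{y}_2)-(\mu_1-\mu_2)}{\sqrt{S_p^2\,(1/n_1+1/n_2)}}$. Whether this is exactly the form used in myTsim2() is an assumption; check it against the table and the function's code. The name t2_sim_sketch() is hypothetical. The last lines show one way to place a legend at a mouse click using locator(), guarded so they only run in an interactive session.

```r
# Hypothetical stand-in for myTsim2(); the statistic's form is assumed
# to be the usual pooled two-sample T.
t2_sim_sketch <- function(n1 = 10, n2 = 10, mu1 = 5, mu2 = 10,
                          sigma = 4, iter = 1000) {
  w <- replicate(iter, {
    y1 <- rnorm(n1, mean = mu1, sd = sigma)
    y2 <- rnorm(n2, mean = mu2, sd = sigma)
    sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
    ((mean(y1) - mean(y2)) - (mu1 - mu2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  })
  list(w = w, df = n1 + n2 - 2)
}

set.seed(5)
t2_demo <- t2_sim_sketch(n1 = 10, n2 = 10, mu1 = 5, mu2 = 10,
                         sigma = 4, iter = 1000)

hist(t2_demo$w, freq = FALSE, breaks = 40,
     main = "Simulated two-sample T statistic", xlab = "w")
curve(dt(x, df = t2_demo$df), add = TRUE, lwd = 2)

# Place the legend where the user clicks (interactive sessions only)
if (interactive()) {
  legend(locator(1), legend = "Student's t density", lwd = 2, bty = "n")
}
```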

• Task 6
o Now use myFsim2() to create F statistics from two normal populations.
o Use the table below to write down the statistic that the function will calculate.
o What assumptions are made?
o Make four plots with different parameters.
o Make a histogram from the function using default values.
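
For orientation, one common form of the two-sample F statistic is $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$, which follows an F distribution with $n_1-1$ and $n_2-1$ degrees of freedom when the two samples are independent and Normal. The sketch below simulates that form; whether myFsim2() uses exactly this statistic is an assumption to verify against the table, and f_sim_sketch() and its argument names are hypothetical.

```r
# Hypothetical stand-in for myFsim2(); the statistic's form is an assumption.
f_sim_sketch <- function(n1 = 10, n2 = 10, mu1 = 5, mu2 = 10,
                         sigma1 = 4, sigma2 = 4, iter = 1000) {
  w <- replicate(iter, {
    y1 <- rnorm(n1, mean = mu1, sd = sigma1)
    y2 <- rnorm(n2, mean = mu2, sd = sigma2)
    (var(y1) / sigma1^2) / (var(y2) / sigma2^2)   # ratio of scaled variances
  })
  list(w = w, df1 = n1 - 1, df2 = n2 - 1)
}

set.seed(6)
f_demo <- f_sim_sketch(n1 = 10, n2 = 10, iter = 1000)

hist(f_demo$w, freq = FALSE, breaks = 60,
     main = "Simulated F statistic", xlab = "w")
curve(df(x, df1 = f_demo$df1, df2 = f_demo$df2), add = TRUE, lwd = 2)
```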

• Task 7
o We have been adding functions to our package; this week we will add data.
o See http://r-pkgs.had.co.nz/data.html for more information on this.
o In RStudio, open your package project ILAS2019.
o Read in the data set FIREDAM.csv – you may choose to use
fire=read.csv("FIREDAM.csv")
o Save this to the “data” directory as an rda file – after installing the package usethis, you can
run
usethis::use_data(fire)
o Look at the ddt example data in the R folder – the name “ddt” is placed where a function would
normally be defined, so this is what you are documenting – the data is called “ddt” and is
stored for R as “ddt.rda”.

o Using the ddt example, go ahead and make the documentation for “fire”.
o Now build and install the package.
o In RMD make an R chunk and do the following:
▪ library(ILAS2019)
▪ data("fire")
▪ knitr::kable(head(fire))
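
For the documentation step, a roxygen2 block for a data set usually sits in a file in the R folder and ends with the data set's name as a quoted string. The sketch below shows the general shape only: the file name R/fire.R, the title, and the descriptive text are placeholders, and the column descriptions are left for you to fill in after inspecting the data with str(fire).

```r
# Sketch of a hypothetical file R/fire.R; the wording is a placeholder.

#' Fire damage data
#'
#' Data read from FIREDAM.csv and saved with usethis::use_data(fire).
#'
#' @format A data frame. Run str(fire) and describe each column here.
#' @source FIREDAM.csv (course data file)
"fire"
```

After adding the file, re-run the documentation step (for example devtools::document()) before building and installing the package.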
################### LAB FINISHES HERE ###############################

• Task 8 – Extra for experts
o Make a function that uses 𝑤 to create confidence intervals – hint: you will need quantile()
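
One possible reading of this task is sketched below: take the simulated values 𝑤 and use quantile() to find the central interval that contains a given proportion of them. The function name ci_from_w() and the simulated T values are hypothetical, and the course may intend a different construction.

```r
# Hypothetical helper: central interval of the simulated statistic w.
ci_from_w <- function(w, level = 0.95) {
  alpha <- 1 - level
  quantile(w, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Example input: simulated one-sample T statistics (n = 10, mu = 10, sigma = 4)
set.seed(8)
w <- replicate(5000, {
  y <- rnorm(10, mean = 10, sd = 4)
  (mean(y) - 10) / (sd(y) / sqrt(10))
})

ci <- ci_from_w(w, level = 0.95)
ci                            # empirical 2.5% and 97.5% quantiles of w
qt(c(0.025, 0.975), df = 9)   # theoretical t quantiles for comparison
```

For the T statistic, these quantiles can then be turned into an interval for 𝜇 by rearranging $q_{lo} \le \frac{\bar{y}-\mu}{s/\sqrt{n}} \le q_{hi}$ for a given sample.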