Objectives

1. Learn how to conduct hypothesis testing for one proportion

2. Learn how to conduct hypothesis testing for two proportions

3. Learn how to conduct hypothesis testing for one mean

4. Learn how to interpret hypothesis testing for linear regression

Collaboration Policy

In Lab you are encouraged to work in pairs or small groups to discuss the concepts on the

assignments. However, DO NOT copy each other’s work as this constitutes cheating. The work

you submit must be entirely your own. If you have a question in lab, feel free to reach out to

other groups or talk to me if you get stuck.

Overview: the pnorm() function

The pnorm() function computes probabilities from a normal distribution with specified mean and

standard deviation. The function inputs a value in the q argument and computes the probability

that a value drawn from a normal distribution will be less than or equal to q. The exact normal

distribution to compare with is specified using the mean and sd arguments. The default mean is

mean = 0 and the default standard deviation is sd = 1.

The optional argument lower.tail inputs a logical value (TRUE or FALSE) and changes the

direction of the probability. The default is lower.tail = TRUE, so the pnorm() function will, as

noted above, compute the probability that a value drawn from the normal distribution will be less

than or equal to q. If we set lower.tail = FALSE, then the pnorm() function will compute the

probability that a value drawn from the normal distribution will be greater than or equal to q.

For example, if we want the probability of observing a person with a height of 70 inches or less

from a population that follows a normal distribution with a mean height of 68 inches and a

standard deviation of 1.9 inches:

pnorm(70, mean = 68, sd = 1.9)

If we want the probability of observing a person with a height of 70 inches or more from a

population that follows a normal distribution with a mean height of 68 inches and a standard

deviation of 1.9 inches:

pnorm(70, mean = 68, sd = 1.9, lower.tail = FALSE)
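As a quick check on the two calls above, the short sketch below confirms that the lower-tail and upper-tail probabilities add up to 1, and that standardizing by hand and using the default mean and sd gives the same answer. The variable names are illustrative only.

```r
# Height example: normal distribution with mean 68 and sd 1.9
p_below <- pnorm(70, mean = 68, sd = 1.9)                      # P(X <= 70)
p_above <- pnorm(70, mean = 68, sd = 1.9, lower.tail = FALSE)  # P(X > 70)

# The two tails are complements, so they sum to 1
p_below + p_above

# Standardizing first and relying on the defaults (mean = 0, sd = 1)
# gives the same lower-tail probability
z <- (70 - 68) / 1.9
pnorm(z)
```

Because the normal distribution is continuous, the probability of exactly 70 is zero, so "greater than" and "greater than or equal to" give the same number here.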

Overview: the One-Sample z-Test for proportions

To do a (theory-based) hypothesis test (test of significance) for a proportion, we follow

several steps:

1. State the null and alternative hypotheses about the population proportion p, where p0 is the value of p assumed under the null hypothesis.

2. From a sample of data, compute the sample proportion $\hat{p}$ and its standardized z-statistic $z = \frac{\hat{p} - p_0}{SE}$, where SE is assumed to be based on the null hypothesis and is calculated as $SE = \sqrt{\frac{p_0 (1 - p_0)}{n}}$, where n is the sample size.

3. Check the validity conditions to apply the Central Limit Theorem.

4. If the validity conditions hold, compute the p-value by comparing the observed z-statistic to a

standard normal distribution (using normal tables or pnorm()).
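The four steps above can be sketched in R as follows. The numbers are a made-up coin-flip example (60 heads in 100 flips, testing whether the long-run proportion of heads exceeds 0.5), not the Flint data. The validity check shown (at least 10 expected successes and 10 expected failures under the null) is one common rule of thumb and is an assumption here; use the conditions given in lecture.

```r
# Hypothetical data: 60 heads in 100 coin flips
x  <- 60     # number of successes
n  <- 100    # sample size
p0 <- 0.5    # null value; H0: p = 0.5, Ha: p > 0.5

p_hat <- x / n                      # sample proportion
se    <- sqrt(p0 * (1 - p0) / n)    # standard error computed under the null
z     <- (p_hat - p0) / se          # standardized z-statistic

# Validity check (assumed rule of thumb, may differ from lecture)
conditions_ok <- (n * p0 >= 10) && (n * (1 - p0) >= 10)

# One-sided ("greater") p-value: area to the right of z
p_value <- pnorm(z, lower.tail = FALSE)

c(p_hat = p_hat, se = se, z = z, p_value = p_value)
```

For a "less than" alternative the p-value would be `pnorm(z)`, and for a two-sided alternative it would be `2 * pnorm(abs(z), lower.tail = FALSE)`.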

Exercise 1 – Hypothesis testing with one proportion.

We will be working with a modified Flint dataset which can be found on CCLE. Please

download this file and read it into R. You may recall in Lab 2 that lead levels were considered

dangerous if the result was greater than or equal to 15 PPB. We are interested in determining if

the proportion of dangerous lead levels in Flint is greater than 10%. Assume the Flint data is a

random sample used to address this research question.

a. We will conduct a hypothesis test for this research question. Using symbols from lecture,

what are the null and alternative hypotheses? Is this a one-sided or a two-sided test?

b. Calculate the sample proportion and sample standard deviation of the sample proportion

of dangerous lead levels.

c. Now, calculate the SE of sample proportions, and the z-value for this test. Consult the

above instructions and/or the lecture slides for guidance.

d. Using the z-statistic in (c), calculate the p-value associated with this test. You may use

R’s pnorm() function or a normal table, but please show all work.

e. Using a significance level of 0.05, do you reject the null hypothesis?

f. If greater than 10% of households in Flint contain dangerous lead levels, the EPA

requires remediation action to be taken. Based on your results, what should you tell the

EPA?

g. Another way to run this test is to use the prop.test() function using the mosaic package.

You will need to know your sample size, and the number of “successes” in the sample.

Use this function to conduct the same hypothesis test in (a)-(d) and obtain a p-value from

the test. Using the same significance level of 0.05, do your results change? An example

of the prop.test() function is shown in the two lines below:

## We flipped 100 coins and 60 were heads. Is the long-run proportion of heads greater than 0.5?

prop.test(x = 60, n = 100, p = 0.5, alternative = "greater")

h. Notice that the prop.test() output produced a confidence interval. Use the help

page for prop.test() in the mosaic package to find the argument that sets the confidence

level. Modify that argument and re-run the code in (g) to produce a 99% confidence

interval instead of a 95% interval.
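When comparing a hand calculation like (c)-(d) with the prop.test() output in (g), keep in mind that R's prop.test() applies a continuity correction by default, so its p-value will usually be close to, but not identical to, the one from pnorm(). The sketch below illustrates this with the coin example (60 heads in 100 flips). It calls stats::prop.test() directly; that the mosaic version handles the correct argument in the same way is an assumption.

```r
# Coin example: 60 heads in 100 flips, H0: p = 0.5, Ha: p > 0.5
z      <- (60 / 100 - 0.5) / sqrt(0.5 * 0.5 / 100)
p_hand <- pnorm(z, lower.tail = FALSE)    # p-value from the z-statistic

# Default call: a continuity correction is applied
p_default <- stats::prop.test(x = 60, n = 100, p = 0.5,
                              alternative = "greater")$p.value

# Turning the correction off reproduces the hand calculation
p_uncorrected <- stats::prop.test(x = 60, n = 100, p = 0.5,
                                  alternative = "greater",
                                  correct = FALSE)$p.value

c(hand = p_hand, default = p_default, uncorrected = p_uncorrected)
```

A small difference between the two approaches is therefore expected and does not by itself indicate an error in the hand calculation.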

Overview: the Two-Sample z-Test for proportions

The hypothesis test for two proportions follows a similar framework as the test for one

proportion, but uses a different statistic. The z-statistic is calculated as $z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{SE}$ and $SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$,

where:

$n_1$ is the sample size in sample 1

$n_2$ is the sample size in sample 2

$x_1$ is the number of successes in sample 1

$x_2$ is the number of successes in sample 2

$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the combined (pooled) proportion of successes

$\hat{p}_1 = \frac{x_1}{n_1}$ is the proportion of successes in sample 1

$\hat{p}_2 = \frac{x_2}{n_2}$ is the proportion of successes in sample 2
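A minimal sketch of these formulas in R, using hypothetical coin-flip counts (52 heads in 100 penny flips and 47 heads in 80 dime flips) rather than the Flint data. The last step cross-checks the hand calculation against prop.test() with the continuity correction turned off; the variable names are illustrative.

```r
# Hypothetical counts
x1 <- 52; n1 <- 100    # sample 1: pennies
x2 <- 47; n2 <- 80     # sample 2: dimes

p1_hat <- x1 / n1                      # proportion of successes in sample 1
p2_hat <- x2 / n2                      # proportion of successes in sample 2
p_pool <- (x1 + x2) / (n1 + n2)        # pooled proportion of successes

# Standard error based on the pooled proportion
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# z-statistic; the null hypothesis says the difference is 0
z <- (p1_hat - p2_hat - 0) / se

# Two-sided p-value: area in both tails beyond |z|
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Cross-check with prop.test(), continuity correction off
p_check <- prop.test(x = c(x1, x2), n = c(n1, n2),
                     alternative = "two.sided", correct = FALSE)$p.value

c(z = z, p_value = p_value, p_check = p_check)
```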

**OPTIONAL** Exercise 2 – Hypothesis testing with two proportions.

Notice that the Flint data contains a column called Region which is either “North” or “South”.

We are interested in determining if the proportion of dangerous lead levels in the North region

differs from the proportion of dangerous lead levels in the South region. Again assume the Flint

data from CCLE is a random sample used to address this research question.

a. We will conduct a hypothesis test for this research question. What are the null and

alternative hypotheses? Is this a one-sided or a two-sided test?

b. Using guidance from lecture, calculate every value you will need to produce a z-statistic

for this test. Then, calculate the z-statistic. Please show all work.

c. Using the z-statistic in (b), calculate the p-value associated with this test. You may use

R’s pnorm() function or a normal table, but please show all work.

d. Using a significance level of 0.05, do you reject the null hypothesis? Interpret this result

in the context of our research question. Hint: is this a two-sided test?

e. Another way to run this test is to use the prop.test() function with some slight

modifications from exercise 1. Use the function to conduct the same hypothesis test in

(a)-(d) and obtain a p-value from the test, again using a significance level of 0.05. Do

your results change? A sample of the prop.test() function for two proportions is shown in

the two lines below:

## We flipped 100 pennies and 52 were heads. Then we flipped 80 dimes, and 47 were heads. Is

## the long-run proportion of heads different for pennies and dimes?

prop.test(x = c(52, 47), n = c(100, 80), alternative = "two.sided")

Overview: the One-Sample t-Test for means

To do a (theory-based) hypothesis test (test of significance) for the mean of a quantitative

variable, we follow several steps:

1. State the null and alternative hypotheses about the mean parameter µ.

2. From a sample of data, compute the sample mean $\bar{x}$ and its standardized t-statistic $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$,

where $\mu_0$ is the value of µ under the null hypothesis, s is the sample standard deviation, and n is the sample size.

3. Check the validity conditions to apply the Central Limit Theorem.

4. If the validity conditions hold, compute the p-value by comparing the observed t-statistic to a

t-distribution with df = n − 1.
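These steps can be sketched in R as follows. The data vector `y` and the null value `mu0` are made up for illustration and are not the Flint data; with a sample this small, the validity conditions would need careful checking in a real analysis. The call to t.test() at the end is only a cross-check on the hand calculation.

```r
# Hypothetical sample of 8 measurements
y   <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 5.1)
mu0 <- 4.5    # null value; H0: mu = 4.5, Ha: mu != 4.5

n      <- length(y)
x_bar  <- mean(y)               # sample mean
s      <- sd(y)                 # sample standard deviation
se     <- s / sqrt(n)           # standard error of the sample mean
t_stat <- (x_bar - mu0) / se    # standardized t-statistic
df     <- n - 1                 # degrees of freedom

# Two-sided p-value: area in both tails beyond |t|
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Cross-check against the built-in test
check <- t.test(y, mu = mu0, alternative = "two.sided")

c(t_stat = t_stat, p_value = p_value, p_check = check$p.value)
```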

Overview: the pt() Function

The pt() function computes probabilities from a t-distribution with specified degrees of freedom.

The syntax for pt() is very similar to the pnorm() function. The pt() function inputs a value in the

q argument and computes the probability that a value drawn from a t-distribution will be less

than or equal to q. The exact t-distribution to compare with is specified using the df argument (df

stands for degrees of freedom). There is no default value for the df, so you must input the df

argument for the command to work.

The optional argument lower.tail inputs a logical value (TRUE or FALSE) and changes the

direction of the probability. The default is lower.tail = TRUE, so the pt() function will, as noted

above, compute the probability that a value drawn from the t-distribution will be less than or

equal to q. If we set lower.tail = FALSE, then the pt() function will compute the probability that

a value drawn from the t-distribution will be greater than or equal to q.

For example, if we want the probability of observing a value of –1.5 or less from statistics that

follow a t-distribution with 29 degrees of freedom:

pt(-1.5, df = 29)
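Because the t-distribution is symmetric about 0, the area below -1.5 equals the area above 1.5, and a two-sided p-value can be obtained by doubling one tail. The sketch below illustrates this with the same value and 29 degrees of freedom; the variable names are illustrative.

```r
t_obs <- -1.5
df    <- 29

# Lower tail below -1.5 and upper tail above +1.5
lower <- pt(t_obs, df = df)
upper <- pt(abs(t_obs), df = df, lower.tail = FALSE)

# Two-sided p-value: double the tail area beyond |t_obs|
p_two_sided <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

c(lower = lower, upper = upper, p_two_sided = p_two_sided)
```

Using `abs()` before taking the upper tail means the same line works whether the observed statistic is negative or positive.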

Exercise 3 – Hypothesis testing with means

Copper is another metal which can be dangerous in high quantities. We believe the average

copper level in the state of Michigan’s drinking water is 40 PPM. We are interested in finding if

the copper level in Flint’s water differs from the Michigan average. Again, assume the Flint data

constitutes a random sample.

a. We will conduct a hypothesis test for this research question. What are the null and

alternative hypotheses? Is this a one-sided or a two-sided test?

b. Calculate the sample mean and sample standard deviation of the sample copper levels in

Flint.

c. Calculate the standard error of the sample mean for copper, $\bar{x}$.

d. Using the values in (b) and (c), calculate the t-test statistic and p-value associated with

this test. Use pt() to obtain the p-value. Hint: is this a two-sided test?

e. Using a significance level of 0.01, do you reject the null hypothesis? Interpret this result

in the context of our research question.

f. Another way to run this test is to use the t.test() function using the mosaic package. Use

this function to conduct the same hypothesis test in (a)-(d) and obtain a p-value from the

test, again using a significance level of 0.01. Do your results change? An example of the

t.test() function is shown in the two lines below:

## Using sample Flint lead data, do we believe the long run Flint lead average is not equal to 3?

t.test(flint$Pb, mu = 3, alternative = "two.sided")
