Stats 506, Problem Set 3 solved

Question 1 [60 points]
First, repeat question 3 parts a-c from problem set 1 using data.table for all computations and data
manipulations.
Problem Set 3 11/1/18, 10’42 AM
https://jbhender.github.io/Stats506/F18/PS3.html Page 2 of 3
Then, formulate and state a question answerable using the RECS data. Your question should be similar in
scope to (one of) parts a-c above and should rely on one or more variables not previously used. Answer your
question (using data.table) and provide supporting evidence in the form of nicely formatted graphs and/or
tables.
Question 2 [25 points]
In this question you will design a Monte Carlo study in R to compare the performance of different methods
that adjust for multiple comparisons (https://xkcd.com/882/). You can read more about each of these methods
by referring to help(p.adjust) in R and the references listed there.
Throughout this question, let $n = 1000$, $p = 100$, and
$$\beta_i = \begin{cases} 1 & i \in \{1, \dots, 10\}, \\ 0 & \text{else.} \end{cases}$$
Let $X \in \mathbb{R}^{n \times p}$ with $X \sim N(0_p, \Sigma)$ and $Y \sim N(X\beta, \sigma^2 I_n)$, where $I_n$ is the
$n \times n$ identity matrix.
a. Write a function that accepts matrices X and beta and returns a p by mc_rep matrix of p-values
for the hypothesis tests:
$$H_0: \beta_i = 0, \quad H_1: \beta_i \neq 0.$$
In addition to X and beta your function should have arguments sigma ($\sigma$) and mc_rep controlling the
error variance of $Y$ and the number of Monte Carlo replicates, respectively. Your function should solve the
least-squares problems using the QR decomposition of $X'X$. This decomposition should only be computed
once each time your function is called.
i. Refer to the course notes to find $\hat{\beta}$.
ii. Use $Y$ and $\hat{Y} = X\hat{\beta}$ to estimate the error variance for each Monte Carlo trial $m$:
$$\hat{\sigma}_m^2 = \frac{1}{n - p} \sum_{i=1}^{n} (Y_{im} - \hat{Y}_{im})^2.$$
iii. Use the result from ii and the QR decomposition to find the variance of $\hat{\beta}_i$,
$v_i = \hat{\sigma}_m^2 [(X'X)^{-1}]_{ii}$.
[Note: you will need to do some algebra to determine how to compute $(X'X)^{-1}$ using Q and R.
Or you can use the function chol2inv().]
iv. Form $Z_i = \hat{\beta}_i / \sqrt{v_i}$ and find $p = 2(1 - \Phi(|Z_i|))$.
Test your function with a specific $X$ and $Y$ by comparing to the output from appropriate methods applied to
the object returned by lm(Y ~ 0 + X). It’s okay if there is some finite precision error less than ~1e-3 in
magnitude. Hint: use set.seed() to generate the same $Y$ inside and outside the scope of the function for
the purpose of testing.
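One way the steps in part a could fit together is sketched below in base R. This is an illustration under stated assumptions, not the course's reference solution: the function name sim_pvalues and the default argument values are hypothetical, and the sketch decomposes X itself rather than X'X. With X = QR, X'X = R'R, so chol2inv() applied to R returns the inverse of X'X.

```r
# Sketch only: `sim_pvalues` and its defaults are hypothetical names/values.
sim_pvalues <- function(X, beta, sigma = 1, mc_rep = 100) {
  n <- nrow(X)
  p <- ncol(X)
  QR <- qr(X)  # computed once per call (QR of X, an assumption of this sketch)

  # n x mc_rep matrix of responses, one column per Monte Carlo trial
  Y <- as.vector(X %*% beta) + sigma * matrix(rnorm(n * mc_rep), n, mc_rep)

  b_hat <- qr.coef(QR, Y)            # p x mc_rep least-squares estimates
  res   <- qr.resid(QR, Y)           # n x mc_rep residuals Y - X b_hat
  s2    <- colSums(res^2) / (n - p)  # error-variance estimate per trial

  xtx_inv_diag <- diag(chol2inv(qr.R(QR)))  # diagonal of (X'X)^{-1}
  v <- outer(xtx_inv_diag, s2)              # p x mc_rep variances of b_hat

  Z <- b_hat / sqrt(v)
  2 * (1 - pnorm(abs(Z)))                   # two-sided normal p-values
}

# Small check against lm(), using set.seed() so the same Y is drawn twice.
set.seed(1)
n <- 200
p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 3), rep(0, p - 3))

set.seed(42)
P <- sim_pvalues(X, beta, sigma = 1, mc_rep = 1)

set.seed(42)
Y <- as.vector(X %*% beta) + rnorm(n)
fit <- summary(lm(Y ~ 0 + X))$coefficients
p_lm <- 2 * (1 - pnorm(abs(fit[, "Estimate"] / fit[, "Std. Error"])))
max(abs(P[, 1] - p_lm))  # expected to be close to machine precision
```

The check recomputes normal-based p-values from the lm() estimates and standard errors. The p-values that summary() prints use the t distribution instead, which is one plausible source of the small discrepancy the problem statement allows for.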
b. Choose $\Sigma$ and $\sigma$ as you like. Use the Cholesky factorization of $\Sigma$ to generate a single $X$. Pass $X$, $\beta$,
and $\sigma$ to your function from the previous part.
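A minimal sketch of the Cholesky step in base R follows. The AR(1)-style covariance with rho = 0.5 is an arbitrary, hypothetical choice made only for illustration, since the problem leaves the covariance open.

```r
set.seed(506)
n <- 1000
p <- 100

# Hypothetical choice of Sigma: Sigma[i, j] = rho^|i - j|
rho <- 0.5
Sigma <- rho^abs(outer(1:p, 1:p, "-"))

R <- chol(Sigma)                 # upper triangular, with t(R) %*% R = Sigma
Z <- matrix(rnorm(n * p), n, p)  # rows are iid N(0, I_p)
X <- Z %*% R                     # rows are N(0, Sigma)

beta <- c(rep(1, 10), rep(0, p - 10))
```

Because chol() returns the upper-triangular factor, the standard normal draws are multiplied on the right by R; each row of Z %*% R then has covariance R'R, which equals Sigma.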
c. Write a function evaluate that takes a set of indices where $\beta \neq 0$ and returns Monte Carlo estimates
for the following quantities:
The family wise error rate
The false discovery rate
The sensitivity
The specificity
See this page (https://en.wikipedia.org/wiki/Sensitivity_and_specificity#Sensitivity_index) for additional details.
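The four quantities above could be estimated along the following lines. This is a sketch under assumptions: the argument names P, tp_ind, and alpha are hypothetical, rejection means a p-value below alpha, and the false discovery proportion is taken as 0 in trials with no rejections, which is one common convention.

```r
# Sketch only. P is a p x mc_rep matrix of p-values, tp_ind the indices
# where beta != 0, and alpha the rejection threshold (all names hypothetical).
evaluate <- function(P, tp_ind, alpha = 0.05) {
  reject   <- P < alpha                           # p x mc_rep logical matrix
  null_ind <- setdiff(seq_len(nrow(P)), tp_ind)   # indices where beta == 0

  TP <- colSums(reject[tp_ind, , drop = FALSE])   # true positives per trial
  FP <- colSums(reject[null_ind, , drop = FALSE]) # false positives per trial
  TN <- length(null_ind) - FP                     # true negatives per trial

  c(fwer        = mean(FP > 0),                   # any false positive
    fdr         = mean(FP / pmax(TP + FP, 1)),    # 0 when nothing is rejected
    sensitivity = mean(TP / length(tp_ind)),
    specificity = mean(TN / length(null_ind)))
}

# Toy input: 4 hypotheses, 2 trials, the first two hypotheses truly non-null.
P_toy <- matrix(c(0.01, 0.20, 0.03, 0.50,
                  0.001, 0.002, 0.40, 0.60), nrow = 4)
res_toy <- evaluate(P_toy, tp_ind = 1:2)
res_toy
# fwer = 0.5, fdr = 0.25, sensitivity = 0.75, specificity = 0.75
```

In the toy input the first trial has one true and one false rejection, and the second trial has two true rejections and no false ones, which gives the values in the final comment.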
d. Apply your function from the previous part to the matrix of uncorrected p-values generated in part b.
Use the function p.adjust() to correct these p-values for multiple comparisons using ‘Bonferroni’,
‘Holm’, ‘BH’ (Benjamini-Hochberg), and ‘BY’ (Benjamini-Yekutieli). Use your evaluate() function for each
set of adjusted p-values.
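As a sketch of the adjustment step, p.adjust() can be applied within each Monte Carlo replicate, that is, column by column. The matrix below is a hypothetical stand-in for the real p-values. Note that p.adjust() expects the method names in the spellings listed in p.adjust.methods, so "bonferroni" and "holm" are lowercase.

```r
set.seed(1)
# Hypothetical stand-in for the p x mc_rep matrix of uncorrected p-values
P <- matrix(runif(20 * 5), nrow = 20, ncol = 5)

methods <- c("bonferroni", "holm", "BH", "BY")

# Adjust within each replicate (column); never pool p-values across columns.
P_adj <- lapply(methods, function(m) apply(P, 2, p.adjust, method = m))
names(P_adj) <- methods

sapply(P_adj, dim)  # each adjusted matrix keeps the 20 x 5 shape
```

Each element of P_adj can then be passed to the evaluation function from part c in place of the raw matrix.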
e. Produce one or more nicely formatted graphs or tables reporting your results. Briefly discuss what you
found.
Question 3 (Optional) [25 points, 5 points each]
This is a bonus question related to problem 6 from the midterm. First, review the script written in Stata
available here (https://github.com/jbhender/Stats506_F18/tree/master/solutions/PS3). In this question, you will
work through various options for translating this analysis into R. You may submit all or some of these, but
each part must be entirely correct to earn the points listed.
a. Write a translation using data.table for the computations. [5 pts]
b. Write a function to compute the univariate regression coefficient by group for an arbitrary dependent,
independent, and grouping variables. Use data.table for computations within your function. Test
your function by showing it produces the same results as in part a. [10 pts]
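A possible shape for the function in part b is sketched below. The Stata script is not reproduced here, so two things are assumptions: that the coefficient wanted is the slope from a regression with an intercept, computed as cov(x, y) / var(x), and the names coef_by_group, y_var, x_var, and by_var, which are hypothetical. The built-in mtcars data serves only as a stand-in.

```r
library(data.table)

# Sketch only: slope of y_var on x_var (with intercept) within each group.
# .SDcols puts x first and y second, so .SD[[1]] is x and .SD[[2]] is y.
coef_by_group <- function(dt, y_var, x_var, by_var) {
  dt[, .(beta = cov(.SD[[1]], .SD[[2]]) / var(.SD[[1]])),
     by = by_var, .SDcols = c(x_var, y_var)]
}

dt <- as.data.table(mtcars)
res <- coef_by_group(dt, y_var = "mpg", x_var = "wt", by_var = "cyl")
res

# Cross-check one group against lm()
b4 <- unname(coef(lm(mpg ~ wt, data = mtcars[mtcars$cyl == 4, ]))[2])
all.equal(res[cyl == 4, beta], b4)
```

Passing column names as strings and selecting them through .SDcols avoids name clashes between function arguments and columns of the table.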
c. Compute the regression coefficients using the dplyr verb summarize_at() . [5 pts]
d. Write a function similar to the one in part b to compute arbitrary univariate regression coefficients by
group. Use dplyr for computations within your function. You should read the “Programming with
dplyr” vignette at vignette("programming", "dplyr") before attempting this. Warning: this may be
difficult to debug! [10 pts]