Description
1. Cross-validating a Bayesian Regression.
In this exercise, the covariates x1 and x2 are
simulated as¹
x1 = rand(1, 40); x2 = floor(10 * rand(1,40)) + 1;
and the response variable y is obtained as
y = 2 + 6 * x1 - 0.5 * x2 + 0.8*randn(size(x1));
Write a WinBUGS program that takes the first 20 triples (x1, x2, y) to train the linear regression
model $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ and then uses the remaining 20 triples to evaluate the model
by comparing the original responses $y_i$, $i = 21, \ldots, 40$, with the regression-predicted values
$\hat{y}_i$, $i = 21, \ldots, 40$. The comparison would involve calculating the MSE, the mean of
$(y_i - \hat{y}_i)^2$ over $i = 21, \ldots, 40$:
$$\mathrm{MSE} = \frac{1}{20} \sum_{i=21}^{40} (y_i - \hat{y}_i)^2.$$
This is an example of how cross-validation is commonly used to assess
statistical models.
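The train/test split and the MSE can be cross-checked outside WinBUGS with a short Python sketch. It assumes the standard noninformative prior p(b, σ²) ∝ 1/σ², under which the posterior mean of the coefficients equals the least-squares estimate and the posterior predictive mean of each test response is x_i'b̂; a WinBUGS fit with vague priors should give broadly similar numbers. The seed is arbitrary and the simulated data will differ from any particular WinBUGS run.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Simulate the 40 triples as in the exercise
x1 = rng.uniform(0, 1, 40)
x2 = np.floor(10 * rng.uniform(0, 1, 40)) + 1
y = 2 + 6 * x1 - 0.5 * x2 + 0.8 * rng.standard_normal(40)

X = np.column_stack([np.ones(40), x1, x2])  # design matrix with intercept
X_train, y_train = X[:20], y[:20]            # triples 1..20 train the model
X_test, y_test = X[20:], y[20:]              # triples 21..40 evaluate it

# Assumed prior p(b, sigma^2) proportional to 1/sigma^2: the posterior mean
# of b is the least-squares estimate, so predictions are X_test @ b_hat.
b_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
y_pred = X_test @ b_hat

mse = np.mean((y_test - y_pred) ** 2)  # mean of (y_i - yhat_i)^2, i = 21..40
print("b_hat:", b_hat)
print("test MSE:", mse)
```

Since the noise standard deviation is 0.8, a test MSE somewhat above 0.8² = 0.64 is what one would expect, because the estimated coefficients add their own error to the predictions.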
How do the Bayesian estimators of β0, β1, β2, and σ compare to the “true” values 2, 6,
−0.5, and 0.8?
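To compare the posterior estimates with the true values 2, 6, −0.5 and 0.8, one possible sketch (not the WinBUGS solution itself) draws exactly from the joint posterior under the same assumed prior p(b, σ²) ∝ 1/σ², using the standard conjugate result: σ² | y is a scaled inverse chi-square and b | σ², y is normal around the least-squares fit. Only the first 20 triples are used; the seed and number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Simulate the 40 triples as in the exercise; only triples 1..20 are used to fit
x1 = rng.uniform(0, 1, 40)
x2 = np.floor(10 * rng.uniform(0, 1, 40)) + 1
y = 2 + 6 * x1 - 0.5 * x2 + 0.8 * rng.standard_normal(40)
X = np.column_stack([np.ones(40), x1, x2])[:20]
y_tr = y[:20]

n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y_tr
s2 = np.sum((y_tr - X @ b_hat) ** 2) / (n - k)  # residual variance

# Exact joint posterior draws under p(b, sigma^2) proportional to 1/sigma^2:
#   sigma^2 | y     ~ (n - k) * s2 / chi2(n - k)
#   b | sigma^2, y  ~ N(b_hat, sigma^2 * (X'X)^{-1})
n_draws = 10_000
sigma2 = (n - k) * s2 / rng.chisquare(n - k, n_draws)
L = np.linalg.cholesky(XtX_inv)
z = rng.standard_normal((n_draws, k))
b = b_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)

truth = {"b0": 2.0, "b1": 6.0, "b2": -0.5, "sigma": 0.8}
est = dict(zip(["b0", "b1", "b2"], b.mean(axis=0)))
est["sigma"] = np.sqrt(sigma2).mean()
for name, true_value in truth.items():
    print(f"{name}: posterior mean {est[name]:.3f}, true value {true_value}")
```

With only 20 training points the posterior means can sit noticeably away from the true values; the posterior spread of each column of `b` shows how far is plausible.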
¹ Or in Python or R:
# Python
import numpy as np
x1 = np.random.uniform(0, 1, 40)
x2 = np.floor(10 * np.random.uniform(0, 1, 40)) + 1
y = 2 + 6 * x1 - 0.5 * x2 + 0.8 * np.random.normal(0, 1, len(x1))
# R
x1 <- runif(40)
x2 <- floor(10 * runif(40)) + 1
y <- 2 + 6 * x1 - 0.5 * x2 + 0.8 * rnorm(length(x1))

2. Body Fat from Linear Regression.
Excess adiposity is a risk factor for a range
of diseases, leading to increased morbidity and mortality. Body fat (BF) can be measured
by several techniques such as skin-fold measurements, bioelectrical impedance analysis (BIA),
and dual-energy X-ray absorptiometry (DEXA). Most of these techniques are not used in
routine clinical practice or are not adequate for large populations.
Fuster-Parra et al. (2015)² compare several linear models for predicting body fat (BF)
from Age, Body Mass Index (BMI), Body Adiposity Index (BAI), and Gender.
The data set RegBF.csv|xlsx provides Age (in years), Body Adiposity Index (BAI),
Body Mass Index (BMI), Body Fat (BF), and Gender (0 for males, 1 for females) for 3,200
adults from Mallorca (Spain). To save you some time, a starter file BFReg.odc is provided.
Body fat percentage was measured with a tetrapolar bioelectrical impedance analysis
(BIA) system (BF-350, Tanita Corp., Tokyo, Japan). The BAI is defined as
$$\mathrm{BAI} = \frac{\text{hip circumference in cm}}{(\text{height in m})^{1.5}} - 18.$$
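The BAI definition translates directly into a one-line function. This is only an illustration of the formula; the function name and the example measurements (hip 100 cm, height 1.70 m) are hypothetical and are not taken from RegBF.

```python
def body_adiposity_index(hip_cm: float, height_m: float) -> float:
    """BAI = hip circumference (cm) / height (m) ** 1.5 - 18."""
    return hip_cm / height_m ** 1.5 - 18.0


# Hypothetical person: hip circumference 100 cm, height 1.70 m
print(round(body_adiposity_index(100.0, 1.70), 2))
```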
We are interested in predicting BF from Age, BAI, BMI, Gender, and BB. BB is a new
variable defined as BB = BAI * BMI, and as such, describes the interaction between BAI and
BMI.
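Building the full-model design matrix makes the role of BB explicit: it is just the elementwise product of the BAI and BMI columns. The sketch below assumes the covariates are already available as arrays (how RegBF.csv is read is not shown); the function name `design_matrix` is hypothetical. Applied to the new person in (b), it reproduces BB = 26 × 20 = 520.

```python
import numpy as np


def design_matrix(age, bai, bmi, gender):
    """Columns: intercept, Age, BAI, BMI, Gender, BB (= BAI * BMI)."""
    age, bai, bmi, gender = map(np.asarray, (age, bai, bmi, gender))
    bb = bai * bmi  # interaction between BAI and BMI
    return np.column_stack([np.ones(age.shape), age, bai, bmi, gender, bb])


# The new person from part (b): Age 35, BAI 26, BMI 20, Gender 0
X_new = design_matrix([35.0], [26.0], [20.0], [0.0])
print(X_new)
```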
(a) Suggest two models: the first with all predictors, and the second with the single best predictor.
Explain how you chose the best predictor.
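One reasonable criterion for the single best predictor (not necessarily the intended one; comparing DIC across single-predictor WinBUGS runs is another) is to fit each one-variable regression and keep the predictor with the largest R². Under flat priors the posterior mean fit coincides with least squares, so the sketch uses least squares. It is demonstrated on synthetic data, not on RegBF; the function name is hypothetical.

```python
import numpy as np


def best_single_predictor(X, y, names):
    """Fit y ~ 1 + X[:, j] for each column j and return the name with the
    largest R^2, together with all R^2 values."""
    tss = np.sum((y - y.mean()) ** 2)
    r2 = {}
    for j, name in enumerate(names):
        A = np.column_stack([np.ones(len(y)), X[:, j]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ coef) ** 2)
        r2[name] = 1.0 - rss / tss
    return max(r2, key=r2.get), r2


# Synthetic demo (NOT the RegBF data): the response depends only on column "B"
rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 3))
ys = 3.0 * Xs[:, 1] + rng.normal(size=200)
best, r2 = best_single_predictor(Xs, ys, ["A", "B", "C"])
print(best, r2)
```

For the body fat data one would pass the Age, BAI, BMI, Gender and BB columns with those names and BF as `y`.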
(b) A new person is to be evaluated using the two models from (a). The covariates are:
Age = 35, BAI = 26, BMI = 20, Gender = 0, BB = 520. What are the predicted BF values from the
two models?
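Given posterior draws of the regression coefficients (for instance exported from the WinBUGS run; the export step is not shown), the predicted BF for the new person is the posterior mean of the linear predictor at their covariates. The sketch below assumes the draws are an array with the intercept in the first column and the remaining columns ordered Age, BAI, BMI, Gender, BB; the function name is hypothetical.

```python
import numpy as np


def predict_bf(beta_draws, x_new):
    """Posterior mean of the expected BF at x_new and a 95% equal-tail
    credible interval for it.

    beta_draws: (m, p) array of posterior coefficient draws, intercept first.
    x_new: (p - 1,) covariates in the same order as beta_draws[:, 1:].
    """
    x = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    mu = beta_draws @ x  # linear predictor for each posterior draw
    return mu.mean(), np.percentile(mu, [2.5, 97.5])


# Covariates of the new person, ordered Age, BAI, BMI, Gender, BB
x_full = np.array([35.0, 26.0, 20.0, 0.0, 520.0])
```

For the single-predictor model, the same function applies with draws of (intercept, slope) and a one-element `x_new` holding the chosen covariate's value.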
3. Shocks.
An experiment was conducted (Peter Lee, 2009; Dalziel et al., 1941) to assess
the effect of small electrical currents on farm animals, with the eventual goal of understanding
the effects of high-voltage power lines on livestock.
The experiment was carried out with
seven cows using six shock intensities, 0, 1, 2, 3, 4, and 5 milliamps (shocks on the order
of 15 milliamps are painful for many humans). Each cow was given 30 shocks, 5 at each
intensity, in random order.
The entire experiment was then repeated, so each cow received
a total of 60 shocks. For each shock the response, mouth movement, was either present or
absent. The data as quoted give the total number of responses, out of 70 trials, at each
shock level. We ignore cow differences and differences between blocks (experiments).
² Fuster-Parra, P., Bennasar-Veny, M., Tauler, P., Yañez, A., López-González, A. A., and Aguiló,
A. (2015). A comparison between multiple regression models and CUN-BAE equation to predict body fat
in adults. PLOS One, DOI:10.1371/journal.pone.0122291.
Current x (milliamps)   Number of responses y   Number of trials n   Proportion of responses p
0                        0                      70                   0.000
1                        9                      70                   0.129
2                       21                      70                   0.300
3                       47                      70                   0.671
4                       60                      70                   0.857
5                       63                      70                   0.900
Using logistic regression with noninformative priors on its parameters, estimate the proportion of responses after a shock of 2.5 milliamps. Find a 95% credible set for the population
proportion at this current.
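A minimal Python sketch of the posterior computation can serve as a cross-check on a WinBUGS run. Its assumptions: a flat prior on the logistic coefficients as the noninformative choice; a random-walk Metropolis sampler swapped in for WinBUGS's own sampler; the current centered at 2.5 mA, so that logit p(x) = a + b(x − 2.5) and p(2.5) = expit(a) directly (the original intercept is α = a − 2.5b, and the flat prior is unchanged because this reparameterization has unit Jacobian). Step sizes, seed and chain length are arbitrary. Because the data at 1 to 5 mA contain both responses and non-responses, the flat-prior posterior should be proper.

```python
import numpy as np

# Shock data from the table: current x (mA), responses y, trials n
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 9, 21, 47, 60, 63], dtype=float)
n = np.full(6, 70.0)
X0 = 2.5  # current at which the response proportion is wanted


def log_post(a, b):
    """Log posterior under a flat prior: binomial log-likelihood with
    logit(p) = a + b * (x - X0)."""
    eta = a + b * (x - X0)
    # sum of y*eta - n*log(1 + exp(eta)), computed stably with logaddexp
    return np.sum(y * eta - n * np.logaddexp(0.0, eta))


def metropolis(n_iter=20000, burn=2000, step=(0.2, 0.15), seed=1):
    rng = np.random.default_rng(seed)
    a, b = 0.0, 1.0
    lp = log_post(a, b)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        a_new = a + step[0] * rng.standard_normal()
        b_new = b + step[1] * rng.standard_normal()
        lp_new = log_post(a_new, b_new)
        # accept with probability min(1, exp(lp_new - lp))
        if np.log(rng.uniform()) < lp_new - lp:
            a, b, lp = a_new, b_new, lp_new
        draws[t] = a, b
    return draws[burn:]  # discard burn-in


draws = metropolis()
p25 = 1.0 / (1.0 + np.exp(-draws[:, 0]))  # p(2.5) = expit(a) for each draw
alpha = draws[:, 0] - X0 * draws[:, 1]     # intercept on the original scale
print("posterior means of (alpha, beta):", alpha.mean(), draws[:, 1].mean())
print("posterior mean of p(2.5):", p25.mean())
print("95% equal-tail credible set:", np.percentile(p25, [2.5, 97.5]))
```

The equal-tail interval is one choice of 95% credible set; an HPD interval computed from the same draws would be a reasonable alternative.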