Description
1. The table below reports results from a developmental toxicity study involving ordinal
categorical outcomes. This study administered diethylene glycol dimethyl ether (an industrial solvent used in the manufacture of protective coatings) to pregnant mice. Each
mouse was exposed to one of five concentration levels for ten days early in the pregnancy
(with concentration 0 corresponding to controls). Two days later, the uterine contents of
the pregnant mice were examined for defects. One of three (ordered) outcomes (“Dead”,
“Malformation”, “Normal”) was recorded for each fetus.
Concentration                Response                   Total number
(mg/kg per day)    Dead    Malformation    Normal       of subjects
(xi)               (yi1)   (yi2)           (yi3)        (mi)
0                   15       1             281          297
62.5                17       0             225          242
125                 22       7             283          312
250                 38      59             202          299
500                144     132               9          285
Build a multinomial regression model for these data using continuation-ratio logits for the
response probabilities πj (x), j = 1, 2, 3, as a function of concentration level, x. Specifically,
consider the following model
\[
L_1^{(cr)} = \log\frac{\pi_1}{\pi_2 + \pi_3} = \alpha_1 + \beta_1 x; \qquad
L_2^{(cr)} = \log\frac{\pi_2}{\pi_3} = \alpha_2 + \beta_2 x
\]
for the multinomial response probabilities πj ≡ πj (x), j = 1, 2, 3.
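
As an illustration of how the two continuation-ratio logits determine the three response probabilities, the sketch below inverts them: the first logit gives π1 directly, and the second gives the conditional probability π2/(π2 + π3). The function names and the parameter values are hypothetical and chosen only for illustration; they are not fitted estimates.

```python
import math

def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cr_probs(x, alpha1, beta1, alpha2, beta2):
    """Return (pi1, pi2, pi3) implied by the two continuation-ratio logits."""
    p_dead = expit(alpha1 + beta1 * x)               # pi1
    p_malf_given_alive = expit(alpha2 + beta2 * x)   # pi2 / (pi2 + pi3)
    pi1 = p_dead
    pi2 = (1.0 - p_dead) * p_malf_given_alive
    pi3 = (1.0 - p_dead) * (1.0 - p_malf_given_alive)
    return pi1, pi2, pi3

# Hypothetical parameter values, for illustration only (not fitted estimates).
probs = cr_probs(250.0, alpha1=-3.0, beta1=0.006, alpha2=-5.0, beta2=0.02)
print(probs, sum(probs))
```

By construction the three probabilities are positive and sum to one for any real parameter values, so no constraint on (α1, α2, β1, β2) is needed.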
(a) Show that the model, involving the multinomial likelihood for the data = {(yi1, yi2, yi3, xi) :
i = 1, …, 5}, can be fitted by fitting separately two Binomial GLMs. Provide details for
your argument, including the specific form of the Binomial GLMs.
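
One possible starting point for the argument, given here as a sketch rather than a complete solution, is the standard factorization of a trinomial probability into a marginal Binomial for the first category and a conditional Binomial for the second category given the remainder. The symbol ρ below is introduced only for this sketch and stands for π2/(π2 + π3):

\[
\frac{m!}{y_1!\, y_2!\, y_3!}\, \pi_1^{y_1} \pi_2^{y_2} \pi_3^{y_3}
= \binom{m}{y_1} \pi_1^{y_1} (1 - \pi_1)^{m - y_1}
\times \binom{m - y_1}{y_2} \rho^{y_2} (1 - \rho)^{m - y_1 - y_2},
\qquad \rho = \frac{\pi_2}{\pi_2 + \pi_3}.
\]

Under the model, logit(π1) = α1 + β1 x and logit(ρ) = α2 + β2 x, so the first factor involves only (α1, β1) and the second only (α2, β2).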
(b) Use the result from part (a) to obtain the maximum likelihood estimates (MLEs) and corresponding
standard errors for the parameters (α1, α2, β1, β2). Plot the estimated response curves π̂j(x), for
j = 1, 2, 3, and discuss the results.
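
A minimal sketch of one way to carry out the two separate fits, assuming the first Binomial GLM treats "Dead" out of all mi subjects and the second treats "Malformation" out of the mi − yi1 non-dead subjects, both with logit link. It uses Fisher scoring written in plain Python so that it has no dependencies; the function names are hypothetical, and a standard GLM routine could be used instead.

```python
import math

# Data from the table above.
x = [0.0, 62.5, 125.0, 250.0, 500.0]
dead = [15, 17, 22, 38, 144]
malf = [1, 0, 7, 59, 132]
normal = [281, 225, 283, 202, 9]

def score_and_info(a, b, x, y, m):
    """Score vector and Fisher information for logit(p_i) = a + b * x_i."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for xi, yi, mi in zip(x, y, m):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        r = yi - mi * p            # residual on the count scale
        w = mi * p * (1.0 - p)     # binomial variance (IRLS weight)
        s0 += r
        s1 += r * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (s0, s1), (i00, i01, i11)

def fit_binomial_logit(x, y, m, max_iter=100, tol=1e-10):
    """Fisher scoring for a grouped Binomial GLM with logit link.

    Returns ((a_hat, b_hat), (se_a, se_b)); the standard errors come from
    the inverse Fisher information at the final iterate.
    """
    a = b = 0.0
    for _ in range(max_iter):
        (s0, s1), (i00, i01, i11) = score_and_info(a, b, x, y, m)
        det = i00 * i11 - i01 * i01
        da = (i11 * s0 - i01 * s1) / det   # solve the 2x2 system I * delta = score
        db = (i00 * s1 - i01 * s0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    _, (i00, i01, i11) = score_and_info(a, b, x, y, m)
    det = i00 * i11 - i01 * i01
    return (a, b), (math.sqrt(i11 / det), math.sqrt(i00 / det))

total = [d + f + n for d, f, n in zip(dead, malf, normal)]
alive = [f + n for f, n in zip(malf, normal)]

# First logit: Dead vs. not dead. Second logit: Malformation among the non-dead.
(alpha1, beta1), (se_a1, se_b1) = fit_binomial_logit(x, dead, total)
(alpha2, beta2), (se_a2, se_b2) = fit_binomial_logit(x, malf, alive)
print(alpha1, beta1, se_a1, se_b1)
print(alpha2, beta2, se_a2, se_b2)
```

The estimated response curves then follow by plugging the four estimates into the expressions for πj(x) implied by the two logits.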
(c) Develop and implement a Bayesian version of the model above. Discuss your prior
choice, and provide details for the posterior simulation method. Provide point and interval
estimates for the response curves πj (x), for j = 1, 2, 3.
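
One possible posterior simulation scheme, sketched here for the first logit only, is a random-walk Metropolis sampler. Everything specific in the sketch is an assumption made for illustration: independent N(0, 10²) priors on the coefficients, concentration rescaled to units of 100 mg/kg, the proposal step sizes, the chain length, and the burn-in. The same code applies to the second logit after swapping in the malformation counts and the non-dead totals.

```python
import math
import random

# Dead counts and totals from the table; x rescaled to units of 100 mg/kg.
x = [0.0, 0.625, 1.25, 2.5, 5.0]
y = [15, 17, 22, 38, 144]
m = [297, 242, 312, 299, 285]

PRIOR_SD = 10.0  # assumed N(0, 10^2) prior on each coefficient (illustrative choice)

def log_post(a, b):
    """Log posterior (up to a constant) for logit P(Dead) = a + b * x."""
    lp = -(a * a + b * b) / (2.0 * PRIOR_SD ** 2)
    for xi, yi, mi in zip(x, y, m):
        eta = a + b * xi
        # Binomial log-likelihood y*eta - m*log(1 + exp(eta)), computed stably.
        lp += yi * eta - mi * (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp

def metropolis(n_iter=6000, step=(0.15, 0.05), seed=1):
    """Random-walk Metropolis with independent Gaussian proposals."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    cur = log_post(a, b)
    draws, accepted = [], 0
    for _ in range(n_iter):
        a_new = a + rng.gauss(0.0, step[0])
        b_new = b + rng.gauss(0.0, step[1])
        new = log_post(a_new, b_new)
        if rng.random() < math.exp(min(0.0, new - cur)):
            a, b, cur = a_new, b_new, new
            accepted += 1
        draws.append((a, b))
    return draws, accepted / n_iter

draws, acc_rate = metropolis()
kept = draws[1000:]  # discard an assumed burn-in of 1000 iterations

# Posterior draws of pi1 at 250 mg/kg (x = 2.5 on the rescaled axis).
pi1 = sorted(1.0 / (1.0 + math.exp(-(a + 2.5 * b))) for a, b in kept)
point = pi1[len(pi1) // 2]                                   # posterior median
interval = (pi1[int(0.025 * len(pi1))], pi1[int(0.975 * len(pi1))])
print(acc_rate, point, interval)
```

Repeating the last step over a grid of concentrations gives pointwise point and interval estimates for a response curve. In practice the chain length, step sizes, and convergence would need to be checked rather than taken from this sketch.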
2. Consider the “alligator food choice” data example, the full version of which is discussed
in Section 7.1 of Agresti (2002), Categorical Data Analysis, Second Edition. Here, consider the subset of the data reported in Table 7.16 (page 304) of the above book. This
data set involves observations on the primary food choice for n = 63 alligators caught in
Lake George, Florida. The nominal response variable is the primary food type (in volume) found in each alligator’s stomach, with three categories: “fish”, “invertebrate”, and
“other”. The invertebrates were mainly apple snails, aquatic insects, and crayfish. The
“other” category included amphibian, mammal, bird, reptile, and plant material. Also
available for each alligator is covariate information on its length (in meters) and gender.
(a) Focus first on length as the single covariate to explain the response probabilities
for the “fish”, “invertebrate” and “other” food choice categories. Develop a Bayesian
multinomial regression model, using the baseline-category logits formulation with “fish”
as the baseline category, to estimate (with point and interval estimates) the response
probabilities as a function of length. (Note that in this data example, we have mi = 1,
for i = 1, …, n.) Discuss your prior choice and approach to MCMC posterior simulation.
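
For reference, the baseline-category logits formulation maps the linear predictors to response probabilities as in the sketch below, with the "fish" linear predictor fixed at zero. The function name and the coefficient values are hypothetical and used only for illustration; they are not estimates from the alligator data.

```python
import math

def baseline_category_probs(length, params):
    """Response probabilities under baseline-category logits.

    params maps each non-baseline category to (alpha_j, beta_j), so that
    log(pi_j / pi_fish) = alpha_j + beta_j * length.
    """
    etas = {"fish": 0.0}  # baseline category: linear predictor fixed at 0
    for cat, (alpha, beta) in params.items():
        etas[cat] = alpha + beta * length
    top = max(etas.values())  # subtract the maximum for numerical stability
    expd = {c: math.exp(e - top) for c, e in etas.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}

# Hypothetical coefficients, for illustration only.
params = {"invertebrate": (4.0, -2.0), "other": (-1.0, 0.1)}
probs = baseline_category_probs(2.0, params)
print(probs)
```

Within an MCMC scheme, evaluating this function at each posterior draw of the coefficients over a grid of lengths gives posterior samples of the response curves, from which point and interval estimates can be read off.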
(b) Extend the model from part (a) to describe the effects of both length and gender
on food choice. Based on your proposed model, provide point and interval estimates for
the length-dependent response probabilities for male and female alligators.
3. Consider the inverse Gaussian distribution with density function
\[
f(y \mid \mu, \phi) = (2\pi\phi y^3)^{-1/2} \exp\left\{ -\frac{(y - \mu)^2}{2\phi\mu^2 y} \right\},
\qquad y > 0; \ \mu > 0, \ \phi > 0.
\]
Denote the inverse Gaussian distribution with parameters µ and φ by IG(µ, φ).
(a) Show that the inverse Gaussian distribution is a member of the exponential dispersion
family. Show that µ is the mean of the distribution and obtain the variance function.
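
As a sketch of one route through part (a), assume the exponential dispersion family is written as exp{[yθ − b(θ)]/φ + c(y, φ)}; this notation is an assumption, since the handout does not fix one. Expanding the square in the exponent of the density gives

\[
f(y \mid \mu, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right\},
\qquad \theta = -\frac{1}{2\mu^2}, \quad b(\theta) = -\frac{1}{\mu} = -(-2\theta)^{1/2},
\]
\[
c(y, \phi) = -\frac{1}{2}\log(2\pi\phi y^3) - \frac{1}{2\phi y},
\]
\[
E(Y) = b'(\theta) = (-2\theta)^{-1/2} = \mu, \qquad
V(\mu) = b''(\theta) = (-2\theta)^{-3/2} = \mu^3, \qquad
\mathrm{Var}(Y) = \phi\,\mu^3.
\]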
(b) Consider a GLM with random component defined by the inverse Gaussian distribution. That is,
assume that yi are realizations of independent random variables Yi with IG(µi, φ) distributions,
for i = 1, …, n. Here, g(µi) = xi^T β, where β = (β1, …, βp) (p < n) is the vector of regression
coefficients, and xi = (xi1, …, xip)^T is the covariate vector for the ith response, i = 1, …, n.
Define the full model so that the yi are realizations of independent IG(µi, φ) distributed random
variables Yi, with a distinct µi for each yi. Obtain the scaled deviance for the comparison of the
full model with the inverse Gaussian GLM.
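
A candidate expression for the scaled deviance can be checked numerically against its definition as twice the log-likelihood difference between the full model (µi = yi) and the fitted GLM. The sketch below does this for the candidate Σ (yi − µ̂i)² / (µ̂i² yi), divided by φ; the function names, responses, fitted means, and value of φ are all hypothetical and serve only as a check.

```python
import math

def ig_loglik(y, mu, phi):
    """Log-density of IG(mu, phi) at y, in the parameterization used above."""
    return (-0.5 * math.log(2.0 * math.pi * phi * y ** 3)
            - (y - mu) ** 2 / (2.0 * phi * mu ** 2 * y))

def ig_scaled_deviance(y, mu_hat, phi):
    """Candidate scaled deviance: sum of (y_i - mu_i)^2 / (mu_i^2 * y_i), over phi."""
    return sum((yi - mi) ** 2 / (mi ** 2 * yi) for yi, mi in zip(y, mu_hat)) / phi

# Hypothetical responses, fitted means, and dispersion, for illustration only.
y = [1.2, 0.7, 2.5, 3.1]
mu_hat = [1.0, 0.9, 2.0, 3.5]
phi = 0.5

full = sum(ig_loglik(yi, yi, phi) for yi in y)                    # full model: mu_i = y_i
fitted = sum(ig_loglik(yi, mi, phi) for yi, mi in zip(y, mu_hat))  # GLM fitted means
lr = 2.0 * (full - fitted)
print(lr, ig_scaled_deviance(y, mu_hat, phi))
```

The two printed values agree up to floating-point error, since the terms of the log-density that do not involve µ cancel in the difference.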