# ISyE 6412A Theoretical Statistics HW 3


Problem 1. In Problem 1 of HW#1, we assume that Y1, . . . , Yn are iid normal N(θ, 1), and we want to guess θ when the loss function is given by L(θ, d) = (θ − d)²/(1 + θ²).

One of the proposed procedures is:

δ2,n(Y1, . . . , Yn) = (Ȳn + 1/n)/(1 + 1/n).

The purpose of this exercise is to show that this procedure is actually Bayes. Show that the procedure δ2,n is Bayes relative to the prior density πa(θ) = C1(1 + θ²)ϕ(θ − 1), where ϕ(x) = (1/√(2π)) exp(−x²/2) is the standard normal N(0, 1) density and C1 is a suitable constant.

You need not determine C1; it suffices to verify that the given function of θ has a finite integral, so that one knows such a C1 exists.

Problem 2. Suppose Y1, Y2, . . . , Yn (n ≥ 2) are independent and identically distributed (iid) Uniform[0, θ], and consider a Pareto prior distribution, θ ∼ PA(α, β), i.e., θ has a prior density

π(θ) = αβ^α/θ^(α+1), for θ ≥ β,

and mean E(θ) = αβ/(α − 1), for some known α > 1 and β > 0.

(a) Show that the posterior distribution of θ has a Pareto PA(α∗, β∗) distribution with α∗ = α + n and β∗ = max{β, Y(n)}, where Y(n) = max(Y1, . . . , Yn).
(b) Find the Bayes procedure δB(Y) under the “squared error loss” function L(θ, d) = (θ − d)².
(c) Find the Bayes procedure δC(Y) under the “absolute error loss” function L(θ, d) = |θ − d|.

[Hint for (a): the domain of θ also includes Y1 ≤ θ, . . . , Yn ≤ θ, which is equivalent to θ ≥ Y(n).]
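The conjugacy claim in (a) can be sanity-checked numerically by normalizing prior × likelihood on a grid and comparing it with the claimed Pareto density. This is only an illustrative sketch: the seed, sample size, and prior parameters below are arbitrary choices, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 3.0, 1.0             # hypothetical prior parameters (α > 1, β > 0)
n = 8
y = rng.uniform(0.0, 2.0, size=n)  # hypothetical Uniform[0, θ] data with θ = 2

a_star = alpha + n                 # claimed posterior shape α* = α + n
b_star = max(beta, y.max())        # claimed posterior scale β* = max{β, Y(n)}

# unnormalized posterior on a grid: prior ∝ θ^{-(α+1)} times likelihood θ^{-n},
# supported on θ ≥ max{β, Y(n)}
theta = np.linspace(b_star, b_star + 20.0, 20001)
d = theta[1] - theta[0]
unnorm = theta ** (-(alpha + 1.0)) * theta ** (-n)
post = unnorm / (unnorm.sum() * d)           # grid-normalized posterior density

pareto_pdf = a_star * b_star**a_star / theta ** (a_star + 1.0)
print(np.max(np.abs(post / pareto_pdf - 1.0)))  # small: the two densities agree
```

Once (a) is established, parts (b) and (c) amount to the posterior mean and the posterior median of this Pareto posterior, respectively.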

Problem 3. (LINEX Loss. Problem 7.65 on page 367 of our text). Suppose Ω can be labeled according to the values of a real parameter θ, and the decisions are D = {d : −∞ < d < ∞}, representing guesses as to the true value of θ. The loss function is the LINEX (LINear-EXponential) loss given by

L(θ, d) = e^(c(d−θ)) − c(d − θ) − 1,

where c is a positive constant.

The LINEX loss was investigated by Zellner (1986), and can handle asymmetries in a smooth way: as the constant c varies, the loss function varies from very asymmetric to almost symmetric. We also assume that the prior density on Ω = {θ : −∞ < θ < ∞} is π(θ).

(a) For c = 0.2, 0.5, 1, plot L(θ, d) as a function of d − θ. Feel free to use any computer software.
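A minimal sketch for (a): tabulate the loss as a function of t = d − θ for each c, then pass the arrays to any plotting tool (e.g., matplotlib's plt.plot(t, curves[c])). The grid range below is an arbitrary choice.

```python
import numpy as np

def linex(c, t):
    """LINEX loss e^(ct) - ct - 1, evaluated at t = d - theta."""
    return np.exp(c * t) - c * t - 1.0

t = np.linspace(-3.0, 3.0, 601)
curves = {c: linex(c, t) for c in (0.2, 0.5, 1.0)}

# the loss vanishes only at t = 0, and overestimation (t > 0) is penalized
# more heavily than underestimation (t < 0) of the same magnitude
for c in (0.2, 0.5, 1.0):
    print(c, linex(c, 2.0), linex(c, -2.0))
```

For small c, a Taylor expansion gives L ≈ c²t²/2, which is symmetric in t; this is the sense in which the loss becomes "almost symmetric" as c shrinks.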

(b) “No data problem.” Suppose we had to make a guess of θ based on no observations, but merely on the given prior law π(θ). Show that you minimize your expected loss E(L(θ, d)) = ∫_{−∞}^∞ L(θ, d)π(θ)dθ by guessing θ as

δb = −(1/c) log ∫_{−∞}^∞ e^(−cθ)π(θ)dθ = −(1/c) log E(e^(−cθ)).

(c) (2 pts). “Problem with data.” Now suppose that we also observe data Y which has distribution function F(y|θ). Show that the Bayes procedure is given by

δc(Y) = −(1/c) log E(e^(−cθ)|Y).
(d) Let Y1, . . . , Yn be iid N(θ, σ²), where σ² is known, and suppose that θ has the so-called “noninformative” prior π(θ) ≡ 1. Show that the Bayes procedure with respect to LINEX loss is given by δd(Ȳ) = Ȳ − cσ²/(2n). See the remarks/hints at the end of this question.

(e) Under the assumption of part (d), calculate the posterior expected loss for both δd(Ȳ) and δe(Ȳ) = Ȳ using LINEX loss. Recall that for a given procedure δ(Y), the posterior expected loss is defined as hπ(y, δ(y)), where hπ(y, d) = ∫_{−∞}^∞ L(θ, d)π(θ|y)dθ.

(f) Repeat part (e) using squared error loss, i.e., calculate the posterior expected loss for both δd(Ȳ) and Ȳ when the loss function is L(θ, d) = (θ − d)².

[Remarks/Hints for (d): Rigorously speaking, π(θ) ≡ 1 is not a probability density on Ω = (−∞, ∞), but for our purpose here we pretend that it can be thought of as a valid prior distribution. To simplify your computation, you can assume that you just observe “one new observation” Ȳn ∼ N(θ, σ²/n) (this simplification is reasonable because Ȳn is a sufficient statistic, which will be discussed later). For a given observation Ȳ = ȳ, what is the marginal distribution m(ȳ) = ∫_{−∞}^∞ fθ(ȳ)π(θ)dθ when π(θ) ≡ 1? What is the posterior distribution π(θ|ȳ) = fθ(ȳ)π(θ)/m(ȳ)? Is it well-defined?]
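Under the simplification in these hints, the posterior given Ȳ = ȳ is N(ȳ, σ²/n), so part (d) can be spot-checked by Monte Carlo against the formula from part (c). The observed mean and constants below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2)
ybar, sigma2, n, c = 0.3, 2.0, 10, 0.5  # hypothetical observed mean and constants

# under the flat prior, the posterior given Ybar = ybar is N(ybar, sigma^2/n)
theta = rng.normal(ybar, np.sqrt(sigma2 / n), size=1_000_000)
delta_mc = -np.log(np.mean(np.exp(-c * theta))) / c  # Monte Carlo version of part (c)'s rule

delta_closed = ybar - c * sigma2 / (2 * n)           # claimed Bayes rule in (d)
print(delta_mc, delta_closed)                        # the two should agree closely
```

The agreement is no accident: by the normal moment generating function, E(e^(−cθ)|Ȳ) = exp(−cȳ + c²σ²/(2n)), and taking −(1/c) log of this gives exactly ȳ − cσ²/(2n).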

Problem 4. Suppose that a random variable Y has a Gamma(q, 1/θ) distribution with density fθ(y) = θ^q y^(q−1) e^(−θy)/Γ(q), for y ≥ 0, where Γ(q) = ∫_0^∞ y^(q−1) e^(−y)dy = (q − 1)Γ(q − 1) and q is positive and known.

Here S = {y : y ≥ 0}, Ω = {θ : θ ≥ 0}, D = {d : d ≥ 0}. Under a Bayesian setting, suppose that the prior distribution of θ is Gamma(α, β), i.e., θ has a prior density π(θ) = θ^(α−1) e^(−θ/β)/(Γ(α)β^α) for θ ≥ 0.

(a) Show that the posterior distribution of θ has a Gamma(α′, β′) distribution with α′ = q + α and β′ = β/(1 + βy).

This implies that π(θ|y) = π(θ)fθ(y)/∫_0^∞ π(θ)fθ(y)dθ can be written as π(θ|y) = θ^(α′−1) e^(−θ/β′)/(Γ(α′)(β′)^α′).

(b) (Point Estimation) Find the Bayes procedure δbayes(Y) under the loss function L(θ, d) = (θ − d)².
(c) (Confidence Interval) In the fixed-width confidence interval estimation setting, suppose one wants to estimate θ by the interval [d − 2, d + 2] for some decision d ≥ 2. This corresponds to estimating θ under a new binary loss function

L∗(θ, d) = 1 if |θ − d| > 2; 0 if |θ − d| ≤ 2.

That is, the loss is the same for all “bad” decisions (those which mis-estimate θ by more than 2 units) and is 0 for all “correct” decisions. Show that the Bayes procedure is given by

δ∗(Y) = 2 if q + α ≤ 1, and the solution d of π(d + 2|Y) = π(d − 2|Y) if q + α > 1;

equivalently,

δ∗(Y) = 2 if q + α ≤ 1, and δ∗(Y) = −2 + 4/[1 − exp(−4(Y + 1/β)/(α + q − 1))] if q + α > 1.

In other words, the Bayes confidence interval with fixed width 4 is [δ∗(Y) − 2, δ∗(Y) + 2].

Problem 5 (Bayes with constant risk). Assume that Y1, . . . , Yn are iid Bernoulli(θ) with 0 ≤ θ ≤ 1, i.e., Pθ(Yi = y) = θ^y (1 − θ)^(1−y) for y = 0 or 1, and suppose that we want to estimate θ under the squared error loss function L(θ, d) = (θ − d)². We have shown in class that when θ has a prior Beta(α, β) distribution, i.e.,

π(θ) = [Γ(α + β)/(Γ(α)Γ(β))] θ^(α−1) (1 − θ)^(β−1), 0 ≤ θ ≤ 1,

for two pre-specified constants α > 0, β > 0, the Bayes procedure is given by

δα,β,n(Y1, . . . , Yn) = (α + Σ_{i=1}^n Yi)/(α + β + n) = [(α + β)/(α + β + n)] × [α/(α + β)] + [n/(α + β + n)] × Ȳn,

which is the weighted average of the prior mean α/(α + β) and the sample mean Ȳn = (Y1 + . . . + Yn)/n.

(a) Compute the risk function of the Bayes procedure δα,β,n(Y1, . . . , Yn).

(b) An interesting scenario is to choose a specific pair α∗ > 0 and β∗ > 0 in such a way that the corresponding Bayes procedure has constant risk when α = α∗ and β = β∗, i.e., the risk function in part (a) does not depend on θ. Find such a pair (α∗, β∗). [A check: α∗ = √n/2.]
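Any candidate pair can be checked by evaluating the risk exactly, summing over the binomial distribution of S = ΣYi. The symmetric guess α∗ = β∗ = √n/2 below matches the check above for α∗; the β∗ value is an assumption to be confirmed by your own algebra in part (b).

```python
import math

n = 16
a_star = b_star = math.sqrt(n) / 2.0  # candidate pair; β* = α* is an assumed symmetry

def risk(theta):
    """Exact risk E_theta[(delta - theta)^2], summing over S ~ Binomial(n, theta)."""
    total = 0.0
    for s in range(n + 1):
        pmf = math.comb(n, s) * theta**s * (1.0 - theta) ** (n - s)
        d = (a_star + s) / (a_star + b_star + n)
        total += pmf * (d - theta) ** 2
    return total

risks = [risk(th) for th in (0.1, 0.25, 0.5, 0.75, 0.9)]
print(risks)  # equal up to rounding: the risk is constant in θ
```

Constant risk is exactly what part (b) asks for: the bracketed values do not move as θ sweeps across (0, 1).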

(c) Calculate the Bayes risk of the Bayes procedure δα∗,β∗,n in part (b) when θ has a prior distribution π(θ) = Beta(α∗, β∗).

(d) Compute the risk function of the sample mean δ0 = δ0(Y1, . . . , Yn) = Ȳn, and calculate the Bayes risk of δ0 when θ has a prior distribution π(θ) = Beta(α∗, β∗).

(e) Show that the Bayes procedure δα∗,β∗,n indeed has a smaller Bayes risk than the sample mean δ0 = Ȳn when θ has a prior distribution π(θ) = Beta(α∗, β∗), where α∗ and β∗ are determined in part (b).

Hint to Problem 1: You can show that a Bayes procedure must minimize

hπ(y, d) = ∫_{−∞}^∞ L(θ, d)fθ(y1, · · · , yn)πa(θ)dθ = C1 ∫_{−∞}^∞ (θ − d)² fθ(y1, · · · , yn)ϕ(θ − 1)dθ,

where the integral is just the Bayes risk when the loss function is squared error loss and θ has a prior distribution N(µ = 1, τ² = 1). Which decision d minimizes this integral? Did we find such a minimizer d in class?
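Following this hint, the minimizer should be the mean of the weight w(θ) ∝ fθ(y1, . . . , yn)ϕ(θ − 1), since minimizing the quadratic ∫(θ − d)²w(θ)dθ over d always yields the weighted mean. A grid check with arbitrary simulated data that this mean coincides with δ2,n:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.normal(0.7, 1.0, size=n)  # hypothetical N(θ, 1) sample

ybar = y.mean()
delta2 = (ybar + 1.0 / n) / (1.0 + 1.0 / n)  # the procedure from Problem 1

# posterior mean under the N(1, 1) prior, computed on a grid:
# weight w(θ) ∝ f_θ(y1,...,yn) φ(θ - 1), constants cancel in the ratio below
theta = np.linspace(-10.0, 10.0, 20001)
logw = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0) - 0.5 * (theta - 1.0) ** 2
w = np.exp(logw - logw.max())
post_mean = (theta * w).sum() / w.sum()
print(delta2, post_mean)  # the two agree
```

Note that δ2,n = (Ȳn + 1/n)/(1 + 1/n) = (nȲn + 1)/(n + 1), which is exactly the N(1, 1)-prior posterior mean of a N(θ, 1) sample.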

Hints to Problem 4(c): Note that a Gamma(α, β) random variable has mean αβ; see page 624 of our textbook for more properties of the Gamma distribution. In part (c), we want to choose d to minimize

∫_0^∞ L∗(θ, d)π(θ|y)dθ = 1 − ∫_{d−2}^{d+2} π(θ|y)dθ. (1)

Note that in this problem, π(θ|y) is continuous and unimodal; that is, there is a mode (possibly 0) such that π(θ|y) increases for θ < mode and decreases for θ > mode. Then it is an easy calculus exercise to show that relation (1) is minimized by the value d∗ for which π(d∗ − 2|y) = π(d∗ + 2|y) if π(0|y) < π(4|y), and by d∗ = 2 otherwise (please feel free to use this fact if you cannot prove it or if you simply do not want to spend time proving it! No penalty here!)

In addition, note that π(0|y) = ∞ if α′ = q + α < 1 and π(0|y) = 0 if α′ > 1. Hence, π(0|y) ≥ π(4|y) if and only if q + α ≤ 1. What happens if q + α = 1?
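For the case q + α > 1, the closed form in part (c) can be checked numerically against the balancing condition π(d∗ − 2|y) = π(d∗ + 2|y). The parameter values below are arbitrary, chosen only so that q + α > 1.

```python
import math

q, alpha, beta, y = 2.0, 1.5, 1.0, 0.8  # hypothetical values with q + α > 1
a_prime = q + alpha                     # posterior shape α' = q + α
inv_b_prime = y + 1.0 / beta            # 1/β' = (1 + βy)/β = y + 1/β

# closed form from part (c)
d_star = -2.0 + 4.0 / (1.0 - math.exp(-4.0 * inv_b_prime / (alpha + q - 1.0)))

def log_post(theta):
    """Unnormalized Gamma(α', β') posterior log-density."""
    return (a_prime - 1.0) * math.log(theta) - inv_b_prime * theta

print(d_star)                                           # exceeds 2, so d* - 2 > 0
print(log_post(d_star + 2.0) - log_post(d_star - 2.0))  # ~0: the densities balance
```

Since the normalizing constant Γ(α′)(β′)^α′ is the same on both sides, comparing unnormalized log-densities suffices.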