Description
Problem 1 (Confidence Interval). In a general statistical decision problem, the risk function of a statistical procedure often depends on the unknown parameter θ. In some special cases, however, the risk functions of certain families of procedures do not depend on θ, and it is then straightforward to derive the optimal procedure within such a family, as the following confidence interval estimation problem shows.
Assume that the random variables $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed (i.i.d.) $N(\theta, \sigma^2)$, with $\sigma$ known. In the interval estimation problem, the decision space for estimating $\theta$ consists of intervals of the form $d = [L, U]$ with $L = L(Y_1, \ldots, Y_n)$ and $U = U(Y_1, \ldots, Y_n)$, and a widely used loss function is
$$L(\theta, d) = r \cdot \mathrm{length}(d) - I(\theta \in d)$$
for some constant $r > 0$, where $I(\theta \in d) = 1$ if $\theta \in d = [L, U]$ and 0 otherwise.
That is, the loss function includes two quantities: one is the length of the interval, and the other
is whether the interval correctly includes the true $\theta$. Here, we focus on the specific family of interval estimators of the form
$$\delta_c(Y) = [\bar{Y}_n - c\sigma,\ \bar{Y}_n + c\sigma], \quad \text{where } \bar{Y}_n = (Y_1 + \cdots + Y_n)/n.$$
(a) For each $c \ge 0$, show that the risk function of $\delta_c(Y)$ is given by
$$R_{\delta_c}(\theta) = r(2c\sigma) - P(-c\sqrt{n} \le Z \le c\sqrt{n}) = 2cr\sigma - 2\Phi(c\sqrt{n}) + 1,$$
where $Z \sim N(0, 1)$ and $\Phi(z) = P(Z \le z)$.
(b) Show that the derivative of the risk function in (a) with respect to $c$ is
$$\frac{d}{dc} R_{\delta_c}(\theta) = 2r\sigma - \frac{2\sqrt{n}}{\sqrt{2\pi}}\, e^{-nc^2/2},$$
which is an increasing function of $c$ for $c \ge 0$.
(c) Show that if $r\sigma > \sqrt{n}/\sqrt{2\pi}$, the derivative is positive for all $c \ge 0$ and hence $R_{\delta_c}(\theta)$ is minimized at $c = 0$. That is, the best interval estimator is the point estimator $\delta_0(Y) = [\bar{Y}_n, \bar{Y}_n]$.
(d) When $r\sigma \le \sqrt{n}/\sqrt{2\pi}$, find the optimal $c_{\mathrm{opt}}$ that minimizes the risk function in (a).
(e) Find the specific value $r^*$ so that the usual $1 - \alpha$ confidence interval, $[\bar{Y}_n - z_{\alpha/2}\sigma/\sqrt{n},\ \bar{Y}_n + z_{\alpha/2}\sigma/\sqrt{n}]$, minimizes the risk function in (a) among all procedures of the form $\delta_c(Y)$.
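The formulas in parts (a) and (b) can be sanity-checked numerically before attempting the proofs. The sketch below is one possible check, not a solution: it codes the stated risk and derivative, estimates the risk by simulation, and searches a grid of c values for the minimizer. The function names, the grid search, and all numeric settings (r, sigma, n, theta, the grid range) are illustrative assumptions, not part of the problem.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi


def risk(c, r, sigma, n):
    """Risk stated in part (a): 2*c*r*sigma - 2*Phi(c*sqrt(n)) + 1."""
    return 2 * c * r * sigma - 2 * STD_NORMAL.cdf(c * math.sqrt(n)) + 1


def risk_derivative(c, r, sigma, n):
    """Derivative stated in part (b)."""
    scale = 2 * math.sqrt(n) / math.sqrt(2 * math.pi)
    return 2 * r * sigma - scale * math.exp(-n * c * c / 2)


def simulated_risk(c, r, sigma, n, theta, reps=20000, seed=0):
    """Monte Carlo estimate of E[r * length(d) - I(theta in d)] for delta_c."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        ybar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        covered = ybar - c * sigma <= theta <= ybar + c * sigma
        total += r * (2 * c * sigma) - (1.0 if covered else 0.0)
    return total / reps


def grid_minimizer(r, sigma, n, c_max=3.0, steps=3000):
    """Grid value of c in [0, c_max] with the smallest stated risk (hypothetical helper)."""
    grid = [c_max * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda c: risk(c, r, sigma, n))
```

Calling `simulated_risk` with several different `theta` values is a quick way to see that the risk of $\delta_c$ does not depend on $\theta$, and comparing `grid_minimizer` output for large and small $r\sigma$ illustrates the two regimes in parts (c) and (d).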
Problem 2 (Hypothesis Testing). A coin which has probability 1/3 or probability 1/2 of coming up
heads (no other values are possible) is flipped once. Y is the number of heads obtained on that flip, i.e.,
Y will take one of two possible values: 0 or 1.
The decision space is $D = \{d_0, d_1\}$, where $d_i$ is the decision “my guess is that the coin has probability $1/(2 + i)$ of coming up heads.” The loss is 1 for an incorrect decision and 0 for a correct decision. [This is a 2-decision, 2-state setting, often referred to as “hypothesis testing” in statistics.]
(a) Specify S, Ω, D, and L (i.e., the sample space, the set of all possible distribution functions, the decision
space, and the loss function).
(b) There are four possible (nonrandomized) procedures:
$\delta_1(0) = \delta_1(1) = d_0$; $\delta_2(0) = \delta_2(1) = d_1$;
$\delta_3(0) = d_1$, $\delta_3(1) = d_0$; $\delta_4(0) = d_0$, $\delta_4(1) = d_1$.
Show that for a given procedure $\delta$, the risk function is $R_\delta(\theta) = P_\theta(\delta \text{ reaches the wrong decision})$, and use this to find the risk function of each procedure. (There are only two possible values for $\theta$: 1/3 or 1/2, so $R_\delta$ can be thought of as a 2-vector for each $\delta$.)
(c) Use the results of (b) to determine which of the four procedures among the nonrandomized procedures
is (or are)
(i) admissible;
(ii) Bayes with respect to a prior distribution $P_\pi(1/3) = 0.10 = 1 - P_\pi(1/2)$;
(iii) Bayes with respect to a prior distribution $P_\pi(1/3) = 3/5 = 1 - P_\pi(1/2)$.
Note that there are only 4 procedures and only 2 values of θ when considering admissible or Bayes
procedures.
(d) Is the procedure which was Bayes in (c)(ii) also Bayes relative to any other prior distributions (or laws)?
If so, which?
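Because this problem has only two states and four procedures, every quantity in it can be enumerated exactly. The following is a minimal sketch, assuming the 0-1 loss above and using exact fractions; the dictionary encoding of the procedures and the names `risk_vector` and `bayes_risk` are illustrative choices, not notation from the problem.

```python
from fractions import Fraction

# The two possible values of theta = P(heads).
THETAS = (Fraction(1, 3), Fraction(1, 2))

# Decision d_i guesses P(heads) = 1/(2 + i): d0 -> 1/2, d1 -> 1/3.
GUESS = {"d0": Fraction(1, 2), "d1": Fraction(1, 3)}

# Each procedure maps the observed y (0 or 1) to a decision.
PROCEDURES = {
    "delta1": {0: "d0", 1: "d0"},
    "delta2": {0: "d1", 1: "d1"},
    "delta3": {0: "d1", 1: "d0"},
    "delta4": {0: "d0", 1: "d1"},
}


def pmf(y, theta):
    """P_theta(Y = y) for a single flip."""
    return theta if y == 1 else 1 - theta


def risk(delta, theta):
    """Under 0-1 loss the risk is P_theta(delta reaches the wrong decision)."""
    return sum(pmf(y, theta) for y in (0, 1) if GUESS[delta[y]] != theta)


def risk_vector(delta):
    """Risk at theta = 1/3 and at theta = 1/2, as a 2-vector."""
    return tuple(risk(delta, theta) for theta in THETAS)


def bayes_risk(delta, prior_third):
    """Average risk when the prior puts mass prior_third on theta = 1/3."""
    r_third, r_half = risk_vector(delta)
    return prior_third * r_third + (1 - prior_third) * r_half
```

Comparing the four risk vectors componentwise, and then comparing `bayes_risk` across procedures for a given prior, mirrors the reasoning asked for in parts (c) and (d).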
Problem 3 (Hypothesis Testing). Testing between simple hypotheses: Suppose the sample space S is discrete; Ω consists of two possible probability functions of Y, say $f_0(y)$ and $f_1(y)$; the decision space D consists of two elements, $d_0$ and $d_1$; and the loss function is
$$L(f_i, d_j) = \begin{cases} w_i, & \text{if } i \ne j, \\ 0, & \text{if } i = j, \end{cases}$$
where $w_0$ and $w_1$ are given positive numbers.
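Show that, for any prior distribution $\pi$ for which $0 \le \pi(f_i) \le 1$, all (nonrandomized) Bayes procedures are of the form
$$\delta(y) = \left\{\begin{array}{l} d_1, \\ d_1 \text{ or } d_0, \\ d_0, \end{array}\right. \quad \text{according as} \quad \frac{f_1(y)}{f_0(y)} \left\{\begin{array}{c} > \\ = \\ < \end{array}\right\} C,$$
where $C$ is a constant (perhaps 0 or $\infty$) depending on the $w_i$'s, $\pi(f_0)$, and $\pi(f_1)$, but not on $y$.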
[Remark: In the statistical literature, the corresponding Bayes procedure is often called the likelihood ratio test. By the Neyman-Pearson lemma, it is optimal in the frequentist setup of minimizing the Type II error probability subject to a constraint on the Type I error probability.]
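On a small discrete sample space, the claimed threshold structure can be checked by brute force: enumerate every nonrandomized procedure, pick one with the smallest Bayes risk, and confirm that it decides $d_1$ only at points whose likelihood ratio is at least as large as at every point where it decides $d_0$. The sketch below does this for a toy example. The probability functions, weights, and prior are hypothetical values chosen for illustration (with $f_0 > 0$ everywhere so the ratio is defined), and the check does not replace the proof.

```python
from fractions import Fraction
from itertools import product


def bayes_risk(delta, f0, f1, w0, w1, pi0):
    """Bayes risk of delta, which maps each y to 0 (decide d0) or 1 (decide d1)."""
    pi1 = 1 - pi0
    risk_under_f0 = w0 * sum(f0[y] for y in f0 if delta[y] == 1)  # wrongly chose d1
    risk_under_f1 = w1 * sum(f1[y] for y in f1 if delta[y] == 0)  # wrongly chose d0
    return pi0 * risk_under_f0 + pi1 * risk_under_f1


def brute_force_bayes(f0, f1, w0, w1, pi0):
    """Search all 2^|S| nonrandomized procedures for one with minimal Bayes risk."""
    ys = sorted(f0)
    best_value, best_delta = None, None
    for choice in product((0, 1), repeat=len(ys)):
        delta = dict(zip(ys, choice))
        value = bayes_risk(delta, f0, f1, w0, w1, pi0)
        if best_value is None or value < best_value:
            best_value, best_delta = value, delta
    return best_value, best_delta


def is_threshold_rule(delta, f0, f1):
    """True if every ratio f1/f0 where delta says d0 is <= every ratio where it says d1."""
    ratios_d0 = [f1[y] / f0[y] for y in delta if delta[y] == 0]
    ratios_d1 = [f1[y] / f0[y] for y in delta if delta[y] == 1]
    if not ratios_d0 or not ratios_d1:
        return True  # a constant decision is a threshold rule with C = 0 or infinity
    return max(ratios_d0) <= min(ratios_d1)


# Hypothetical toy example on S = {0, 1, 2}.
F0 = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
F1 = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(3, 5)}
VALUE, DELTA = brute_force_bayes(F0, F1, w0=1, w1=2, pi0=Fraction(1, 2))
```

Changing the weights or the prior and re-running the search shows how the cut point moves along the ordered likelihood ratios while the threshold form is preserved.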
Problem 4 (Point Estimation). Suppose Ω can be defined by the density functions $f_\theta(\cdot)$ according to the values of a real parameter $\theta$, where $a \le \theta \le b$. The decisions are $D = \{d : a \le d \le b\}$, representing guesses as to the true value of $\theta$. The loss function is $L(\theta, d) = |\theta - d|^r$, where $r$ is a given positive value. The prior density on Ω is $\pi(\theta)$. Assume all $\pi(\theta)$'s or $f_\theta$'s are positive throughout the sample space S.
(a) Show that if $\pi(\theta \mid y)$ is the posterior density function of $\theta$ given the observed data $Y = y \in S$, then a Bayes procedure is obtained by choosing $\delta(y) = d'$ to minimize
$$\int_a^b |\theta - d'|^r \, \pi(\theta \mid y)\, d\theta.$$
[It is OK to simply quote our in-class discussions of how to compute Bayes procedures. In any event, do not try to find a formula for the minimizing $d'$ in this part (a).]
(b) In particular, for “squared error loss” ($r = 2$), show from (a) that a Bayes procedure is $\delta(y) =$ mean of the posterior law $\pi(\theta \mid y)$ of $\theta$.
(c) For $r = 1$ (“absolute error loss”), show that a Bayes procedure is obtained as any median (not necessarily unique!) of the posterior law of $\theta$. Since the crucial result from probability theory used in demonstrating this may be unfamiliar, part of this problem is to prove it:
If $g$ is a univariate probability density function with finite first moment, $\int_{-\infty}^{\infty} |\theta - c|\, g(\theta)\, d\theta$ is minimized if and only if $c$ is a median of $g$.
[Remark: If you want, the hints for part (c) can be found on the last page of this homework.]
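Parts (b) and (c) can be illustrated numerically by replacing the posterior with weights on a fine grid and minimizing the expected loss over candidate decisions. The sketch below assumes a hypothetical posterior on $[a, b] = [0, 1]$ with density proportional to $\theta(1-\theta)^3$; that shape, the grid size, and the helper names are assumptions made only for this illustration.

```python
def posterior_grid(n_points=400):
    """Midpoint grid on [0, 1] with weights proportional to theta * (1 - theta)**3."""
    thetas = [(k + 0.5) / n_points for k in range(n_points)]
    raw = [t * (1 - t) ** 3 for t in thetas]
    total = sum(raw)
    return thetas, [v / total for v in raw]


def expected_loss(d, thetas, weights, r):
    """Grid approximation of the posterior expected loss of |theta - d|**r."""
    return sum(w * abs(t - d) ** r for t, w in zip(thetas, weights))


def best_decision(thetas, weights, r):
    """Grid point d that minimizes the approximate posterior expected loss."""
    return min(thetas, key=lambda d: expected_loss(d, thetas, weights, r))


def posterior_mean(thetas, weights):
    return sum(w * t for t, w in zip(thetas, weights))


def posterior_cdf(d, thetas, weights):
    """Posterior mass at or below d on the grid."""
    return sum(w for t, w in zip(thetas, weights) if t <= d)


THETAS, WEIGHTS = posterior_grid()
D_SQUARED = best_decision(THETAS, WEIGHTS, r=2)   # compare with posterior_mean
D_ABSOLUTE = best_decision(THETAS, WEIGHTS, r=1)  # compare with the point where the CDF is 1/2
```

For this skewed posterior the two minimizers differ, which makes the distinction between the mean and the median visible. The agreement is only up to the grid resolution.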
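[Hints for Problem 4 (c): It suffices to show that if $m$ is a median and $c$ is not a median, then
$$\int_{-\infty}^{\infty} |\theta - c|\, g(\theta)\, d\theta - \int_{-\infty}^{\infty} |\theta - m|\, g(\theta)\, d\theta = \int_{-\infty}^{\infty} \big( |\theta - c| - |\theta - m| \big)\, g(\theta)\, d\theta \ge 0.$$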
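To prove this, assume for a moment that $c > m$, and show (draw it!) that
$$|\theta - c| - |\theta - m| - (c - m)\,\mathrm{sign}(m - \theta) \ge 0,$$
where $\mathrm{sign}(u) = 1$ if $u > 0$; $= 0$ if $u = 0$; and $= -1$ if $u < 0$. Moreover, the “$\ge 0$” is “$> 0$” if $m < \theta < c$. The result can be proved by combining this and the fact that
$$\int_{-\infty}^{\infty} \mathrm{sign}(m - \theta)\, g(\theta)\, d\theta = \int_{-\infty}^{m} g(\theta)\, d\theta - \int_{m}^{\infty} g(\theta)\, d\theta = \frac{1}{2} - \frac{1}{2} = 0,$$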
since $m$ is a median for (an absolutely continuous probability distribution with) probability density function $g$ and thus $\int_{-\infty}^{m} g(\theta)\, d\theta = 1/2$.
Can you use similar ideas to prove the case of $c < m$?]
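As an alternative to drawing the picture suggested in the hint, the pointwise inequality can be evaluated on a grid. This is a small sketch with hypothetical values $m = 0$ and $c = 1$; it only illustrates the inequality for the case $c > m$ and is not a proof.

```python
def sign(u):
    """sign(u) = 1 if u > 0, 0 if u == 0, -1 if u < 0."""
    return (u > 0) - (u < 0)


def gap(theta, m, c):
    """|theta - c| - |theta - m| - (c - m) * sign(m - theta)."""
    return abs(theta - c) - abs(theta - m) - (c - m) * sign(m - theta)


# Hypothetical values with c > m, evaluated for theta from -3 to 3.
M, C = 0.0, 1.0
GAPS = [(t / 100, gap(t / 100, M, C)) for t in range(-300, 301)]
```

Inspecting `GAPS` shows where the expression vanishes and where it is strictly positive, which is the picture the hint asks for.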