Homework 1 STA410H1F/2102H1F

1. Catastrophic cancellation in the subtraction $x - y$ can occur when $x$ and $y$ are subject to round-off error. Specifically, if $\mathrm{fl}(x) = x(1 + u)$ and $\mathrm{fl}(y) = y(1 + v)$, then
$$\mathrm{fl}(x) - \mathrm{fl}(y) = x - y + (xu - yv)$$
where the absolute error $|xu - yv|$ can be very large if both $x$ and $y$ are large; in some cases, this error may swamp the object we are trying to compute, namely $x - y$, particularly if $|x - y|$ is relatively small compared to $|x|$ and $|y|$. For example, if we compute the sample variance using the right-hand side of the identity
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2, \qquad (1)$$
a combination of round-off errors from the summations and catastrophic cancellation in the subtraction may result in the computation of a negative sample variance! (In older versions of Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.) In this problem, we will consider two algorithms for computing the sample variance that avoid this catastrophic cancellation. Both are “one pass” algorithms, in the sense that we only need to cycle once through the data (as is the case if we use the right-hand side of (1)); to use the left-hand side of (1), we need two passes, since we must first compute $\bar{x}$ before computing the sum on the left-hand side of (1). In parts (a) and (b) below, define $\bar{x}_k$ to be the sample mean of $x_1, \ldots, x_k$ and note that
$$\bar{x}_{k+1} = \frac{k}{k+1}\,\bar{x}_k + \frac{1}{k+1}\,x_{k+1}$$
with $\bar{x} = \bar{x}_n$.
(a) Show that $\sum_{i=1}^{n} (x_i - \bar{x})^2$ can be computed using the recursion
$$\sum_{i=1}^{k+1} (x_i - \bar{x}_{k+1})^2 = \sum_{i=1}^{k} (x_i - \bar{x}_k)^2 + \frac{k}{k+1}\,(x_{k+1} - \bar{x}_k)^2$$
for $k = 1, \ldots, n - 1$. (This is known as West's algorithm.)
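For concreteness, the recursion translates directly into a one-pass R function. The sketch below is purely illustrative (the name west_ss and its interface are ours, not part of the assignment):

    # One-pass sum of squared deviations via West's recursion (illustrative sketch).
    west_ss <- function(x) {
      n <- length(x)
      xbar <- x[1]   # running mean xbar_k, initialized at x_1
      ss <- 0        # running sum of squared deviations
      if (n > 1) {
        for (k in 1:(n - 1)) {
          d <- x[k + 1] - xbar
          ss <- ss + (k / (k + 1)) * d^2   # add the (k/(k+1))(x_{k+1} - xbar_k)^2 term
          xbar <- xbar + d / (k + 1)       # update xbar_k to xbar_{k+1}
        }
      }
      ss
    }

A quick sanity check is to compare against the two-pass formula, e.g. x <- rnorm(100); all.equal(west_ss(x), sum((x - mean(x))^2)).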
(b) A somewhat simpler one-pass method replaces $\bar{x}$ by some estimate $x_0$ and then corrects for the error in estimation. Specifically, if $x_0$ is an arbitrary number, show that
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - x_0)^2 - n(x_0 - \bar{x})^2.$$
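Again as an illustration only (shifted_ss is a name we introduce here), the identity yields a one-pass algorithm once $x_0$ is fixed:

    # One-pass sum of squared deviations about xbar, using a shift x0.
    shifted_ss <- function(x, x0) {
      n <- length(x)
      s1 <- 0   # running sum of (x_i - x0)
      s2 <- 0   # running sum of (x_i - x0)^2
      for (xi in x) {
        d <- xi - x0
        s1 <- s1 + d
        s2 <- s2 + d^2
      }
      # since x0 - xbar = -s1/n, the correction n*(x0 - xbar)^2 equals s1^2/n
      s2 - s1^2 / n
    }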
(c) The key in using the formula in part (b) is to choose $x_0$ to avoid catastrophic cancellation; that is, $x_0$ should be close to $\bar{x}$. How might you choose $x_0$ (without first computing $\bar{x}$) to minimize the possibility of catastrophic cancellation? Ideally, $x_0$ should be calculated using $o(n)$ operations.
(An interesting paper on computational algorithms for computing the variance is “Algorithms
for computing the sample variance: analysis and recommendations” by Chan, Golub, and
LeVeque; this paper is available on Blackboard.)
2. Suppose that $X_1, X_2, \ldots$ is an infinite sequence of independent, identically distributed random variables with some distribution function $F$, and $N$ is a Poisson random variable with mean $\lambda$ that is independent of the sequence $\{X_i\}$. Then we can define a compound Poisson random variable $Y$ by
$$Y = \sum_{i=1}^{N} X_i$$
where $Y = 0$ if $N = 0$. This distribution arises naturally in risk theory and insurance: for example, if $N$ represents the number of claims and $X_1, X_2, \ldots$ the amounts paid for each claim, then $Y$ is the total sum paid. For the purposes of risk management, it is useful to know the distribution of $Y$, particularly its tail.
(a) Suppose that $\{X_i\}$ are discrete integer-valued random variables with probability generating function $\phi(s) = E(s^{X_i})$. Show that the probability generating function of $Y$ is $g(\phi(s))$, where $g(s)$ is the probability generating function of $N$, which is given by
$$g(s) = \exp(-\lambda(1 - s)).$$
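This identity is easy to check by simulation; the sketch below uses an arbitrary three-point distribution for the $X_i$ and arbitrary values of $\lambda$ and $s$, chosen only for illustration:

    # Monte Carlo check that E(s^Y) = exp(-lambda*(1 - phi(s))).
    set.seed(1)
    lambda <- 2; s <- 0.8
    p <- c(0.2, 0.5, 0.3)   # P(X = 0), P(X = 1), P(X = 2): an arbitrary example
    N <- rpois(1e5, lambda)
    Y <- sapply(N, function(n) sum(sample(0:2, n, replace = TRUE, prob = p)))
    phi_s <- sum(p * s^(0:2))   # phi(s) = E(s^X)
    c(monte.carlo = mean(s^Y), exact = exp(-lambda * (1 - phi_s)))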
(b) The distribution of $Y$ can be approximated using the Fast Fourier Transform. Assume that the random variables $\{X_i\}$ have a distribution $p(x)$ on the integers $0, 1, \ldots, \ell$. The complication is that, unlike the distribution of $S = X_1 + \cdots + X_n$, the distribution of $Y$ is not concentrated on a finite set of integers. Therefore, we need to find an integer $M$ such that $P(Y \ge M)$ is smaller than some pre-determined threshold $\epsilon$; $M$ will depend on the Poisson mean $\lambda$ as well as the integer $\ell$ (or, more precisely, the distribution $p(x)$). (Also, to optimize the FFT algorithm, $M$ should be a power of 2, although this isn't absolutely necessary unless $M$ is very large.) Show that if $P(N \ge m) \le \epsilon$ then $P(Y \ge m\ell) \le \epsilon$, and so we can take $M = m\ell$.
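In R, such an $m$ can be read off from the Poisson quantile function; for instance, with values of $\lambda$, $\ell$ and $\epsilon$ matching the example later in this problem:

    lambda <- 7; ell <- 6; eps <- 1e-5
    m <- qpois(1 - eps, lambda) + 1   # smallest m with P(N >= m) <= eps
    M <- m * ell                      # conservative bound from part (b)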
(c) The bound $M$ determined in part (b) is typically very conservative and can be decreased substantially. One approach to determining a better bound is based on the probability generating function of $Y$ derived in part (a) and Markov's inequality. Specifically, if $s > 1$ we have
$$P(Y \ge M) = P(s^Y \ge s^M) \le \frac{E(s^Y)}{s^M} = \frac{\exp(-\lambda(1 - \phi(s)))}{s^M}.$$
Use this fact to show that, in order to have $P(Y \ge M) < \epsilon$, we can take
$$M = \inf_{s > 1} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}.$$
(d) Given $M$ (which depends on $\epsilon$), the algorithm for determining the distribution of $Y$ goes as follows:

1. Evaluate the DFT of $\{p(x) : x = 0, \ldots, M - 1\}$:
$$\hat{p}(j) = \sum_{x=0}^{M-1} \exp\left(-\frac{2\pi i j x}{M}\right) p(x)$$
where $p(\ell + 1) = \cdots = p(M - 1) = 0$.

2. Evaluate $g(\hat{p}(j)) = \exp(-\lambda(1 - \hat{p}(j)))$ for $j = 0, \ldots, M - 1$.

3. Evaluate $P(Y = y)$ by computing the inverse FFT:
$$P(Y = y) = \frac{1}{M} \sum_{j=0}^{M-1} \exp\left(\frac{2\pi i j y}{M}\right) g(\hat{p}(j)).$$
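As a rough illustration of how these three steps map onto R's built-in fft function (which computes the unnormalized DFT with the sign conventions used above), here is a minimal sketch; the function name is ours, and $M$ is assumed to have been chosen already, with M >= length(p):

    # Sketch of steps 1-3; p holds p(0), ..., p(ell), lambda is the Poisson mean.
    compound_pois_fft <- function(p, lambda, M) {
      p_pad <- c(p, rep(0, M - length(p)))   # step 1: pad p with zeros to length M
      p_hat <- fft(p_pad)                    # step 1: DFT of the padded vector
      g_hat <- exp(-lambda * (1 - p_hat))    # step 2: apply the PGF of N
      Re(fft(g_hat, inverse = TRUE)) / M     # step 3: inverse DFT, normalized by M
    }

The returned vector contains (approximations to) $P(Y = 0), \ldots, P(Y = M - 1)$.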
Write an R function to implement this algorithm where $M$ is determined using the method in part (c) with $\epsilon = 10^{-5}$. Use this function to evaluate the distribution of $Y$ in the case where $\lambda = 7$ and the distribution of $\{X_i\}$ is
$$p(x) = P(X_i = x) = \frac{4 - |3 - x|}{16} \quad \text{for } x = 0, \ldots, 6.$$
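In R this triangular distribution is one line, and it is easy to verify that it sums to one:

    p <- (4 - abs(3 - 0:6)) / 16   # p(0), ..., p(6)
    sum(p)                         # equals 1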
(You do not need to evaluate the bound $M$ from part (c) with great precision; for example, a simple approach is to take a discrete set of points $S = \{1 < s_1 < s_2 < \cdots < s_k\}$ and define
$$M = \min_{s \in S} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}$$
where the spacing $\delta = s_{i+1} - s_i$ and the endpoint $s_k$ are determined graphically (that is, by plotting the appropriate function) so that you are convinced that the value of $M$ is close to the actual infimum.)
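A rough grid search in the spirit of this hint might look as follows; the grid endpoints and spacing here are arbitrary starting values that should be adjusted after plotting:

    # Grid approximation to the bound from part (c); illustrative values only.
    lambda <- 7; eps <- 1e-5
    p <- (4 - abs(3 - 0:6)) / 16
    phi <- function(s) sapply(s, function(u) sum(p * u^(0:6)))   # phi(s) = E(s^X)
    s_grid <- seq(1.01, 3, by = 0.01)
    bound <- (-log(eps) - lambda * (1 - phi(s_grid))) / log(s_grid)
    plot(s_grid, bound, type = "l")    # inspect before trusting the minimum
    M <- ceiling(min(bound))
    M <- 2^ceiling(log2(M))            # round up to a power of 2 for the FFT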
On Blackboard, there is a paper by Embrechts and Frei, “Panjer recursion versus FFT for compound distributions”, which (not surprisingly) compares the FFT approach to an alternative approach (Panjer recursion). This paper includes some R code for the FFT algorithm described above and discusses some modifications to the basic FFT algorithm; feel free to use this R code as a template for your function. (Feel free to explore some of the other aspects of this paper for your own edification.)
Supplemental problems (not to hand in):
3. An alternative to the Quicksort algorithm is Mergesort (see p. 122 of the text). The Mergesort algorithm works by merging pairs of ordered lists. If the two lists contain $r$ and $s$ elements, respectively, then the number of comparisons needed to merge the two lists into a single list lies between $\min(r, s)$ and $r + s - 1$.
Define $C(n)$ to be the (random) number of comparisons in the Mergesort algorithm. It can be shown that the expected number of comparisons $E[C(n)]$ satisfies the recursive formula
$$E[C(n)] = E(Z_n) + E[C(\lfloor n/2 \rfloor)] + E[C(\lceil n/2 \rceil)]$$
where $\lfloor n/2 \rfloor = \lceil n/2 \rceil = n/2$ if $n$ is even, with $\lfloor n/2 \rfloor = (n - 1)/2$ and $\lceil n/2 \rceil = (n + 1)/2$ if $n$ is odd, and $Z_n$ is a random variable whose distribution is
$$P(Z_n = k) = \frac{\binom{k-1}{\lfloor n/2 \rfloor - 1} + \binom{k-1}{\lceil n/2 \rceil - 1}}{\binom{n}{\lfloor n/2 \rfloor}} \quad \text{for } \lfloor n/2 \rfloor \le k \le n - 1$$
and whose expected value is
$$E(Z_n) = \lfloor n/2 \rfloor \lceil n/2 \rceil \left( \frac{1}{\lfloor n/2 \rfloor + 1} + \frac{1}{\lceil n/2 \rceil + 1} \right).$$
($Z_n$ represents the number of comparisons needed to merge two ordered lists of lengths $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$, respectively.)
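Using the formula for $E(Z_n)$, the Mergesort recursion can be evaluated numerically in R; one possible sketch (merge_EC is a name we introduce here):

    # E[C(n)] for Mergesort via the recursion above.
    EZ <- function(n) {
      r <- floor(n / 2); s <- ceiling(n / 2)
      r * s * (1 / (r + 1) + 1 / (s + 1))
    }
    merge_EC <- numeric(1000)   # merge_EC[n] = E[C(n)]; E[C(1)] = 0
    for (n in 2:1000) {
      merge_EC[n] <- EZ(n) + merge_EC[floor(n / 2)] + merge_EC[ceiling(n / 2)]
    }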
(a) In class, we derived the following recursive formula for $E[C(n)]$ for the Quicksort algorithm:
$$E[C(n)] = n - 1 + \frac{2}{n} \sum_{k=1}^{n} E[C(k - 1)]$$
where $E[C(0)] = E[C(1)] = 0$. Using R, compute $E[C(n)]$ for $n = 1, \ldots, 1000$ and compare
the average performance of Quicksort to that of Mergesort.
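For the Quicksort side of the comparison, the recursion can be evaluated with a single loop by maintaining the running sum; a minimal sketch, which can then be plotted against the Mergesort values above:

    # E[C(n)] for Quicksort: E[C(n)] = n - 1 + (2/n) * sum_{k=1}^{n} E[C(k-1)].
    quick_EC <- numeric(1000)   # quick_EC[n] = E[C(n)]; E[C(0)] = E[C(1)] = 0
    cum <- 0                    # running sum E[C(0)] + ... + E[C(n-1)]
    for (n in 2:1000) {
      cum <- cum + quick_EC[n - 1]
      quick_EC[n] <- n - 1 + 2 * cum / n
    }
    # plot(1:1000, quick_EC, type = "l"); lines(1:1000, merge_EC, lty = 2)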
(b) Calculate the worst-case behaviour of $C(n)$ for Mergesort.
4. Problem 3.6 of the text. As the hint suggests, prove the result for $n = p^k$ and then show that it holds for $n$ between $p^k$ and $p^{k+1}$. Use the result to show that the worst-case behaviour of $C(n)$ for Mergesort is $O(n \ln(n))$.