Description
1. In this problem, you will use the discrete cosine transform (DCT) to denoise an image.
An R function to compute a two-dimensional DCT is available in the R package dtt – this
package contains functions to compute a number of “trigonometric” transforms. The R
function that will be used in this package is mvdct.
(a) In lecture, we defined matrix transforms for vectors of length n, focusing on families of
matrices {An} satisfying AT
nAn = Dn where Dn is a diagonal matrix.
Suppose that Z is an m × n pixel image, which we can represent as an m × n matrix. Then
using {An}, we can define a transform of Z as follows:
Zb = AmZAT
n
Given the m × n matrix Zb, show that we can reconstruct the original image by Z =
D−1
m AT
mZAb
nD−1
n
.
(b) To denoise an image using the DCT, we first compute Zb and then perform hard- or
soft-thresholding to obtain a thresholded (or shrunken) transform Zb∗
(which will typically
be “sparser” than Zb). Our hope is that the components of Zb corresponding to noise in the
image are eliminated and so the denoised image can be obtained by applying the inverse
transform to Zb∗
.
Using mvdct, write R functions to perform both hard- and soft-thresholding (dependent
on a parameter λ). Your functions should not threshold the (1, 1) component of the DCT
matrix. (A simple R function to do hard thresholding will be provided on Quercus; this can
be modified to do soft thresholding.)
(c) The file boats.txt contains a noisy 256 × 256 pixel grayscale image of sailboats. Its
entries are numbers between 0 and 1 where 0 represents black and 1 represents white. The
data can be read into R and displayed as an image as follows:
> boats <- matrix(scan(“boats.txt”),ncol=256,byrow=T) > image(boats, axes=F, col=grey(seq(0,1,length=256)))
Using your functions from part (b), try to denoise the image as best as possible. (This is
quite subjective but try different methods and parameter values.)
Note: You should not expect your noise-reduced image to be a drastic improvement over
noisy image; in fact, connaisseurs of pointillism may find prefer the noisy image. There
are two issues here: First, we are applying noise reduction to the whole image rather than
dividing the image into smaller sub-images and applying noise reduction to each sub-image.
Second, even very good noise reduction tends to wash out some details, thereby rendering
the noise-reduced image less visually appealing.
2. Suppose that U and V are independent Poisson random variables with means λu and λv.
We then define X = U + 2V , which is said to have a Hermite distribution with parameters
λu and λv. (The Hermite distribution is the distribution of a sum of two correlated Poisson
random variables.)
(a) Show that the probability generating function of X is
g(s) = E(s
X) = exp h
λu(s − 1) + λv(s
2 − 1)i
.
(b) The distribution of X can, in theory, be obtained exactly in closed-form with exact
computation for given λu and λv somewhat more difficult. However, we can approximate
the distribution very well by combining the exact probability generating with the discrete
Fourier transform. The key to doing this is to find M such that P(X ≥ M) is very small so
that we can use the discrete Fourier transform to compute (to a very good approximation)
P(X = x) for x = 0, 1, · · · , M − 1.
One approach to determining M is to use the probability generating function of X and
Markov’s inequality. Specifically, if s > 1 we have
P(X ≥ M) = P(s
X ≥ s
M) ≤
E(s
X)
sM
=
exp [λu(s − 1) + λv(s
2 − 1)]
sm
Use this fact to show that for P(X ≥ M) ≤ , we can take
M = inf
s>1
λu(s − 1) + λv(s
2 − 1) − ln()
ln(s)
(c) Given M (which depends on ), the algorithm for determining the distribution of X goes
as follows:
1. Evaluate the probability generating function g(s) at s = sk = exp(−2πιk/M) for
k = 0, · · · , M − 1; the values of s can be created in R as follows:
> s <- exp(-2*pi*1i*c(0:(M-1))/M)
2. Evaluate P(X = x) by computing the inverse FFT of the sequence {g(sk) : k =
0, · · · , M − 1}:
P(X = x) = 1
M
M
X−1
k=0
exp
2πι
x
M
k
g(sk)
Write an R function to implement this algorithm where M is determined using the method in
part (b) with = 10−5
. Use this function to evaluate the distribution of X for the following
two cases:
(i) λu = 1 and λv = 5;
(ii) λu = 0.1 and λv = 2.
Note that you do not need to evaluate the bound M with great precision; for example, a
simple approach is to take a discrete set of points S = {1 < s1 < s2 < · · · < sk} and define
M = min
s∈S
λu(s − 1) + λv(s
2 − 1) − ln()
ln(s)
where δ = si+1 − si and sk are determined graphically (that is, by plotting the appropriate
function) so that you are convinced that the value of M is close to the actual infimum.
Supplemental problems (not to hand in):
3. As noted in lecture, catastrophic cancellation in the subtraction x − y can occur when x
and y are subject to round-off error. Specifically, if fl(x) = x(1 +u) and fl(y) = y(1 +v) then
fl(x) − fl(y) = x − y + (xu − yv)
where the absolute error |xu − yv| can be very large if both x and y are large; in some cases,
this error may swamp the object we are trying to compute, namely x − y, particularly if
|x − y| is relatively small compared to |x| and |y|. For example, if we compute the sample
variance using the right hand side of the identity
Xn
i=1
(xi − x¯)
2 =
Xn
i=1
x
2
i −
1
n
Xn
i=1
xi
!2
, (1)
a combination of round-off errors from the summations and catastrophic cancellation in the
subtraction may result in the computation of a negative sample variance! (In older versions of
Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.)
In this problem, we will consider two algorithms for computing the sample variances that
avoid this catastrophic cancellation. Both are “one pass” algorithms, in the sense that we
only need to cycle once through the data (as is the case if we use the right hand side of (1));
to use the left hand side of (1), we need two passes since we need to first compute ¯x before
computing the sum on the left hand side of (1). In parts (a) and (b) below, define ¯xk be the
sample mean of x1, · · · , xk and note that
x¯k+1 =
k
k + 1
x¯k +
1
k + 1
xk+1
with ¯x = ¯xn.
(a) Show that Xn
i=1
(xi − x¯)
2
can be computed using the recursion
k
X
+1
i=1
(xi − x¯k+1)
2 =
X
k
i=1
(xi − x¯k)
2 +
k
k + 1
(xk+1 − x¯k)
2
for k = 1, · · · , n − 1. (This is known as West’s algorithm.)
(b) A somewhat simpler one-pass method replaces ¯x by some estimate x0 and then corrects
for the error in estimation. Specifically, if x0 is an arbitrary number, show that
Xn
i=1
(xi − x¯)
2 =
Xn
i=1
(xi − x0)
2 − n(x0 − x¯)
2
.
(c) The key in using the formula in part (b) is to choose x0 to avoid catastrophic cancellation,
that is, x0 should be close to ¯x. How might you choose x0 (without first computing ¯x) to
minimize the possibility of catastrophic cancellation? Ideally, x0 should be calculated using
o(n) operations.
(An interesting paper on computational algorithms for computing the variance is “Algorithms
for computing the sample variance: analysis and recommendations” by Chan, Golub, and
LeVeque; this paper is available on Quercus.)
4. (a) Suppose that A, B, C, and D are matrices so that AC and BD are both well-defined.
Show that
(AC) ⊗ (BD) = (A ⊗ B)(C ⊗ D)
(Hint: This is easier than it looks — the key is to start with the right hand side of the
identity.)
(b) Use the result of part (a) to show that
(A ⊗ B)
−1 = A
−1 ⊗ B
−1
for invertible matrices A and B.
(c) Suppose that H2
k is a 2k × 2
k Hadamard matrix. Prove the claim given in lecture:
H2
k =
Y
k
j=1
(I2
j−1 ⊗ H2 ⊗ I2
k−j )
(Hint: Use induction, starting with the fact that the identity holds trivially for k = 1.)