MTHE 474/874 – Homework # 3 solved


(1) For each D-ary variable-length code C below, determine (with justification) whether or
not it is uniquely decodable.
(a) D = 2, C = {10, 010, 101}.
(b) D = 2, C = {0, 01, 011, 0111}.
(c) D = 3, C = {21, 20, 201, 202, 212}.
(d) D = 3, C = {1, 21, 221, 002, 021, 001}.
(e) D = 4, C = {10, 12, 13, 22, 121, 133, 220, 221, 223}.
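Parts (a)–(e) can be checked mechanically with the Sardinas–Patterson test: a code is uniquely decodable iff no dangling suffix is itself a codeword. The following Python sketch is offered only as a sanity check on the hand computations, not as the required justification:

```python
def dangling_suffixes(A, B):
    # Collect s such that some word of one set equals a word of the
    # other set followed by s.
    out = set()
    for a in A:
        for b in B:
            if a != b:
                if a.startswith(b):
                    out.add(a[len(b):])
                if b.startswith(a):
                    out.add(b[len(a):])
    return out

def is_uniquely_decodable(code):
    # Sardinas-Patterson: iterate dangling-suffix sets; the code is UD
    # iff no such set ever contains a codeword.
    code = set(code)
    seen = set()
    current = dangling_suffixes(code, code)
    while current:
        if current & code:
            return False
        seen |= current
        current = dangling_suffixes(code, current) - seen
    return True

print(is_uniquely_decodable({"10", "010", "101"}))        # code (a): False
print(is_uniquely_decodable({"0", "01", "011", "0111"}))  # code (b): True
```

Codewords are passed as strings of D-ary digits, so the same routine handles the ternary and quaternary codes in (c)–(e).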
(2) Consider the following set of integers: {l1 = 1, l2 = 1, l3 = 2, l4 = 2, l5 = 3, l6 = 3, l7 = 4}.
(a) Find the smallest integer D ≥ 2 such that the integers l1, . . . , l7 satisfy Kraft’s inequality
in base D.
(b) For the value of D found in part (a), find an integer l∗7 so that l1, . . . , l6, l∗7 satisfy
Kraft’s inequality (in base D) with equality.
(c) For the value of D found in part (a), design a D-ary prefix code C with the integers
l1, . . . , l6, l∗7 as its codeword lengths.
(d) Propose a discrete memoryless source for which the above code C is absolutely
optimal (i.e., its average codeword length equals the source’s entropy) and compute
the source’s entropy specifying its units.
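The Kraft sums in parts (a) and (b) can be evaluated exactly with rational arithmetic; a short Python check (illustrative only — the written justification is still required):

```python
from fractions import Fraction

def kraft_sum(lengths, D):
    # Exact Kraft sum: sum over i of D^(-l_i).
    return sum(Fraction(1, D ** l) for l in lengths)

lengths = [1, 1, 2, 2, 3, 3, 4]

# Part (a): smallest base D >= 2 whose Kraft sum does not exceed 1.
D = next(d for d in range(2, 10) if kraft_sum(lengths, d) <= 1)
print(D, kraft_sum(lengths, D))

# Part (b): replace l7 by the length l7* that makes the sum exactly 1.
base = kraft_sum(lengths[:6], D)
l7_star = next(l for l in range(1, 30) if base + Fraction(1, D ** l) == 1)
print(l7_star)
```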
(3) Determine (with justification) whether each of the following statements is True or False:
(a) There does not exist a binary prefix code of size three and with codeword lengths
l1 = l2 = 2 and l3 = 3 for which the following sequence
00101100010010110
is a concatenation of its codewords (and is hence uniquely decodable).
(b) For a uniformly distributed memoryless source with alphabet X = {a1, a2, . . . , a9},
it is not possible to design a ternary first-order Huffman code C with length variance
Var(l(C)) = 0.
(c) Consider a discrete memoryless source {Xi}∞i=1 with alphabet X = {a1, a2, a3, a4, a5, a6}
and probability distribution
[p1, p2, p3, p4, p5, p6] = [1/4, 1/4, 1/4, 1/8, 1/16, 1/16],
where pi := P(X = ai), i = 1, · · · , 6. The binary first-order Shannon code (described
in Lecture 16) for the source is optimal.
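For statement (c), the Shannon code lengths ⌈− log2 pi⌉ and the source entropy can be compared numerically; a quick Python check (note that all of the probabilities here are dyadic, which is the key observation):

```python
import math

p = [1/4, 1/4, 1/4, 1/8, 1/16, 1/16]
# Shannon's assignment rule: codeword length ceil(-log2 p_i).
shannon_lengths = [math.ceil(-math.log2(pi)) for pi in p]
avg_length = sum(pi * li for pi, li in zip(p, shannon_lengths))
entropy = -sum(pi * math.log2(pi) for pi in p)
print(shannon_lengths, avg_length, entropy)
```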
(4) Consider a discrete memoryless source {Xn}∞n=1 with alphabet X = {a, b, c, d, e, f, g}. Let
p1 and p2 be two possible probability mass functions for the source that are described
in the table below. Also, let C = f(X ), where f : X → {0, 1, 2, 3}∗, be a quaternary
(first-order) prefix code for the source as shown below.
Symbol x ∈ X p1(x) p2(x) f(x)
a 1/4 1/4 0
b 1/4 1/8 1
c 1/4 1/8 2
d 1/16 1/8 30
e 1/16 1/8 31
f 1/16 1/8 32
g 1/16 1/8 33
Let R(C, pj ) denote the average compression rate of code C for the source under
distribution pj , j = 1, 2.
(a) Calculate R(C, p1) and R(C, p2) and compare them to their respective theoretical
limits (using the appropriate units).
(b) Which distribution is preferable for the source? Comment qualitatively.
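The quantities in part (a) can be checked numerically, with rates in quaternary code symbols per source symbol and entropies in base-4 units; a small Python sketch:

```python
import math

# Codeword lengths |f(x)| read off the table above.
lens = {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 2}
p1 = {'a': 1/4, 'b': 1/4, 'c': 1/4, 'd': 1/16, 'e': 1/16, 'f': 1/16, 'g': 1/16}
p2 = {'a': 1/4, 'b': 1/8, 'c': 1/8, 'd': 1/8, 'e': 1/8, 'f': 1/8, 'g': 1/8}

def rate(p):
    # Average compression rate R(C, p), quaternary symbols per source symbol.
    return sum(p[x] * lens[x] for x in p)

def entropy_base4(p):
    # Source entropy in quaternary units (logarithm base 4).
    return -sum(px * math.log(px, 4) for px in p.values())

for p in (p1, p2):
    print(rate(p), entropy_base4(p))
```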
(5) Consider a stationary Markov source {Xi}∞i=1 with alphabet X = {a, b} and transition
distribution PX2|X1(a|a) = PX2|X1(b|b) = α, where 1/3 < α < 1/2.
(a) Design a first-order and a second-order optimal binary variable-length code for the
source and compare their average coding rates.
(b) Determine the limit of the average coding rate of an n-th order optimal binary
variable-length code for the source as n → ∞. Justify your answer.
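For orientation on part (b): by symmetry the stationary distribution of this chain is uniform, so its entropy rate H(X2|X1) equals the binary entropy function evaluated at α. A numeric illustration with an arbitrary α in the allowed range:

```python
import math

def hb(x):
    # Binary entropy function, in bits.
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

alpha = 0.4               # illustrative choice in (1/3, 1/2)
entropy_rate = hb(alpha)  # H(X2|X1) for the symmetric chain
print(entropy_rate)
```

Since a first-order binary code on a two-letter alphabet cannot beat 1 bit/symbol, the gap between 1 and hb(α) quantifies what higher-order coding can recover.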
(6) We are interested in constructing first-order binary uniquely decodable (UD) and suffix
variable-length codes (recall that a suffix code has the property that none of its codewords
is a suffix of another) for a discrete memoryless source (DMS) {Xi} with alphabet X =
{a1, a2, . . . , aM} and symbol probabilities given by pi := PX(ai) > 0, i = 1, 2, . . . , M.
The criterion in designing the codes is that they minimize the following expected cost
function
L̄ = Σ_{i=1}^{M} pi c(li),
where l1, . . . , lM are the lengths of the codewords for source symbols a1, . . . , aM, respectively,
and c : {1, 2, . . .} → (0, ∞) is a (not necessarily linear) cost function. Define

S := min
all suffix codes

and

UD := min
all uniquely decodable codes
L. ¯
(a) Compare L¯
S and L¯
UD.
(b) Describe and evaluate appropriate metrics to assess the performance and complexity
of D-ary nth-order UD code designs for a DMS.
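A standard observation relevant to part (a), stated here as a hint rather than a solution: reversing every codeword of a prefix code produces a suffix code with exactly the same multiset of lengths, and vice versa. A quick Python check of this on a sample prefix code:

```python
def is_prefix_free(code):
    # No codeword is a proper prefix of another.
    return not any(a != b and b.startswith(a) for a in code for b in code)

def is_suffix_free(code):
    # No codeword is a proper suffix of another.
    return not any(a != b and b.endswith(a) for a in code for b in code)

prefix_code = ['0', '10', '110', '111']
suffix_code = [w[::-1] for w in prefix_code]  # reverse each codeword
print(suffix_code, is_suffix_free(suffix_code))
```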
(7) Answer the following problems.
(i) A large data set is generated by a stationary source {Xn}∞n=1 with finite alphabet
X and joint distribution PXn(xn), xn ∈ Xn, n ≥ 1. However, the underlying (true)
distribution PXn being unknown, the data source is approximated by another
stationary source {X̂n}∞n=1 with known distribution PX̂n(xn), xn ∈ Xn, n ≥ 1. (We
assume that PXn(xn) > 0 and PX̂n(xn) > 0 for all xn ∈ Xn.)
For any fixed integer n ≥ 1, we design an n-th order binary prefix code fn : Xn → {0, 1}∗
for the source under the model {X̂n} by using Shannon’s assignment rule for the
lengths l(cxn) of the codewords cxn = fn(xn), given as follows (cf. Lecture 15):
l(cxn) = ⌈− log2 PX̂n(xn)⌉, xn ∈ Xn.
(a) Show that the true average compression rate
R̄n := (1/n) Σ_{xn∈Xn} PXn(xn) l(cxn)
of the code, computed under the true source distribution PXn, satisfies
(1/n) H(Xn) + (1/n) D(PXn‖PX̂n) ≤ R̄n < (1/n) H(Xn) + 1/n + (1/n) D(PXn‖PX̂n),
and hence estimating the true source distribution PXn via PX̂n results in an
(inefficiency) cost of (1/n) D(PXn‖PX̂n) in the average compression rate.
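The chain of inequalities in part (a) follows from ⌈t⌉ ∈ [t, t + 1) together with the identity H(Xn) + D(PXn‖PX̂n) = −Σ PXn(xn) log2 PX̂n(xn). It can be sanity-checked numerically for n = 1; the distributions p and q below are arbitrary toy choices, not taken from the problem:

```python
import math

p = [0.5, 0.3, 0.2]   # "true" distribution (toy example)
q = [0.3, 0.3, 0.4]   # mismatched model used to assign the code lengths

lengths = [math.ceil(-math.log2(qi)) for qi in q]   # Shannon lengths under q
rate = sum(pi * li for pi, li in zip(p, lengths))   # true average rate
# Cross-entropy -sum p log2 q equals H(p) + D(p||q).
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
print(lengths, rate, cross_entropy)
assert cross_entropy <= rate < cross_entropy + 1
```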
(b) Now assume that the approximating source {X̂n} is a stationary Markov source
{X̂n}∞n=1 with transition distribution PX̂2|X̂1(b|a), a, b ∈ X, obtained by estimating
the relative frequencies of transitions (between consecutive data symbols) in the data
set. Determine limn→∞ R̄n in terms of the cross-entropies
H(PX1,X2; PX̂1,X̂2) and H(PX1; PX̂1).
(ii) [MATH 874 only] Problem 3.2.