Description
1. (15 pts) Bellatrix Lestrange is writing a secret message to Voldemort and wants to
prevent it from being understood by meddlesome young wizards and Muggles. She
decides to use Huffman encoding to encode the message. Magically, the symbol frequencies of the message are given by the Pell numbers, a famous sequence of integers
known since antiquity and related to the Fibonacci numbers. The nth Pell number is
defined as Pn = 2 Pn−1 + Pn−2 for n > 1 with base cases P0 = 0 and P1 = 1.
(a) For an alphabet of Σ = {a, b, c, d, e, f, g, h} with frequencies given by the first
|Σ| non-zero Pell numbers, give an optimal Huffman code and the corresponding
encoding tree for Bellatrix to use.
(b) Generalize your answer to (1a) and give the structure of an optimal code when
the frequencies are the first n non-zero Pell numbers.
2. (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing
assumption analyzed in class, but is a deterministic function. That is, h(x) = k each
time x is used as an argument to h(). Designing good hash functions is hard, and a
bad hash function can cause a hash table to quickly exit the sparse loading regime by
overloading some buckets and under loading others. Good hash functions often rely
on beautiful and complicated insights from number theory, and have deep connections
to pseudorandom number generators and cryptographic functions. In practice, most
hash functions are moderate to poor approximations of uniform hashing.
Consider the following hash function. Let U be the universe of strings composed of the
characters from the alphabet Σ = [A, . . . ,Z], and let the function f(xi) return the index
of a letter xi ∈ Σ, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string
x ∈ Σ
m, define h(x) = ([Pm
i=1 f(xi)] mod `), where ` is the number of buckets in the
hash table. That is, our hash function sums up the index values of the characters of a
string x and maps that value onto one of the ` buckets.
(a) The following list contains US Census derived last names:
http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last
Using these names as input strings, first choose a uniformly random 50% of these
name strings and then hash them using h(x).
Produce a histogram showing the corresponding distribution of hash locations
when ` = 200. Label the axes of your figure. Briefly describe what the figure
shows about h(x), and justify your results in terms of the behavior of h(x). Do
not forget to append your code.
Hint: the raw file includes information other than name strings, which will need to be removed;
and, think about how you can count hash locations without building or using a real hash table.
1
CSCI 3104
Problem Set 5
(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal
behavior of uniform hashing.
(c) Produce a plot showing (i) the length of the longest chain (were we to use chaining
for resolving collisions under h(x)) as a function of the number n of these strings
that we hash into a table with ` = 200 buckets, (ii) the exact upper bound on the
depth of a red-black tree with n items stored, and (iii) the length of the longest
chain were we to use a uniform hash instead of h(x). Include a guide of c n
Then, comment (i) on how much shorter the longest chain would be under a
uniform hash than under h(x), and (ii) on the value of n at which the red-black
tree becomes a more efficient data structure than h(x) and separately a uniform
hash.
3. (15 pts) Draco Malfoy is struggling with the problem of making change for n cents
using the smallest number of coins. Malfoy has coin values of v1 < v2 < · · · < vr for r
coins types, where each coin’s value vi
is a positive integer. His goal is to obtain a set
of counts {di}, one for each coin type, such that Pr
i=1 di = k and where k is minimized.
(a) A greedy algorithm for making change is the cashier’s algorithm, which all
young wizards learn. Malfoy writes the following pseudocode on the whiteboard
to illustrate it, where n is the amount of money to make change for and v is a
vector of the coin denominations:
wizardChange(n,v,r) :
d[1 .. r] = 0 // initial histogram of coin types in solution
while n > 0 {
k = 1
while ( k < r and v[k] > n ) { k++ }
if k==r { return ’no solution’ }
else { n = n – v[k] }
}
return d
Hermione snorts and says Malfoy’s code has bugs. Identify the bugs and explain
why each would cause the algorithm to fail.
(b) Sometimes the goblins at Gringotts Wizarding Bank run out of coins,1 and make
change using whatever is left on hand. Identify a set of U.S. coin denominations
for which the greedy algorithm does not yield an optimal solution. Justify your
1
It’s a little known secret, but goblins like to eat the coins. It isn’t pretty for the coins, in the end.
2
CSCI 3104
Problem Set 5
answer in terms of optimal substructure and the greedy-choice property. (The set
should include a penny so that there is a solution for every value of n.)
(c) (8 pts extra credit) On the advice of computer scientists, Gringotts has announced
that they will be changing all wizard coin denominations into a new set of coins
denominated in powers of c, i.e., denominations of c
0
, c1
, . . . , c`
for some integers
c > 1 and ` ≥ 1. (This will be done by a spell that will magically transmute old
coins into new coins, before your very eyes.) Prove that the cashier’s algorithm
will always yield an optimal solution in this case.
Hint: first consider the special case of c = 2.
3