## Description

1. (15 pts) Bellatrix Lestrange is writing a secret message to Voldemort and wants to

prevent it from being understood by meddlesome young wizards and Muggles. She

decides to use Huffman encoding to encode the message. Magically, the symbol frequencies of the message are given by the Pell numbers, a famous sequence of integers

known since antiquity and related to the Fibonacci numbers. The nth Pell number is

defined as Pn = 2 Pn−1 + Pn−2 for n > 1 with base cases P0 = 0 and P1 = 1.

(a) For an alphabet of Σ = {a, b, c, d, e, f, g, h} with frequencies given by the first

|Σ| non-zero Pell numbers, give an optimal Huffman code and the corresponding

encoding tree for Bellatrix to use.

(b) Generalize your answer to (1a) and give the structure of an optimal code when

the frequencies are the first n non-zero Pell numbers.

2. (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing

assumption analyzed in class, but is a deterministic function. That is, h(x) = k each

time x is used as an argument to h(). Designing good hash functions is hard, and a

bad hash function can cause a hash table to quickly exit the sparse loading regime by

overloading some buckets and under loading others. Good hash functions often rely

on beautiful and complicated insights from number theory, and have deep connections

to pseudorandom number generators and cryptographic functions. In practice, most

hash functions are moderate to poor approximations of uniform hashing.

Consider the following hash function. Let U be the universe of strings composed of the

characters from the alphabet Σ = [A, . . . ,Z], and let the function f(xi) return the index

of a letter xi ∈ Σ, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string

x ∈ Σ

m, define h(x) = ([Pm

i=1 f(xi)] mod `), where ` is the number of buckets in the

hash table. That is, our hash function sums up the index values of the characters of a

string x and maps that value onto one of the ` buckets.

(a) The following list contains US Census derived last names:

http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last

Using these names as input strings, first choose a uniformly random 50% of these

name strings and then hash them using h(x).

Produce a histogram showing the corresponding distribution of hash locations

when ` = 200. Label the axes of your figure. Briefly describe what the figure

shows about h(x), and justify your results in terms of the behavior of h(x). Do

not forget to append your code.

Hint: the raw file includes information other than name strings, which will need to be removed;

and, think about how you can count hash locations without building or using a real hash table.

1

CSCI 3104

Problem Set 5

(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal

behavior of uniform hashing.

(c) Produce a plot showing (i) the length of the longest chain (were we to use chaining

for resolving collisions under h(x)) as a function of the number n of these strings

that we hash into a table with ` = 200 buckets, (ii) the exact upper bound on the

depth of a red-black tree with n items stored, and (iii) the length of the longest

chain were we to use a uniform hash instead of h(x). Include a guide of c n

Then, comment (i) on how much shorter the longest chain would be under a

uniform hash than under h(x), and (ii) on the value of n at which the red-black

tree becomes a more efficient data structure than h(x) and separately a uniform

hash.

3. (15 pts) Draco Malfoy is struggling with the problem of making change for n cents

using the smallest number of coins. Malfoy has coin values of v1 < v2 < · · · < vr for r
coins types, where each coin’s value vi
is a positive integer. His goal is to obtain a set
of counts {di}, one for each coin type, such that Pr
i=1 di = k and where k is minimized.
(a) A greedy algorithm for making change is the cashier’s algorithm, which all
young wizards learn. Malfoy writes the following pseudocode on the whiteboard
to illustrate it, where n is the amount of money to make change for and v is a
vector of the coin denominations:
wizardChange(n,v,r) :
d[1 .. r] = 0 // initial histogram of coin types in solution
while n > 0 {

k = 1

while ( k < r and v[k] > n ) { k++ }

if k==r { return ’no solution’ }

else { n = n – v[k] }

}

return d

Hermione snorts and says Malfoy’s code has bugs. Identify the bugs and explain

why each would cause the algorithm to fail.

(b) Sometimes the goblins at Gringotts Wizarding Bank run out of coins,1 and make

change using whatever is left on hand. Identify a set of U.S. coin denominations

for which the greedy algorithm does not yield an optimal solution. Justify your

1

It’s a little known secret, but goblins like to eat the coins. It isn’t pretty for the coins, in the end.

2

CSCI 3104

Problem Set 5

answer in terms of optimal substructure and the greedy-choice property. (The set

should include a penny so that there is a solution for every value of n.)

(c) (8 pts extra credit) On the advice of computer scientists, Gringotts has announced

that they will be changing all wizard coin denominations into a new set of coins

denominated in powers of c, i.e., denominations of c

0

, c1

, . . . , c`

for some integers

c > 1 and ` ≥ 1. (This will be done by a spell that will magically transmute old

coins into new coins, before your very eyes.) Prove that the cashier’s algorithm

will always yield an optimal solution in this case.

Hint: first consider the special case of c = 2.

3