Description
Question 1: AVL Trees.
AVL Trees are yet another self balancing binary search tree (BST) that are sometimes used in the place of red black trees. The key property of an AVL tree is that
for all nodes nn in the tree, | height(n.left)−height(n.right)|≤1| height(n.left)−height(n.right)|≤1
In words, the height of the left subtree and right subtree at any node can differ by at most 11.
Let hh be the height of an AVL tree and nn be the number of nodes in the tree. The goal of this problem is to prove a relationship between hh and nn. We’ve broken this into two steps:
(A) Prove that n≥Fhn≥Fh, where FhFh is the hthhth Fibonacci number. (F0=1,F1=1,F2=2,…F0=1,F1=1,F2=2,…) (Hint Use strong induction with two base cases. First establish the property for all AVL trees of heights 0 and 1. Next, assuming it holds for trees of height ≤h≤h, prove it for trees of height h+1h+1 ).
Next, it is a fact that for any k≥30k≥30, Fk≥1.5kFk≥1.5k.
(B) Using the above fact and the result from part A, show that h=Θ(log(n))h=Θ(log(n)).
(C) We will briefly examine inserting a node into an AVL tree through an example. On the left, we have shown an AVL tree and to the right we show the result after a BST insert has happened.
Devise a sequence of left and right rotations that will restore the AVL tree property. Explain for each rotation what is the root node at which we are rotating and which direction. If you wish, you may insert images showing the trees before/after rotation using markdown (see how we inserted the image. But do not forget to upload the images with the submission).
Answer 1 (Expected length: 15 lines)
(A) Your answer here
(B) Your answer here
(C) Your answer here
Question 2: Bloom Filters
A bloom filter is a fast set data structure that maintains a set SS of keys. One can insert keys into the set and test whether a given key kk belongs to the set. It may used in applications where the keys are “complicated” objects such as TCP packets or images that are expensive to compare with each other.
The data structure is an array TT of Booleans size mm with ll different hash functions h1,…,hlh1,…,hl. Initially, T[i] = FALSE
for all i
.
If a key kk is to be inserted we first compute i1=h1(k),…,il=hl(k)i1=h1(k),…,il=hl(k) and then we set T[i1]=⋯T[il]=TRUET[i1]=⋯T[il]=TRUE.
Note: A bloom filter is not a hash table, but they both use hash functions in interesting ways.
(A) Suppose we wish to find out if an element kk is a member of the set by checking if T[h1(k)],…,T[hl(k)]T[h1(k)],…,T[hl(k)] are all true. Explain whether this can lead to a false positive i.e, the approach wrongly concludes that kk belongs to the set when it was never inserted; or false negative i.e, the approach wrongly concludes that kk does not belong to the set when it does.
(B) Suppose our hash functions are guaranteed to be uniform. I.e, for any randomly chosen key kk, for any hash function hihi and cell jj,
If nn keys are chosen at random and inserted into the filter, compute that probability that any given cell T[j]T[j] is set to FALSE after this.
(C) Use the results from previous set to estimate the probabilisty of a false positive. I.e, some ll cells i1,i2,…,ili1,i2,…,il are simultaneously set to TRUE.
Answer 2 { Expected Size: 15 lines}
(A) Your answer here
(B) Your answer here
(C) Your answer here