## Description

The files faces.tar.gz and background.tar.gz contain 2000 images of faces and non-face image

patches, respectively. Your task in this assignment is to implement a Viola–Jones type boosting based

detector cascade, as described in the lectures, to distinguish between these two classes, and hence detect

faces in larger images. We provided a standard test image to test your algorithm. To detect the faces in the

image, use a sliding window to evaluate your trained detector on each 64 × 64 patch. Report your results

either by giving the coordinates of each patch where you detected a face, or, better yet, by drawing a square

around the patch. You might find that you get many overlapping detections around each face. To avoid

this problem use an exclusion rule to avoid overlaps. Please include a detailed writeup in your submission

explaining any design choices that you have made, how many classifiers you have in your cascade, what the

training error of each classifier is, and how long it took to train each stage of the cascade.
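For the exclusion rule, a simple greedy scheme often suffices: keep a detection only if it is far enough from every detection already kept. A minimal sketch, where the `exclude_overlaps` helper and the 32-pixel minimum separation are illustrative choices rather than part of the assignment spec:

```python
def exclude_overlaps(detections, min_dist=32):
    # detections: list of (x, y) top-left corners of 64x64 patches,
    # assumed already sorted by detector score (strongest first).
    kept = []
    for (x, y) in detections:
        # keep a detection only if it is at least min_dist away (in
        # Chebyshev distance) from every detection kept so far
        if all(max(abs(x - kx), abs(y - ky)) >= min_dist for (kx, ky) in kept):
            kept.append((x, y))
    return kept
```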

You have a lot of freedom in how exactly to implement the algorithm. However, if you’re having trouble

getting started, we suggest implementing the following functions. These are merely suggestions and you

should feel free to deviate from our prescribed steps if you think you have a more convenient approach.

These function signatures are written in Python but you may choose to use whatever language you’re most

comfortable with.

1. Write a function to load all the data into a matrix

```python
def load_data(faces_dir, background_dir):
    '''
    faces_dir: string, file location of the faces directory
    background_dir: string, file location of the background directory
    Returns: a tuple of numpy matrices
        - a matrix of size N x 64 x 64 of the images
        - a matrix of size N x 1 of the class labels (+1 for a face, -1 for background)
    '''
```

You may find it helpful to use library functions from scipy or imageio to load the jpeg files into a

numpy matrix. Note that the images are in RGB format so once you load them, each image will be a

64 × 64 × 3 matrix. We won’t need the color, so convert the image to greyscale. You can do this by
averaging each pixel over the color index (the last index of the matrix).
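For example, once the jpegs are loaded (e.g. with `imageio.imread`) into an N × 64 × 64 × 3 array, the greyscale conversion is a single mean over the last axis. A minimal sketch; the `to_greyscale` helper name is ours:

```python
import numpy as np

def to_greyscale(rgb_imgs):
    # rgb_imgs: N x 64 x 64 x 3 array of RGB images.
    # Averaging over the colour (last) axis gives N x 64 x 64 greyscale images.
    return rgb_imgs.astype(float).mean(axis=-1)
```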

2. Recall that all of the Haar filters can be computed very quickly if you already have the integral image

representation pre-computed. Implement a function that computes this integral image representation.

The output matrix should satisfy:

Output_{i,x,y} = ∑_{a=1}^{x} ∑_{b=1}^{y} Input_{i,a,b},   for all images i and all 1 ≤ x, y ≤ 64

```python
def compute_integral_image(imgs):
    '''
    imgs: numpy matrix of size N x 64 x 64, where N = total number of images
    Returns: a matrix of size N x 64 x 64 of the integral image representation
    '''
```
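One way to implement this is with two cumulative sums, one per spatial axis; note that the resulting array is 0-indexed while the formula above is 1-indexed:

```python
import numpy as np

def compute_integral_image(imgs):
    # Running the cumulative sum along each spatial axis in turn yields
    # the 2-D prefix sums: entry [i, x, y] is the sum of all pixels of
    # image i in the rectangle from (0, 0) to (x, y) inclusive.
    return imgs.cumsum(axis=1).cumsum(axis=2)
```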

3. Construct a list that will hold the relevant information for each Haar-filter feature. In the boosting slides
we showed a whole assortment of Haar filters that you could potentially use, but for this assignment
just the 2-rectangle features will be sufficient. A 2-rectangle feature is uniquely determined by:


(a) pixel location (x_s, y_s) of the upper left corner of the shaded rectangle

(b) pixel location (x′_s, y′_s) of the bottom right corner of the shaded rectangle

(c) pixel location (x_u, y_u) of the upper left corner of the unshaded rectangle

(d) pixel location (x′_u, y′_u) of the bottom right corner of the unshaded rectangle

Remember that once you have the integral image representation computed, we just need the pixel
locations of the relevant corners of this filter to compute the corresponding feature on an image.

```python
def feature_list():
    '''
    Returns: list of relevant pixel locations for each feature.
    The ith index of this list should be a list of the four relevant
    pixel locations needed to compute the ith feature
    '''
```
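As an illustration, here is one possible enumeration of horizontal 2-rectangle features (a shaded rectangle with an equal-sized unshaded rectangle immediately to its right). The stride and minimum-size parameters are illustrative choices, and you may also want vertical features:

```python
def feature_list(size=64, stride=4, min_w=8, min_h=8):
    # Enumerate horizontal 2-rectangle features. Each entry is a list of
    # the four corner locations:
    #   [(xs, ys), (xs2, ys2), (xu, yu), (xu2, yu2)]
    # i.e. upper-left/bottom-right of the shaded rectangle, then of the
    # unshaded rectangle directly to its right.
    feats = []
    for y in range(0, size - min_h + 1, stride):
        for x in range(0, size - 2 * min_w + 1, stride):
            for h in range(min_h, size - y + 1, stride):
                for w in range(min_w, (size - x) // 2 + 1, stride):
                    shaded = ((x, y), (x + w - 1, y + h - 1))
                    unshaded = ((x + w, y), (x + 2 * w - 1, y + h - 1))
                    feats.append([*shaded, *unshaded])
    return feats
```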

4. Implement a function that computes the features. Given a feature index (an index into the data
structure computed in the previous step), you have the necessary information to evaluate this feature on a
given image.

```python
def compute_feature(int_img_rep, feat_lst, feat_idx):
    '''
    int_img_rep: the N x 64 x 64 numpy matrix of the integral image representation
    feat_lst: list of features
    feat_idx: integer, index of a feature (in feat_lst)
    Returns: an N x 1 numpy matrix of the feature evaluations for each image
    '''
```
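A sketch of this computation, using the standard four-corner lookup on the integral image. Here `rect_sum` is a hypothetical helper, and we assume the first coordinate of each stored pair indexes rows (any consistent convention works):

```python
import numpy as np

def rect_sum(ii, r1, c1, r2, c2):
    # Sum of pixels over the inclusive rectangle [r1, r2] x [c1, c2],
    # for every image at once, via the four-corner integral-image identity.
    total = ii[:, r2, c2].astype(float)
    if r1 > 0:
        total -= ii[:, r1 - 1, c2]
    if c1 > 0:
        total -= ii[:, r2, c1 - 1]
    if r1 > 0 and c1 > 0:
        total += ii[:, r1 - 1, c1 - 1]
    return total

def compute_feature(int_img_rep, feat_lst, feat_idx):
    # Feature value = (sum over shaded rectangle) - (sum over unshaded one)
    (xs, ys), (xs2, ys2), (xu, yu), (xu2, yu2) = feat_lst[feat_idx]
    shaded = rect_sum(int_img_rep, xs, ys, xs2, ys2)
    unshaded = rect_sum(int_img_rep, xu, yu, xu2, yu2)
    return (shaded - unshaded).reshape(-1, 1)
```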

5. At each iteration of AdaBoost, we pick the best weak learner that minimizes the weighted training
error. Here, our weak learners are given by {h_{i,p,θ}} where i is a feature index, p ∈ {+1, −1}, and θ ∈ ℝ.
In lecture, we saw an efficient way of computing the optimal p and θ values for a fixed feature index
i (see slide 27/31 of the boosting slides). Implement a function that performs this computation.

```python
def opt_p_theta(int_img_rep, feat_lst, weights, feat_idx):
    '''
    int_img_rep: the N x 64 x 64 numpy matrix of the integral image representation
    feat_lst: list of features
    weights: an N x 1 matrix containing the weights for each datapoint
    feat_idx: integer, index of the feature to compute on all of the images
    Returns: the optimal p and theta values for the given feat_idx
    '''
```
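One possible sketch of the sorting-based scan from lecture, written here to operate directly on a flat array of pre-computed feature values rather than on the suggested signature. It assumes the weights sum to 1 and the convention that h predicts +1 when p·f(x) < p·θ (ties between equal feature values are ignored in this sketch):

```python
import numpy as np

def opt_p_theta(feat_vals, labels, weights):
    # feat_vals, labels, weights: flat arrays of length N; weights sum to 1.
    order = np.argsort(feat_vals)
    v, y, w = feat_vals[order], labels[order], weights[order]
    # Start with theta below all values: for p = +1 everything is predicted
    # -1, so the error is the total weight of the positives.
    pos_total = w[y == 1].sum()
    best = (pos_total, 1, v[0] - 1)  # (error, p, theta)
    err = pos_total
    for k in range(len(v)):
        # moving theta past point k flips its prediction to +1
        err += w[k] if y[k] == -1 else -w[k]
        theta = (v[k] + v[k + 1]) / 2 if k + 1 < len(v) else v[k] + 1
        if err < best[0]:
            best = (err, 1, theta)
        # for p = -1 the predictions flip, so its error is 1 - err
        if 1 - err < best[0]:
            best = (1 - err, -1, theta)
    return best[1], best[2]
```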

Implement a function that computes the predictions of a given weak learner:

```python
def eval_learner(int_img_rep, feat_lst, feat_idx, p, theta):
    '''
    int_img_rep: the N x 64 x 64 numpy matrix of the integral image representation
    feat_lst: list of features
    feat_idx: integer, index of the feature for this weak learner
    p: +1 or -1, polarity
    theta: float, threshold
    Returns: N x 1 vector of label predictions for the given weak learner
    '''
```
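Under the same convention (+1 when p·f(x) < p·θ, an assumption on our part), the evaluation reduces to a one-liner once the feature values are in hand:

```python
import numpy as np

def eval_learner(feat_vals, p, theta):
    # feat_vals: N x 1 matrix of feature evaluations (e.g. from
    # compute_feature). Weak learner: +1 when p * f(x) < p * theta, else -1.
    return np.where(p * feat_vals < p * theta, 1, -1)
```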


Write a function to compute the weighted error rate of a given weak learner:

```python
def error_rate(int_img_rep, feat_lst, weights, feat_idx, p, theta):
    '''
    int_img_rep: the N x 64 x 64 numpy matrix of the integral image representation
    feat_lst: list of features
    weights: an N x 1 matrix containing the weights for each datapoint
    feat_idx: integer, index of the feature for this weak learner
    p: +1 or -1, polarity
    theta: float, threshold
    Returns: the weighted error rate of this weak learner
    '''
```
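The weighted error is just the total weight of the misclassified points. A sketch, written against predictions rather than the suggested signature, and assuming normalized weights:

```python
import numpy as np

def error_rate(weights, y_pred, y_true):
    # Sum the weights of the points where prediction and truth disagree;
    # with weights summing to 1 this is the weighted error rate.
    return float(weights[y_pred != y_true].sum())
```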

Finally, implement a function that finds the optimal weak learner h_{i,p,θ} over the training data. This
function should be fairly simple given the previous functions:

```python
def opt_weaklearner(int_img_rep, weights, feat_lst):
    '''
    int_img_rep: the N x 64 x 64 numpy matrix of the integral image representation
    weights: N x 1 matrix of weights of each datapoint
    feat_lst: list of features
    Returns: the i, p, and theta values for the optimal weak learner
    '''
```
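A brute-force sketch of the search, shown here on a pre-computed feature matrix for clarity (the assignment suggests computing features on the fly instead); `feat_matrix[n, i]` holding feature i on image n is our assumed layout:

```python
import numpy as np

def opt_weaklearner(feat_matrix, labels, weights):
    # For each feature, try every midpoint threshold and both polarities,
    # keeping the (i, p, theta) triple with the lowest weighted error.
    best = (np.inf, None, None, None)  # (error, i, p, theta)
    for i in range(feat_matrix.shape[1]):
        v = np.sort(feat_matrix[:, i])
        # candidate thresholds: below all values, midpoints, above all values
        thetas = np.concatenate(([v[0] - 1], (v[:-1] + v[1:]) / 2, [v[-1] + 1]))
        for theta in thetas:
            for p in (1, -1):
                pred = np.where(p * feat_matrix[:, i] < p * theta, 1, -1)
                err = weights[pred != labels].sum()
                if err < best[0]:
                    best = (err, i, p, theta)
    return best[1], best[2], best[3]
```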

6. Implement a function to update the weights of each datapoint. Recall that the update rule is given by:

D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t

α_t = (1/2) ln((1 − ϵ_t) / ϵ_t)

Z_t = 2√(ϵ_t (1 − ϵ_t))

```python
def update_weights(weights, error_rate, y_pred, y_true):
    '''
    weights: N x 1 matrix containing the weights of each datapoint
    error_rate: the weighted error rate
    y_pred: N x 1 matrix of predicted labels
    y_true: N x 1 matrix of true labels
    Returns: N x 1 matrix of the updated weights
    '''
```
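A direct transcription of the update rule; normalizing by the sum of the new weights is equivalent to dividing by Z_t:

```python
import numpy as np

def update_weights(weights, error_rate, y_pred, y_true):
    # alpha_t from the weighted error, then multiply each weight by
    # exp(-alpha_t * y_i * h_t(x_i)) and renormalize.
    alpha = 0.5 * np.log((1 - error_rate) / error_rate)
    new_w = weights * np.exp(-alpha * y_true * y_pred)
    return new_w / new_w.sum()  # same as dividing by Z_t = 2*sqrt(eps*(1-eps))
```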

At this point you should have most of the helper functions you’d want to perform the main components

of the AdaBoost algorithm. Remember, this is just a set of guidelines. It is not an exhaustive list of

the functions you’ll need to have a working implementation of the Viola–Jones face detector.

In addition, here are a few tips:

1. Use the Haar-like features described in the Viola–Jones paper and simple decision stumps (with varying

threshold) as weak learners. Each round of boosting will select a single (i, p, θ) combination (where i

indexes the feature, p ∈ {−1, +1} is the polarity, and θ is the threshold) to add to your classifier.

2. The original Viola–Jones paper used three distinct types of Haar–like features. For the purposes of this

assignment using just the simplest “two-rectangle” features is probably sufficient. However, for extra

credit you might implement the other features as well.


3. The hypothesis returned by regular AdaBoost is h(x) = sgn(∑_{t=1}^{T} α_t h_t(x)). However, in a classifier
cascade it is critical that each classifier have a low false negative rate. Therefore, we suggest that you
instead use h(x) = sgn(∑_{t=1}^{T} α_t h_t(x) − Θ), where Θ is set so that there are no missed detections (false
negatives) on your training data at all.
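One way to set Θ: since the stage predicts sgn(score − Θ), choosing Θ just below the smallest boosted score of any positive training example guarantees zero false negatives on the training set. The helper names below are illustrative:

```python
import numpy as np

def pick_threshold(scores, labels):
    # scores: N x 1 boosted margins sum_t alpha_t * h_t(x);
    # labels: N x 1 in {+1, -1}.
    # Place Theta just below the lowest score among the true faces.
    return scores[labels == 1].min() - 1e-8

def cascade_stage_predict(scores, theta):
    # Stage hypothesis h(x) = sgn(score - Theta)
    return np.where(scores - theta >= 0, 1, -1)
```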

4. A single booster with such an artificially low false negative rate will have a fairly high false positive
rate. We suggest that you keep adding new features until the false positive rate falls below about 30%.
You will probably find that fewer than a dozen rounds of boosting are sufficient for this. Chaining several
such boosters together will push the false detection rate of the entire cascade down to acceptable levels.

5. One key idea in the Viola–Jones paper is that it is feasible to have a large number of base learners (Haar
features) as long as the base learners are very fast to compute. These features should therefore always
be computed on the fly (from the integral image representations) rather than computed once and then
stored.

6. Speed might nonetheless be an issue in your implementation. We suggest that in the development stage
you work with smaller data, and then scale up to the entire dataset on the final “production run”. In
addition, don’t be afraid to (a) reduce the number of features by using a stride of size 2 or 4 as opposed
to trying every Haar-like feature in every possible pixel location, or (b) reduce the training set by training
on just 1000 or even fewer images from each class. Try to write code that is relatively efficient; that
can also make a big difference.

7. Depending on the efficiency of your implementation, you might want to start with a smaller number of
features than is currently used in the provided code. One way to achieve that is to use Haar features of
larger size (i.e. window size larger than 2 × 1 and 1 × 2) and/or larger strides.
