Description
1 Question 1 – Support Vector Machines (20 points)
1.1 Linear case (10 points)
Consider training a linear SVM on a linearly separable dataset consisting of n points. Let m be the number of
support vectors obtained by training on the entire set. Show that the leave-one-out cross-validation (LOOCV)
error is bounded above by m/n.
Hint: Consider two cases: (1) removing a support vector data point and (2) removing a non-support vector
data point.
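For reference, the quantity to be bounded can be written out explicitly. The notation below is an assumption of this note and is not taken from the assignment: (w_(-i), b_(-i)) denotes the SVM trained on all points except (x_i, y_i), with labels y_i in {-1, +1}.

```latex
\[
\mathrm{err}_{\mathrm{LOOCV}}
  = \frac{1}{n} \sum_{i=1}^{n}
    \mathbb{1}\!\left[\, y_i \neq \operatorname{sign}\!\left(\mathbf{w}_{(-i)}^{\top}\mathbf{x}_i + b_{(-i)}\right) \right],
\qquad
\text{to show:}\quad \mathrm{err}_{\mathrm{LOOCV}} \le \frac{m}{n}.
\]
```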
1.2 General case (10 points)
Now consider the same problem as above, but with a general kernel instead of a linear SVM.
Assuming that the data is linearly separable in the high-dimensional feature space corresponding to the
kernel, does the bound from the previous section still hold? Explain why or why not.
2 Question 2 – XGBoost (30 points)
In this question, you will use XGBoost to predict the income of a person. The data for this question is under
HW4_q2.
You can use the implementation from https://github.com/dmlc/xgboost. To install it, use pip install xgboost and then import it using from xgboost import XGBClassifier. Here is a tutorial on how to install
and use it: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/.
You will use the Census Income Dataset (https://archive.ics.uci.edu/ml/datasets/census+income).
This dataset includes attributes of 48842 people, such as their age and education. The task is to
predict whether their income exceeds $50K per year. For each person, there are 14 attributes in total. Some
attributes are categorical, like the education, and other attributes have continuous values, like the age. Review
the attributes and their possible values at the link provided above.
The first step is to load the training and test data, which are provided in the files adult.data and adult.test,
respectively. (Tip: To load the data, you can use numpy.genfromtxt.) Inspect your data carefully:
the first 14 columns of each file correspond to the features (attributes) of each person. The last column
corresponds to the labels (income ≤ 50K or > 50K). Since you have both categorical and continuous
attributes, you will first need to load the data as strings. Use “dtype=object” (written np.object in older
NumPy releases; that alias was removed in NumPy 1.24) to load them into numpy arrays.
Then, you need to convert them to float. Keep the continuous attributes as they are. For the categorical attributes, you will need to convert them to integers. To do that, you can use sklearn.preprocessing.OrdinalEncoder.
For each column that corresponds to a categorical attribute, you can construct a new encoder that converts its
values to integers. For example, the education attribute can be encoded as “Bachelors” = 0, “Masters” = 1,
“Doctorate” = 2, etc. Do the same for the labels column (e.g. “≤ 50K” = 0 and “> 50K” = 1). At the end, convert
your whole feature matrix and labels to float.
Important: Be careful when you convert the categorical attributes and labels of your test data. They need to
correspond to the same integer values as in your training data. For example, if “Bachelors” corresponds to 0
in your training data, then it should also correspond to 0 in your test data. To ensure that, for each column of
your training data, you can use the fit_transform method of sklearn.preprocessing.OrdinalEncoder and then
use the transform method for the corresponding column of your test data.
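The loading and encoding steps above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the reference solution: the two in-memory strings stand in for adult.data and adult.test, only three of the 14 features are shown, the column indices are specific to this toy layout, and dtype=str is used here as one way of loading every field as a string.

```python
import io

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Tiny in-memory stand-ins for adult.data / adult.test (age, education, hours, label).
train_txt = "39, Bachelors, 40, <=50K\n50, Masters, 13, >50K\n38, Doctorate, 45, >50K\n"
test_txt = "28, Masters, 40, <=50K\n52, Bachelors, 60, >50K\n"


def load(text):
    # dtype=str keeps every field as a string; autostrip drops the space after each comma.
    return np.genfromtxt(io.StringIO(text), delimiter=",", dtype=str, autostrip=True)


train_raw, test_raw = load(train_txt), load(test_txt)
categorical_cols = [1, 3]  # education and the label column in this toy layout (assumption)

train_num = np.empty(train_raw.shape, dtype=float)
test_num = np.empty(test_raw.shape, dtype=float)
for col in range(train_raw.shape[1]):
    if col in categorical_cols:
        enc = OrdinalEncoder()  # one encoder per categorical column
        # Learn the category -> integer mapping on the training column only ...
        train_num[:, col] = enc.fit_transform(train_raw[:, [col]]).ravel()
        # ... and reuse exactly that mapping on the test column.
        test_num[:, col] = enc.transform(test_raw[:, [col]]).ravel()
    else:
        train_num[:, col] = train_raw[:, col].astype(float)
        test_num[:, col] = test_raw[:, col].astype(float)

X_train, y_train = train_num[:, :-1], train_num[:, -1]
X_test, y_test = test_num[:, :-1], test_num[:, -1]
```

OrdinalEncoder assigns integers in sorted category order, so the codes will generally differ from the illustrative “Bachelors” = 0, “Masters” = 1 mapping in the text; what matters is that train and test share one mapping. Note also that transform raises an error on a category it did not see during fit. In the UCI copy of adult.test the label strings end with a period; if the provided files have the same quirk, strip it before encoding.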
2.1 Question 2.1 (10 points)
Using the whole training set, train an XGBoost classifier and test it on the test set. Report the accuracy
and the confusion matrix on both the train and test sets.
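A minimal sketch of the two reported quantities, computed with sklearn.metrics on made-up labels. The arrays y_true and y_pred are hypothetical; in the assignment the predictions would come from the trained classifier's predict method on the train or test features.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical ground-truth labels and predictions (0 = "<=50K", 1 = ">50K").
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"accuracy = {acc:.3f}")
print(cm)
```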
2.2 Question 2.2 (20 points)
Perform k-fold cross-validation to find the best set of hyper-parameters for XGBoost. To do that, you will
need to split your training data (X) into train (Xtrain) and validation (Xval) sets. For each fold (split), you will
train a model on the training data Xtrain (a subset of the original training data X) and test it on the validation
set Xval. Use k = 10 folds. You can use sklearn.model_selection.KFold.
Report the best cross-validation accuracy. Use this set of hyper-parameters and train XGBoost on the entire
training data, and report the accuracy and confusion matrix on the test data.
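The cross-validation loop can be sketched as follows. This is an illustration under assumptions, not the required implementation: the data is synthetic, the helper name cv_accuracy and the one-parameter grid are hypothetical, and a scikit-learn decision tree is swapped in for XGBoost so that the sketch runs without the xgboost package. Since XGBClassifier follows the same fit/predict interface, the factory passed to cv_accuracy could return an XGBClassifier instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier  # stand-in for xgboost.XGBClassifier


def cv_accuracy(make_model, X, y, k=10, seed=0):
    """Mean validation accuracy of a freshly built model over k folds."""
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in folds.split(X):
        model = make_model()  # new model for every fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(np.mean(model.predict(X[val_idx]) == y[val_idx]))
    return float(np.mean(scores))


# Synthetic stand-in for the encoded training set.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

results = {}
for depth in (1, 3, 5):  # hypothetical hyper-parameter grid
    results[depth] = cv_accuracy(
        lambda d=depth: DecisionTreeClassifier(max_depth=d, random_state=0), X, y
    )

best_depth = max(results, key=results.get)
print(best_depth, results[best_depth])
```

After the search, the model would be rebuilt with the best setting and fit once on the entire training set before it is evaluated on the test set.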
3 Question 3 – SVM for object detection (50 points + 15 bonus points)
In this question, you will train an SVM and use it for detecting human hands in images. You can use the
SVM implementation of sklearn. For this question, it is sufficient and also much faster to use a linear SVM
(sklearn.svm.LinearSVC), but you can also experiment with a non-linear kernel using sklearn.svm.SVC (it
might take a long time, though).
As features, you will use deep features extracted with detectron2
(https://github.com/facebookresearch/detectron2). We provide the utility functions for
feature extraction.
To detect human hands in images, we need a classifier that can distinguish hand image patches from
non-hand patches. To train such a classifier, we can use SVMs. The training data is typically a set of images
with bounding boxes of the hands. Positive training examples are image patches extracted at the annotated
locations. A negative training example can be any image patch that does not significantly overlap with
the annotated hands. Thus there are potentially many more negative training examples than positive training
examples. Due to memory limitations, it will not be possible to use all negative training examples at the
same time. In this question, you will implement hard-negative mining to find the hardest negative examples and
iteratively train an SVM.
3.1 Preparation
3.1.1 For Conda environment
# Assume you have downloaded the homework files and stored them in the directory hw4.
cd hw4
# Create new environment
conda create -n cse512_spring21_hw4 python=3.7
conda activate cse512_spring21_hw4
# Install pytorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
# Install dependency libraries
python -m pip install opencv-python mean_average_precision
python -m pip install pycocotools scikit-learn
# Install detectron
git clone https://github.com/facebookresearch/detectron2.git \
    --branch v0.1.1 detectron2_v0.1.1
cd detectron2_v0.1.1
git checkout db1614e
python -m pip install -e .
# Move HW4_q3 into detectron2_v0.1.1
mv ../HW4_q3 .
cd HW4_q3
3.1.2 For Google Colab
For this question, you will need to use a GPU. You can use Google Colab. If so, remember to change the
runtime type to GPU. Then, install the following prerequisites:
!pip install mean_average_precision
!git clone https://github.com/facebookresearch/detectron2.git --branch v0.1.1 detectron2_v0.1.1
%cd detectron2_v0.1.1
!git checkout db1614e
!pip install -e .
Inside the detectron2_v0.1.1 directory, unzip the given HW4_q3.zip:
!unzip HW4_q3.zip
%cd HW4_q3
3.1.3 Data download
Download the ContactHands dataset from
http://vision.cs.stonybrook.edu/~supreeth/ContactHands_data_website/ and put it inside the HW4_q3/
directory, or run:
!wget https://public.vinai.io/ContactHands.zip
!unzip ContactHands
The file ContactHands/README.md provides useful information regarding the structure of this dataset. For
more information about the dataset, see:
‘Detecting Hands and Recognizing Physical Contact in the Wild.’ S. Narasimhaswamy, T. Nguyen, M. Hoai.
Advances in Neural Information Processing Systems (NeurIPS), 2020.
3.1.4 Data split
Under HW4_q3/sets/ you can find the data split that you will use for this question: train.txt corresponds
to the training set, validation.txt corresponds to the validation set, extra-train.txt corresponds
to more data for training (optional), and test.txt corresponds to the test set. Copy those files under
ContactHands/ImageSets/Main/.
3.1.5 Annotations
Under HW4_q3/Annotations/ you can find the annotations that you will use for this question. The folder
contains annotations for the training, validation and extra-train data. The annotations for the test data will
not be released, but they will be used for testing your final submission result. Copy the Annotations/ folder
so that it replaces ContactHands/Annotations/.
3.2 Helper functions
To help you, a number of utility functions and classes are provided in HW4_q3/. The most important functions are in hw4_utils.py:
1. Run python hw4_utils.py -va to visualize some annotated samples.
2. Use get_pos_and_random_neg() to get initial training/validation data (dataset = ‘train’ or dataset
= ‘validation’, respectively). This function returns the training/validation feature matrix D and the
corresponding training/validation labels lb. There are 2 classes: positive (1) and negative (-1).
Positive instances are deep features extracted at the locations of hands. Negative instances are deep
features at random locations of the images. Important: You first need to initialize feat_extractor
= prepare_second_stream() before calling this function.
3. Use detect() to run the sliding window detector. This returns a numpy array of bounding box
locations and corresponding SVM scores. This function can be used for detecting hands in an image.
It can also be used to find the hardest negative examples in an image.
4. Use generate_result_file() to generate a result file (dataset = ‘validation’ or dataset = ‘test’
for the validation or test set, respectively). Set the argument num_img to run the detection for a
subset of test images (e.g. num_img=100).
5. Use compute_mAP() to compute the Average Precision for the result file.
6. Use get_iou() to compute the overlap between two rectangular regions. The overlap is defined as
the area of the intersection over the area of the union. A returned detection region is considered correct
(true positive) if there is an annotated hand such that the overlap between the two boxes is more than
0.5.
7. Some useful OpenCV functions to work with images are: imread, imshow, resize.
In addition, detect.py includes the feature extraction using detectron2.
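To make the overlap criterion in item 6 concrete, here is a small stand-alone sketch of intersection over union. It is not the provided get_iou(): the function names iou_xyxy and is_true_positive are hypothetical, and the (x1, y1, x2, y2) corner format is an assumption that may differ from the box format used in hw4_utils.py.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(det_box, annotated_boxes, thresh=0.5):
    """A detection counts as correct if it overlaps some annotated hand by more than thresh."""
    return any(iou_xyxy(det_box, gt) > thresh for gt in annotated_boxes)


print(iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```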
3.3 What to implement
1. (15 points) Use the get_pos_and_random_neg() function to get the training data and train an
SVM classifier clf. You can use sklearn.svm.LinearSVC. Since you have a large amount of data,
you can limit the maximum number of iterations (e.g. max_iter=1000).
Use the trained classifier to generate a result file (use generate_result_file()) for the validation data. Then, run compute_mAP() to compute the AP and plot the precision-recall curve.
Submit your AP and precision-recall curve on the validation data.
Algorithm 1 Hard negative mining algorithm
  PosD ← all annotated hands
  NegD ← random image patches
  (w, b) ← trainSVM(PosD, NegD)
  for iter = 1, 2, ... do
    A ← all non-support vectors in NegD
    B ← hardest negative examples    ▷ Run UB detection and find negative patches that violate the SVM margin constraint the most
    NegD ← (NegD \ A) ∪ B
    (w, b) ← trainSVM(PosD, NegD)
  end for
2. Implement the hard negative mining algorithm given in Algorithm 1. Positive training data and random
negative training data can be generated using the get_pos_and_random_neg() function. At each
iteration, you should remove negative examples that do not correspond to support vectors from the
negative set. Use the function detect() on train images to identify the hardest negative examples and
include them in the negative training set.
Hints: (1) A negative example should not have significant overlap with any annotated hand. You
can experiment with different thresholds, but 0.3 is a good starting point. (2) You should compute the
objective value at each iteration; the objective values should not decrease. (3) To speed things up, you can write
a modified version of detect() that uses a different set of bounding box proposals for training.
3. (20 points) Run the negative mining for 10 iterations. Assume your computer is not very powerful,
so you cannot add more than 10000 new negative training examples at each iteration. Record
the objective values (on train data) and the APs (on validation data) through the iterations. Plot the
objective values. Plot the APs. On the validation data, you can also use get_pos_and_random_neg
to sample 10 negative patches per validation image. To calculate AP, use
sklearn.metrics.average_precision_score.
4. (15 points) For this question, you will need to generate a .npy result file for the test data using the
function generate_result_file(). You will need to submit this file by uploading it to
https://forms.gle/Y5qzA6Mi5Sz5SB2u9 to receive the AP on test data. Report the AP in your
answer file. Important Note: You MUST use your Stony Brook ID as the name of your submission
file, i.e., your SBU ID.npy (e.g., 012345679.npy). Your submission will not be evaluated if you
don’t use your SBU ID. For this question, you don’t need to have the highest AP to earn full marks.
5. (15 bonus points) Your submitted result file for test data will be automatically entered in a competition
for fame (https://bit.ly/31L9Cov). We will maintain a leaderboard, and the top three entries
at the end of the competition (due date) will receive 15, 10, and 5 bonus points. The ranking is based
on AP.
You can submit your result as frequently as you want. However, the evaluation server will only evaluate
submissions twice a day, at 09:00am and 09:00pm. The system only keeps the most recent submission
file, and your new submission will override the previous ones. Therefore, you have two chances a day
to evaluate your method.
You are allowed to use any feature types and classifiers for this part of the homework. In addition, you
are allowed to fine-tune any part of the given code. For example, you can try tuning the sliding window
detection, e.g., try different image scales, window sizes and strides. You can use more training data.
You can run the hard negative mining algorithm for as many iterations as you want, and the number of
negative examples added at each iteration is not limited to 10000. You can train with all available data,
including “train”, “validation”, “extra-train”. You can also use data from other datasets. For example,
see https://www3.cs.stonybrook.edu/~cvl/projects/hand_det_attention/.
Check the following papers for the state-of-the-art performance on hand detection. If your method
significantly outperforms these papers, we invite you to write a paper with us! Please email us directly
if you think you have an awesome technique that obtains good results.
‘Detecting Hands and Recognizing Physical Contact in the Wild.’ S. Narasimhaswamy, T. Nguyen,
M. Hoai. Advances in Neural Information Processing Systems (NeurIPS), 2020.
‘Contextual Attention for Hand Detection in the Wild.’ S. Narasimhaswamy, Z. Wei, Y. Wang, J.
Zhang, M. Hoai. Proceedings of International Conference on Computer Vision (ICCV), 2019.
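For the hard negative mining step of Section 3.3 (Algorithm 1), the three per-iteration computations can be sketched in NumPy as below. This is a sketch under assumptions, not the expected solution. All function names are hypothetical. Candidate features and their maximum overlap with any annotated hand (cand_max_iou) are assumed to have been computed beforehand, for example from the output of detect(). The objective shown is the standard soft-margin primal with hinge loss and an unpenalized bias. sklearn.svm.LinearSVC by default uses the squared hinge loss and also regularizes the intercept, so the formula would need adapting (or loss="hinge" passed) to match what that solver actually minimizes.

```python
import numpy as np


def svm_objective(w, b, X, y, C):
    """Soft-margin primal objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)
    return 0.5 * float(w @ w) + C * float(np.maximum(0.0, 1.0 - margins).sum())


def keep_support_negatives(neg_feats, w, b, tol=1e-6):
    """Drop negatives strictly outside the margin (score < -1); these are the set A."""
    scores = neg_feats @ w + b
    return neg_feats[scores >= -1.0 - tol]


def hardest_negatives(cand_feats, cand_max_iou, w, b, iou_thresh=0.3, max_new=10000):
    """Select up to max_new low-overlap candidates with the largest margin violation (set B)."""
    scores = cand_feats @ w + b
    # Valid negatives overlap no annotated hand by iou_thresh or more; violators score above -1.
    ok = (cand_max_iou < iou_thresh) & (scores > -1.0)
    idx = np.flatnonzero(ok)
    idx = idx[np.argsort(-scores[idx])][:max_new]  # highest score first = worst violation
    return cand_feats[idx]


# Toy check with a 2-D "feature" space and a fixed classifier.
w, b = np.array([1.0, 0.0]), 0.0
neg = np.array([[-2.0, 0.0], [-0.5, 0.0], [0.3, 0.0]])
cands = np.array([[0.5, 0.0], [2.0, 0.0], [-3.0, 0.0], [1.0, 0.0]])
cand_iou = np.array([0.0, 0.6, 0.0, 0.1])

kept = keep_support_negatives(neg, w, b)
new_neg = hardest_negatives(cands, cand_iou, w, b, max_new=1)
print(kept.shape, new_neg)
```

In a full loop, the kept and newly mined negatives would be stacked into the next NegD, the SVM retrained, and svm_objective recorded once per iteration for the plot.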
4 What to submit
You will need to submit both your code and your answers to the questions on Blackboard. Put the answer file
and your Python code in a folder named SUBID_FirstName_LastName (e.g., 10947XXXX_Barack_Obama).
Zip this folder and submit the zip file on Blackboard. Your submission must be a zip file, i.e.,
SUBID_FirstName_LastName.zip.
The answer file should be named hw4-answers.pdf. You can use LaTeX if you wish, but it is not compulsory.
The first page of hw4-answers.pdf should be the filled-in cover page at the end of this homework. The
remainder of the answer file should contain answers to Questions 1, 2, and 3.
Your Python code for Questions 2 and 3 can be in separate notebooks or python files. Make sure that the
name of each file is self-explanatory.
Make sure you follow the instructions carefully.
5 Cheating warnings
Don’t cheat. You must do the homework yourself; otherwise you won’t learn. You cannot ask students from
previous years about the homework or discuss it with them. You cannot look up the solution online.
Cover page for answers.pdf
CSE512 Spring 2021 – Machine Learning – Homework 4
Your Name:
Solar ID:
NetID email address:
Names of people whom you discussed the homework with: