1 Introduction

There are two main goals of this homework:

1. To introduce you to the MS-COCO dataset, which stands for Microsoft Common Objects in COntext. You will be tasked with creating your own classification dataset based on images and annotations taken from the COCO dataset.
2. To have you create your own Convolutional Neural Network (CNN) in PyTorch for image classification.

2 Background

2.1 About the COCO Dataset

Owing to its rich annotations, the COCO dataset has been used extensively by researchers for training and evaluating their frameworks for image classification, object detection, self-supervised learning, pose estimation, etc. To understand the motivations behind its creation and what makes it challenging, you should download the paper that introduced the COCO dataset:

https://arxiv.org/pdf/1405.0312.pdf

You should at least read the Introduction section of the paper.

In this homework, you will be asked to create your own image dataset for classification. For any aspiring machine learning practitioner, the ability to curate data and labels that fit their own needs is one of the most important skills to have. Therefore, we have designed this homework to provide an introductory exercise for this very purpose. More specifically, you will download a part of the full COCO dataset. Then, you will familiarize yourself with the COCO API, which provides a convenient interface to the otherwise complicated annotation files. Lastly, you will create your own image dataset for classification using the downloaded COCO files and the COCO API.

2.2 About the Image Classification Network You Will Write Code For

A good starting point would be to review the networks in the inner class ExperimentsWithCIFAR of the DLStudio module. The network you will be creating is likely to be very similar to the two examples, Net and Net2, shown in that section of DLStudio.
After installing DLStudio, play with the two networks by changing the parameters of the convolutional and the fully connected layers and see what that does to the classification accuracy. For that, you will need to execute the following script in the Examples directory of the DLStudio module (see footnote 1):

playing_with_cifar10.py

As with the DLStudio example code mentioned above, the classification network you will create will use a certain number of convolutional layers and, at its top, will contain one or more fully connected (FC) layers (also known as Linear layers). The number of output nodes at the final layer will be 5, which is equal to the number of image classes you will be working with.

In your experiments with the classification network, pay attention to the changing resolution in the image tensor as it is pushed up the resolution hierarchy of a CNN. This is particularly important when you are trying to estimate the number of nodes you need in the first fully connected layer at the top of the network. Depending on the sizes of the convolutional kernels you will use, you may also need to pay attention to the role played by padding in the convolutional layers.

Footnote 1: The CIFAR-10 dataset will be downloaded automatically by running this script. The CIFAR image dataset, made available by the University of Toronto, is considered to be the fruit fly of DL. The dataset consists of 32 × 32 images, 50,000 for training and 10,000 for testing, that can easily be processed on your laptop. Just Google “download CIFAR-10 dataset” for the website from which you can download the dataset.

3 Programming Tasks

3.1 Creating Your Own Image Classification Dataset

In this exercise, you will create your own dataset based on the following steps:

1. The first step is to install the COCO API in your conda environment. The Python version of the COCO API, pycocotools, provides the necessary functionality for loading the annotation JSON files and accessing images using class names.
   The pycocoDemo.ipynb demo available on the COCO API GitHub repository [1] is a useful resource to familiarize yourself with the COCO API. You can install the pycocotools package with the following command:

   conda install -c conda-forge pycocotools

2. Now, you need to download the image files and their annotations. The COCO dataset comes in 2014 and 2017 versions. For this homework, you will be using the 2014 Train images. You can download them directly from this page:

   https://cocodataset.org/#download

   On the same page, you will also need to download the accompanying annotation files: 2014 Train/Val annotations. Unzip the two archives you just downloaded.

3. Your main task is to use those files to create your own image classification dataset. Note that you can access the class labels of the images stored in the instances_train2014.json file using the COCO API. You have total freedom in how you organize your dataset as long as it meets the following requirements:

   • It should contain 1500 training and 500 validation images for each of the following five classes:

     ['airplane', 'bus', 'cat', 'dog', 'pizza']

     This will amount to 7.5k training images and 2.5k validation images in total, and there should be no duplicates. All images should be taken from the 2014 Train images set you just downloaded.

   • When saving your images to disk, resize them to 64 × 64. You can use the PIL module for that.

4. In your report, make a figure of a selection of images from your created dataset. You should plot at least 3 images from each of the five classes.

3.2 Image Classification using CNNs – Training and Validation

Once you have prepared the dataset, you need to implement and test the following CNN tasks:

CNN Task 1: In the following network, you will notice that we are constructing instances of torch.nn.Conv2d in the mode in which it only uses the valid pixels for the convolutions.
But, as you now know based on the Week 5 lecture (and slides), this is going to cause the image to shrink as it goes up the convolutional stack. Your first task is to run the network as shown. Let’s call this CNN Net1.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HW4Net(nn.Module):
        def __init__(self):
            super(HW4Net, self).__init__()
            # No padding: each 3 x 3 convolution uses only the valid pixels
            self.conv1 = nn.Conv2d(3, 16, 3)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(16, 32, 3)
            # XXXX and XX are placeholders that you must fill in
            self.fc1 = nn.Linear(XXXX, 64)
            self.fc2 = nn.Linear(64, XX)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            # Flatten everything except the batch dimension
            x = x.view(x.shape[0], -1)
            x = F.relu(self.fc1(x))
            x = self.fc2(x)
            return x

Note that the value for XXXX will vary for each CNN architecture, and finding this parameter for each CNN is your homework task. XX denotes the number of classes.

In order to experiment with a network like the one shown above, your training routine can be as simple as:

    # Assumes net, device and train_data_loader are already defined
    net = net.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        net.parameters(), lr=1e-3, betas=(0.9, 0.99))
    epochs = 7
    for epoch in range(epochs):
        running_loss = 0.0
        for i, data in enumerate(train_data_loader):
            inputs, labels = data
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            # Report the average loss over every 100 batches
            if (i + 1) % 100 == 0:
                print("[epoch: %d, batch: %5d] loss: %.3f"
                      % (epoch + 1, i + 1, running_loss / 100))
                running_loss = 0.0

where the variable net is an instance of HW4Net.

CNN Task 2: In the HW4Net class as shown, we used the torch.nn.Conv2d class without padding. In this task, construct instances of this class with padding.
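To see how the padding argument changes the spatial size of a feature map, the standard output-size arithmetic for convolution and pooling layers can be sketched in plain Python. This is only an illustrative sketch: the helper names conv2d_output_size and pool_output_size are hypothetical, dilation is assumed to be 1, and the 32 × 32 input is an assumption borrowed from CIFAR-10 rather than the 64 × 64 images used in this homework.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution (dilation 1), rounded down."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(size, kernel_size, stride):
    """Spatial size after max pooling with no padding, rounded down."""
    return (size - kernel_size) // stride + 1


# Assumed 32 x 32 input (CIFAR-10 sized), 3 x 3 kernel, stride 1.
no_pad = conv2d_output_size(32, kernel_size=3, padding=0)    # 30: shrinks by 2
with_pad = conv2d_output_size(32, kernel_size=3, padding=1)  # 32: size preserved

# A 2 x 2 max pool with stride 2 then halves each spatial dimension.
no_pad_pooled = pool_output_size(no_pad, kernel_size=2, stride=2)      # 15
with_pad_pooled = pool_output_size(with_pad, kernel_size=2, stride=2)  # 16

# Flattened feature count after one conv + pool stage with 16 output channels.
print(16 * no_pad_pooled * no_pad_pooled)      # 3600
print(16 * with_pad_pooled * with_pad_pooled)  # 4096
```

Applying the same arithmetic stage by stage gives the number of inputs that the first fully connected layer must accept for a given architecture and image size.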
Specifically, add a padding of one to all the convolutional layers. Now calculate the loss again and compare it with the loss for the case when no padding was used. This is the second CNN architecture, Net2, for this homework.

CNN Task 3: So far, both Net1 and Net2 can only be considered very shallow networks. In this task, we would like you to experiment with a deeper network. Modify the HW4Net class to chain at least 10 extra convolutional layers between the second conv layer and the first linear layer. Each new convolutional layer should have 32 in-channels, 32 out-channels, a kernel size of 3, and a padding of 1. In the forward() method, the output of each conv layer should be fed through an activation function before being passed into the next layer. Note that you will also need to update the value of XXXX accordingly. The resulting network will be the third CNN architecture, Net3.

Note that in order to train and evaluate your CNNs, you will need to implement your own torch.utils.data.Dataset and DataLoader classes for loading the images and labels. This is similar to what you have implemented in HW2.

Figure 1: Sample output, training loss and validation confusion matrix. (a) Training loss for the three CNNs. (b) Sample confusion matrix. The plotting options are flexible. Your results could vary based on your choice of hyperparameters. The confusion matrix shown is for a different dataset and is for illustration only.

For evaluating the performance of your CNN classifier, you need to write your own code for calculating the confusion matrix. For the dataset that you created, your confusion matrix will be a 5 × 5 array of numbers, with both the rows and the columns standing for the 5 classes in the dataset. The numbers in each row should show how the test samples corresponding to that class were correctly and incorrectly classified. You might find the scikit-learn and seaborn Python packages useful for this task. Fig.
1 shows a sample plot of the training loss and a sample confusion matrix. It’s important to note that your own plots could vary based on your choice of hyperparameters.

In your report, you should include a figure that plots the training losses of all three networks together. Additionally, include the confusion matrix for each of the three networks on the validation set. Finally, include your answers to the following questions:

1. Does adding padding to the convolutional layers make a difference in classification performance?
2. As you may know, naively chaining a large number of layers can result in difficulties in training. This phenomenon is often referred to as the vanishing gradient problem. Do you observe something like that in Net3?
3. Compare the classification results of all three networks. Which CNN do you think is the best performer?
4. By observing your confusion matrices, which class or classes do you think are more difficult to correctly differentiate, and why?
5. What is one thing that you propose to make the classification performance better?

4 Submission Instructions

Include a typed report explaining how you solved the given programming tasks.

1. Your pdf must include a description of
   • The figures and descriptions as mentioned in Sec. 3.
   • Your source code. Make sure that your source code files are adequately commented and cleaned up.
2. Turn in a zipped file. It should include (a) a typed self-contained pdf report with source code and results and (b) source code files (only .py files are accepted). Rename your .zip file as hw3.zip and follow the same file naming convention for your pdf report too.
3. Do NOT submit your network weights.
4. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.
5. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission.
   If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.
6. The sample solutions from previous years are for reference only. Your code and final report must be your own work.
7. To help us provide better feedback to you, make sure to number your figures.

References

[1] COCO API – http://cocodataset.org/. URL https://github.com/cocodataset/cocoapi.