Description
1 Predictions using Human-AI team
This assignment is based on the following research article, which appeared in volume 34 of Advances in
Neural Information Processing Systems (NeurIPS 2021).
Kerrigan, G., Smyth, P., Steyvers, M. (2021). Combining Human Predictions with
Model Probabilities via Confusion Matrices and Calibration. Advances in Neural
Information Processing Systems, 34.
This paper has been provided as part of this assignment. In it, the authors propose a
set of algorithms that combine the probabilistic output of a machine learning model with human
predictions on the same input. The paper shows how the accuracy of the predictions is driven by the
confidence of the machine learning model in its output. Empirical studies of the proposed approach were
performed on image classification tasks using CIFAR-10 and a subset of ImageNet.
The authors have released the code for this project here: https://github.com/gavinkerrigan/conf_matrix_and_calibration
You are allowed to use the available code of this paper to complete this assignment.
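To make the idea of combining a model's probabilities with a human label concrete, here is a minimal sketch. It assumes a Bayes-rule combination in which the human label and the model output are conditionally independent given the true class, so that the combined probability of class y is proportional to P(human label | y) times the model's probability of y. The function name `combine`, the toy 3-class confusion matrix, and the probabilities are illustrative assumptions, not taken from the authors' code; check the exact formulation against the paper.

```python
def combine(model_probs, human_label, confusion):
    """Combine model probabilities with one human label (illustrative sketch).

    model_probs[y]  : model's (ideally calibrated) probability of class y
    confusion[y][h] : estimated P(human says h | true class is y)
    """
    # Unnormalised posterior: P(h | y) * P(y | model output)
    unnorm = [confusion[y][human_label] * p for y, p in enumerate(model_probs)]
    total = sum(unnorm)
    if total == 0:
        # Degenerate case: fall back to the model's own probabilities
        return list(model_probs)
    return [u / total for u in unnorm]


# Toy 3-class example (all numbers are made up for illustration)
confusion = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
model_probs = [0.5, 0.3, 0.2]
combined = combine(model_probs, human_label=1, confusion=confusion)
```

In this toy case the model on its own prefers class 0, but a fairly reliable human voting for class 1 shifts the combined prediction to class 1.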
2 The Tasks
For running experiments, you need to consider only the CIFAR-10 dataset.
Q.1. Reproduce Results: [25 points]
Reproduce the results reported in the paper for the CIFAR-10 dataset only, using the authors’ code.
Of the four pre-trained CNN models considered in the paper for the CIFAR-10 experiments, it is
sufficient to consider any two dissimilar ones.
Are you able to get the same results as reported by the authors? Comment on the challenges
you faced. List all the hyper-parameter values that the authors did not mention and for which you had to
make your own choice. Any new insights from the paper will earn additional bonus marks.
Q.2. Model Multiple Humans: [35 points]
Consider an approach where more than one human decision maker (say 3) is available. Each
human provides one label for the given input image. The final human-predicted label is then
chosen by majority vote. If no two humans provide the same label, then one of the three
labels is chosen uniformly at random. To understand this, consider the following examples:
1. If human 1 suggests that the output is cat, human 2 suggests deer, and human
3 suggests bird, then one of cat, deer, and bird will be selected
uniformly at random.
2. If human 1 suggests cat, human 2 suggests bird, and human 3 suggests cat, then the human label
will be cat.
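The voting rule in the two examples above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `aggregate_labels` and the optional `rng` argument are assumptions made for this sketch and do not come from the authors' code.

```python
import random
from collections import Counter


def aggregate_labels(labels, rng=random):
    """Return the majority label; break ties uniformly at random.

    labels : list of labels, one per human (e.g. three strings)
    rng    : object with a .choice() method (the random module or a
             random.Random instance, useful for reproducible runs)
    """
    counts = Counter(labels)
    top = max(counts.values())
    # All labels that received the highest number of votes
    winners = [label for label, count in counts.items() if count == top]
    if len(winners) == 1:
        return winners[0]
    # With three humans this happens only when all three labels differ
    return rng.choice(winners)


majority = aggregate_labels(["cat", "bird", "cat"])
tie_break = aggregate_labels(["cat", "deer", "bird"], rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the tie-breaking reproducible across experiment runs.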
Describe your approach for modelling more than one human on the CIFAR-10 dataset. Then
regenerate all the results from the previous question (Q.1) using the multiple-human labels.
Comment on your observations. How do the resulting error rates vary with the error
rates of the human models? Also, mention use cases or applications where there
is more than one human decision maker in a human-AI team.
[Bonus Marks:] Provide a better mathematical model to incorporate multiple humans based on
the theory proposed in the paper. [Extra 20 points]
Q.3. Neural Network for Calibrated Probabilities: [35 points]
Now consider another case where the final “calibrated probabilities” are produced by a neural
network model. Let us call this neural network the Team model. The output layer of the Team
model has the same number of units as the output of the machine learning model, while the input to the
Team model is the output of the machine learning model together with the human-predicted label (consider the
base case of only one human).
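One possible way to set up the input and output of such a Team model is sketched below. It assumes the human label is one-hot encoded and concatenated with the model's probability vector, and it uses a single untrained linear layer followed by a softmax purely to show the shapes. The names `team_input` and `team_forward`, the encoding, and the architecture are all assumptions of this sketch; the actual design and training procedure are left to you.

```python
import math
import random

NUM_CLASSES = 10  # CIFAR-10 has 10 classes


def team_input(model_probs, human_label, num_classes=NUM_CLASSES):
    """Concatenate model probabilities with a one-hot human label."""
    one_hot = [0.0] * num_classes
    one_hot[human_label] = 1.0
    return list(model_probs) + one_hot


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def team_forward(x, weights, biases):
    """Forward pass of one linear layer followed by softmax."""
    logits = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Untrained, randomly initialised parameters (for shape illustration only)
rng = random.Random(0)
weights = [
    [rng.gauss(0, 0.1) for _ in range(2 * NUM_CLASSES)]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

x = team_input([0.1] * NUM_CLASSES, human_label=3)
probs = team_forward(x, weights, biases)
```

Here the input has 20 entries (10 model probabilities plus a 10-dimensional one-hot label) and the output has 10 entries that sum to 1, matching the output size of the machine learning model.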
Train the Team model and regenerate the results as in the base case (Q.1). Compare your results
with the base case and list your observations. Do you find the Team model to be a better approach for
combining human and machine model outputs?
****** All the Best********