Description
(50pts) Semi-Supervised Learning: Self Training
This homework requires that you implement the self-training algorithm discussed in the semisupervised learning lectures. You should refer back to your notes for the self-training pseudo-code.
(25pts) An Experiment on Synthetic Data
Generate 2D Gaussian data that is similar to the data set shown in Figure 1. Using data sets that
have 1000 samples (500 from each class) implement the self-training algorithm, and test on a data
set of 1000 samples. The requirements for this problem are as follows:
• Your implementation of self-training must use a classifier that can give probabilities to select the data points that will have pseudo labels assigned to them. I will not make a restriction on the classifier other than the probabilities requirement. You will need to choose a
suitable threshold to determine the data samples that will be labeled for the next round of
self-training.
• Report the error of the self-training algorithm on the testing data at: (1) the first time a classifier is trained using only the labeled; (2) at least one time point during the self training
process (i.e., when pseudo labels are used); and (3) after self-training is completed. Comment on the results.
• Perform an experiment reporting the above requirements with 10% and when 25% of the
training data are labeled.
arizona.edu 2 March 29, 2017
Spring 2017 Dept. of ECE University of Arizona
(25pts) An Experiment on Real World Data
Implement the self-training algorithm using 10 data sets available on the course Github repo. The
requirements for this problem are as follows:
• You must report your results using 5-fold cross validation. In each cross validation step you
can only use 15% of data as labeled. For example, if I have 500 data samples for training
I must randomly select 15 data samples that I can use with my supervised classifier and the
the 85 samples can be used in self-training.
• Write a brief discussion on wether semi-supervised helped on real-world data sets.
arizona.edu 3 March 29, 2017