EE 660 Homework 7 (Week 14)

1. In this problem you will implement transfer learning (TL) based on importance
weighting, and compare it with supervised learning (SL). You will work with data
from the TL_data folder. There are 5 files: source training data, source test data,
labeled target training data, unlabeled target training data, and target test data. The
target data has a covariate shift with respect to the source data. In all items below,
you should use the AdaBoostClassifier from sklearn with default parameters.
a) Let's start by estimating the classifier's performance on this data as a regular SL
problem. Train a classifier on the source training data. Report its accuracy on the
source test data.
For parts (b)-(d) below, you will use standard SL techniques, but applied to the TL
problem (3 different approaches); a code sketch covering parts (a)-(d) follows part (e).
b) Use the classifier trained in item (a) to predict labels of the target test data. Report
the accuracy.
c) Train a classifier only on the labeled target training data. Report its accuracy on
the target test data.
d) Train a classifier on the union of source training and labeled target training data.
Report its accuracy on the target test data.
e) Compare the results of (a)-(d). Explain any differences (and any lack of
differences) in accuracy.
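The following is a minimal sketch of parts (a)-(d), assuming each TL_data file loads
as a plain array with features in all but the last column and the label in the last
column; the file names, extension, and delimiter are hypothetical placeholders to
adapt to the actual folder contents.

```python
# Minimal sketch for parts (a)-(d); file names/delimiter are hypothetical.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def load_xy(path):
    data = np.loadtxt(path, delimiter=',')  # features in all but last column
    return data[:, :-1], data[:, -1]

Xs_tr, ys_tr = load_xy('TL_data/source_train.csv')          # hypothetical name
Xs_te, ys_te = load_xy('TL_data/source_test.csv')           # hypothetical name
Xt_tr, yt_tr = load_xy('TL_data/target_train_labeled.csv')  # hypothetical name
Xt_te, yt_te = load_xy('TL_data/target_test.csv')           # hypothetical name

# (a) train on source, test on source
clf = AdaBoostClassifier().fit(Xs_tr, ys_tr)
print('(a) source test accuracy:', clf.score(Xs_te, ys_te))

# (b) same source-trained classifier, applied to the target test set
print('(b) target test accuracy:', clf.score(Xt_te, yt_te))

# (c) train only on the labeled target training data
clf_t = AdaBoostClassifier().fit(Xt_tr, yt_tr)
print('(c) target test accuracy:', clf_t.score(Xt_te, yt_te))

# (d) train on the union of source and labeled target training data
X_un = np.vstack([Xs_tr, Xt_tr])
y_un = np.concatenate([ys_tr, yt_tr])
clf_u = AdaBoostClassifier().fit(X_un, y_un)
print('(d) target test accuracy:', clf_u.score(Xt_te, yt_te))
```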
For parts (f)-(g), you will use TL techniques on the TL problem.
f) Let’s assume source and target domain features follow multivariate normal
distributions with different parameters. Estimate the mean and covariance matrix
of each domain. Hint: you can use sklearn’s GaussianMixture class. You can
proceed in two ways.
i. Estimate the two means and two covariance matrices simultaneously. This
can be done by letting the Gaussian mixture model estimator know there
are 2 components in the mixture and providing the entire training data
(source + labeled target + unlabeled target) to it.
ii. Estimate each mean and covariance matrix individually. This can be done
by training two separate Gaussian mixture model estimators, each with one
component density. One estimator will receive only the source training
data and the other estimator will receive all the target training data.
Which method is likely to yield better results, i.e., means and covariance matrices
closer to the true values? Justify your answer.
Provide the values for the mean and covariance matrix of each domain (a code
sketch of both estimation routes follows).
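A minimal sketch of routes (i) and (ii), reusing the arrays from the previous sketch;
the unlabeled-target file name is again a hypothetical placeholder, and that file is
assumed to contain features only (no label column).

```python
# Sketch of the two estimation routes in part (f).
import numpy as np
from sklearn.mixture import GaussianMixture

Xt_un = np.loadtxt('TL_data/target_train_unlabeled.csv', delimiter=',')

# (i) one 2-component mixture fit to all training features pooled together
gmm_joint = GaussianMixture(n_components=2).fit(
    np.vstack([Xs_tr, Xt_tr, Xt_un]))
print('joint means:\n', gmm_joint.means_)
print('joint covariances:\n', gmm_joint.covariances_)

# (ii) one single-component fit per domain
gmm_src = GaussianMixture(n_components=1).fit(Xs_tr)
gmm_tgt = GaussianMixture(n_components=1).fit(np.vstack([Xt_tr, Xt_un]))
print('source mean/cov:\n', gmm_src.means_[0], '\n', gmm_src.covariances_[0])
print('target mean/cov:\n', gmm_tgt.means_[0], '\n', gmm_tgt.covariances_[0])
```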
g) Now that you have the parameters of each domain, you can compute the weight of
each data sample as $w_i = p_T(x_i)/p_S(x_i)$, where $p_S$ and $p_T$ denote the
source and target feature densities. Train a classifier on the union of source
training and labeled target data using these weights. Report the accuracy on the
target test data.
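A sketch of the weighted training step, reusing the hypothetical names from the
earlier sketches and the per-domain fits from route (ii):

```python
# Sketch for part (g): importance weights from the part (f)(ii) fits.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.ensemble import AdaBoostClassifier

p_S = multivariate_normal(mean=gmm_src.means_[0], cov=gmm_src.covariances_[0])
p_T = multivariate_normal(mean=gmm_tgt.means_[0], cov=gmm_tgt.covariances_[0])

X_un = np.vstack([Xs_tr, Xt_tr])
y_un = np.concatenate([ys_tr, yt_tr])
w = p_T.pdf(X_un) / p_S.pdf(X_un)   # w_i = p_T(x_i) / p_S(x_i)

clf_w = AdaBoostClassifier().fit(X_un, y_un, sample_weight=w)
print('(g) target test accuracy:', clf_w.score(Xt_te, yt_te))
```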
h) Compare results of (g) with results of (b)-(d). Explain any differences and any
lack of difference.
2. EM for semi-supervised learning. Consider a 2-class semi-supervised learning
problem in which there are $l$ labeled samples and $u = 1$ unlabeled sample. (For
example, think of being given $l$ labeled samples, and then acquiring unlabeled
samples one at a time.) There is 1 feature, and each class is modeled as a Gaussian:

$$p(x \mid y = c, \theta) = N\left(x \mid \mu_c, \sigma_c^2\right), \quad c = 1, 2$$

In the parts below, you will use EM to estimate the means $\mu_1$ and $\mu_2$. You may
assume the priors and variances are given constants. Generally the subscripts $h$ and $i$
will indicate unlabeled and labeled samples, respectively.
In this problem, parts (a)-(d) are to be done by hand. Part (e) can be done by hand or
computer; part (f) is best done by computer.
a) Consider the $t$-th iteration of EM. Derive the E step in terms of given quantities:
that is, starting from

$$p\left(H \mid D, \theta^{(t)}\right) = p\left(y_h = c_h \mid x_h, \theta^{(t)}\right) = \gamma_{h c_h}^{(t)}, \quad c_h = 1, 2$$

show that:

$$\gamma_{h c_h}^{(t)} = \frac{\pi_{c_h}}{\alpha_h^{(t)} \sqrt{2\pi\sigma_{c_h}^2}} \exp\left(-\frac{\left(x_h - \mu_{c_h}^{(t)}\right)^2}{2\sigma_{c_h}^2}\right)$$

in which $\pi_{c_h} \triangleq p\left(y_h = c_h \mid \theta^{(t)}\right) = p(y_h = c_h)$. Also, find $\alpha_h^{(t)}$.
In parts (b)-(d), you will derive the M step formulas, also for the $t$-th iteration of EM.
b) First, show that

$$p(D, H \mid \theta) = p\left(x_h \mid y_h = c_h, \theta\right) \pi_{c_h} \prod_{i=1}^{l} p\left(x_i \mid y_i = c_i, \theta\right) \pi_{c_i}$$

in which $\pi_{c_i} \triangleq p\left(y_i = c_i \mid \theta\right) = p(y_i = c_i)$, and similarly for $\pi_{c_h}$.
c) Take $\ln p(D, H \mid \theta)$ from your result of (b), plug in for the normal densities, and
drop any additive terms that are constant with respect to $\theta$. Then plug in to the M
equation:

$$\theta^{(t+1)} = \arg\max_{\theta} E_{H \mid D, \theta^{(t)}}\left\{\ln p(D, H \mid \theta)\right\} = \arg\max_{\theta} \sum_{c_h=1}^{2} \gamma_{h c_h}^{(t)} \ln p(D, H \mid \theta)$$

and simplify to get:

$$\theta^{(t+1)} = \arg\max_{\theta} \left\{\sum_{c_h=1}^{2} \gamma_{h c_h}^{(t)} \left[-\frac{\left(x_h - \mu_{c_h}\right)^2}{\sigma_{c_h}^2}\right] + \sum_{i=1}^{l} \left[-\frac{\left(x_i - \mu_{c_i}\right)^2}{\sigma_{c_i}^2}\right]\right\}$$

in which a constant multiplicative factor of $\frac{1}{2}$ has been dropped. (Hint: you
may find it useful to use $\gamma_{h1}^{(t)} + \gamma_{h2}^{(t)} = 1$.)
d) Re-write your result of part (c) to express it in terms of $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$. (Hint:
you might find it useful to use the indicator function.) Then solve for
$\theta^{(t+1)} = \left[\mu_1^{(t+1)}, \mu_2^{(t+1)}\right]^T$. (Hint: find the argmax by taking $\frac{\partial}{\partial \mu_1}$ and setting it
equal to 0; similarly for $\mu_2$.) Let $l_1 \triangleq$ the number of labeled samples with label
$c_i = 1$, and $l_2 \triangleq$ the number of labeled samples with label $c_i = 2$. (Note that
$\gamma_{h c_h}^{(t)}$ is constant with respect to $\mu_1$ and $\mu_2$ because it used the
(constant) estimates $\mu_1^{(t)}$ and $\mu_2^{(t)}$ from the E step.)
e) Given: $\pi_1 = \pi_2 = 0.5$, $\sigma_1^2 = \sigma_2^2 = 1$; data as follows:
labeled data $\left\{\left(x_i, y_i\right)\right\}_{i=1}^{l} = \{(1,1), (2,1), (4,2)\}$; unlabeled sample $x_h = 3$.
Suppose the values for $\theta$ at the beginning of the $t$-th iteration of EM are:
$\mu_1^{(t)} = 1.5$, $\mu_2^{(t)} = 4.0$.
(i) Calculate the responsibilities $\gamma_{h1}^{(t)}$ and $\gamma_{h2}^{(t)}$ from the E step (using part (a));
(ii) Calculate the new mean estimates $\mu_1^{(t+1)}$ and $\mu_2^{(t+1)}$ from the M step (using the
part (d) result).
Tip: While not required for part (e), you may find it useful to do the calculations
by computer, so that your code can be used for part (f) also.
f) Run more iterations (by computer), until $\mu_1^{(t+1)}$ and $\mu_2^{(t+1)}$ converge (until they
change only a small amount from one iteration to the next – choose a suitable
threshold). Plot $\mu_1^{(t+1)}$ and $\mu_2^{(t+1)}$ vs. $t$, as well as $\gamma_{h1}^{(t)}$ and $\gamma_{h2}^{(t)}$ vs. $t$. (You are
not required to compute $p\left(D \mid \theta^{(t)}\right)$ in this problem.) Give your final values for
$\mu_1^{(t+1)}$, $\mu_2^{(t+1)}$, $\gamma_{h1}^{(t)}$, and $\gamma_{h2}^{(t)}$.
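Since parts (e)-(f) are numeric, here is a minimal EM sketch built only from the
quantities given in the problem statement. The M-step line assumes the
responsibility-weighted-mean form that the part (d) derivation should produce, so
verify it against your own result before using it.

```python
# EM sketch for problem 2(e)-(f): one feature, two Gaussian classes,
# pi_1 = pi_2 = 0.5, sigma_1^2 = sigma_2^2 = 1, labeled data
# {(1,1), (2,1), (4,2)}, unlabeled sample x_h = 3, initial mu = (1.5, 4.0).
import numpy as np

x_lab = np.array([1.0, 2.0, 4.0])
y_lab = np.array([1, 1, 2])
x_h = 3.0
pi = np.array([0.5, 0.5])
var = 1.0
mu = np.array([1.5, 4.0])            # [mu_1^(t), mu_2^(t)]

history = [mu.copy()]
for t in range(200):
    # E step (part (a)): responsibilities of the single unlabeled sample
    lik = pi * np.exp(-(x_h - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    gamma = lik / lik.sum()          # [gamma_h1^(t), gamma_h2^(t)]

    # M step (assumed part (d) form): responsibility-weighted means,
    # mu_c = (sum of labeled x with label c + gamma_hc * x_h) / (l_c + gamma_hc)
    mu_new = np.array([
        (x_lab[y_lab == c].sum() + gamma[c - 1] * x_h)
        / ((y_lab == c).sum() + gamma[c - 1])
        for c in (1, 2)
    ])
    history.append(mu_new.copy())
    if np.max(np.abs(mu_new - mu)) < 1e-6:   # convergence threshold
        mu = mu_new
        break
    mu = mu_new

print('final gamma_h:', gamma, 'final mu:', mu)
```

The `history` list can be plotted against the iteration index to produce the
$\mu$ vs. $t$ curves requested in part (f).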
3. In this problem you will explore semi-supervised learning using S3VM, and compare
it to supervised learning. Throughout this problem, use the qns3vm code available on
the course's page under Week 12, with parameters kernel_type='Linear' and
lam=1.0 (cf. Discussion 12 for more information).
Note: if you get a "PendingDeprecationWarning" that halts the execution of the
code, add the line
warnings.filterwarnings('ignore', category=PendingDeprecationWarning)
at the start of your code (after importing warnings). The SVM parameters should
also be set to kernel='linear' and C=1.0.
Use the data inside the SSL_data folder. Load the data files named ssl_train_data and
test_data. In each of them, the first 10 columns are the features, i.e., $X_{train}$ and
$X_{test}$, and the last column represents the true label, i.e., $y_{train}$ and $y_{test}$. There is a
total of 200 training samples, and the classes are $y_i \in \{0, 1\}$. Note that the qns3vm
code expects classes $\{-1, +1\}$, so adjust accordingly.
(a) To get an estimate of the best-case scenario, let’s start with a dataset that is
entirely labeled. Train an SVM classifier on the entire train data and compute its
accuracy on the test data.
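A minimal sketch for part (a), assuming the two files load as 11-column arrays (10
features plus label); the file extensions are guesses to adapt to the actual folder.

```python
# Sketch for part 3(a); the exact file names/format in SSL_data may differ.
import numpy as np
from sklearn.svm import SVC

train = np.loadtxt('SSL_data/ssl_train_data.txt')   # hypothetical extension
test = np.loadtxt('SSL_data/test_data.txt')         # hypothetical extension
X_train, y_train = train[:, :10], train[:, 10]
X_test, y_test = test[:, :10], test[:, 10]

svm = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
print('(a) test accuracy:', svm.score(X_test, y_test))
```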
(b) Now let's assume a scenario where only a few samples of the training data are
labeled. Select only the first $2l$ samples of the training set (you can note that the
dataset was built in a way that the first $2l$ samples always contain $l$ samples of
each class), train an SVM classifier only on those first $2l$ samples, and report the
accuracy on the test data for $l = 1, \dots, 10$. Note that the test set does not change
size. (A code sketch covering parts (b) and (c) follows part (c).)
(c) Next, let's repeat the scenario from (b), but make use of the unlabeled data.
Train an S3VM model on the entire training data ($2l$ labeled samples and $200 - 2l$
unlabeled samples), and report the accuracy on the test data, for $l = 1, \dots, 10$.
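A sketch covering parts (b)-(c), reusing the arrays from the part (a) sketch. The
QN_S3VM constructor and method names below follow the commonly distributed
qns3vm.py (labeled features, $\{-1,+1\}$ labels, unlabeled features, a random
generator, then keyword parameters); treat this signature as an assumption and check
it against the Week 12 copy.

```python
# Sketch for parts 3(b)-(c); the QN_S3VM call signature is an assumption
# based on the common qns3vm.py -- verify against the course's copy.
import random
import numpy as np
from sklearn.svm import SVC
from qns3vm import QN_S3VM

y_pm_train = (2 * y_train - 1).astype(int)   # map classes {0,1} -> {-1,+1}
y_pm_test = (2 * y_test - 1).astype(int)

acc_svm, acc_s3vm = [], []
for l in range(1, 11):
    n = 2 * l                                # first 2l samples are labeled

    # (b) supervised SVM on the 2l labeled samples only
    svm_l = SVC(kernel='linear', C=1.0).fit(X_train[:n], y_train[:n])
    acc_svm.append(svm_l.score(X_test, y_test))

    # (c) S3VM on 2l labeled + (200 - 2l) unlabeled samples
    model = QN_S3VM(X_train[:n].tolist(), y_pm_train[:n].tolist(),
                    X_train[n:].tolist(), random.Random(),
                    lam=1.0, kernel_type='Linear')
    model.train()
    preds = np.array(model.getPredictions(X_test.tolist()))
    acc_s3vm.append(np.mean(preds == y_pm_test))

print('SVM accuracies: ', acc_svm)
print('S3VM accuracies:', acc_s3vm)
```

The two accuracy lists are exactly what part (d) asks you to plot against the number
of labeled samples.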
(d) Plot your results of (b) and (c) on a single plot, showing accuracy (percent correct
classification on the test set) vs. the number of labeled samples $n_l = 2l$.
(e) Interpret your result of (d).
a. In what ways, if any, are they what you expected? Explain why you
expected them to be so.
b. In what ways, if any, are they different from what you expected? Explain
what you expected that is different, and hypothesize why the difference arose.