CS4342 Assignment #6


1. This problem involves hyperplanes in two dimensions.
(a) Sketch the hyperplane 1 + 3 𝑋1 − 𝑋2 = 0. Indicate the set of points for which 1 + 3 𝑋1 − 𝑋2 > 0, as
well as the set of points for which 1 + 3 𝑋1 − 𝑋2 < 0.
(b) On the same plot, sketch the hyperplane −2 + 𝑋1 + 2 𝑋2 = 0. Indicate the set of points for which
−2 + 𝑋1 + 2 𝑋2 > 0, as well as the set of points for which −2 + 𝑋1 + 2 𝑋2 < 0.
Points: 5
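One way to check a sketch for problem 1 is to evaluate the left-hand side of each hyperplane equation at a few test points: the sign tells which side of the line a point falls on, and a value of zero means the point lies on the line itself. The following is a minimal Python sketch; the helper name `side` and the test points are illustrative assumptions, not part of the assignment.

```python
def side(beta0, beta1, beta2, x1, x2):
    """Return +1, -1, or 0 according to the sign of beta0 + beta1*x1 + beta2*x2."""
    value = beta0 + beta1 * x1 + beta2 * x2
    return (value > 0) - (value < 0)


# Coefficients (beta0, beta1, beta2) read off the two equations in problem 1.
hyperplanes = {
    "a": (1, 3, -1),   # 1 + 3*X1 - X2 = 0
    "b": (-2, 1, 2),   # -2 + X1 + 2*X2 = 0
}

# Arbitrary test points, chosen only for illustration.
points = [(0, 0), (0, 1), (2, 0), (-1, 3)]

for name, (b0, b1, b2) in hyperplanes.items():
    for x1, x2 in points:
        print(name, (x1, x2), side(b0, b1, b2, x1, x2))
```

Points that return +1 belong to the "> 0" region of the corresponding hyperplane, and points that return -1 belong to the "< 0" region.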
2. We have seen that in p = 2 dimensions, a linear decision boundary takes the form 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 = 0.
We now investigate a non-linear decision boundary.
(a) Sketch the curve
(1 + 𝑋1)² + (2 − 𝑋2)² = 4.
(b) On your sketch, indicate the set of points for which (1 + 𝑋1)² + (2 − 𝑋2)² > 4,
as well as the set of points for which (1 + 𝑋1)² + (2 − 𝑋2)² ≤ 4.
(c) Suppose that a classifier assigns an observation to the blue class if
(1 + 𝑋1)² + (2 − 𝑋2)² > 4,
and to the red class otherwise. To what class is the observation (0, 0) classified? (−1, 1)? (2, 2)? (3, 8)?
(d) Argue that while the decision boundary in (c) is not linear in terms of 𝑋1 and 𝑋2, it is linear in terms of
𝑋1, 𝑋2, 𝑋1², and 𝑋2².
Points: 5
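The rule in problem 2(c) and the claim in 2(d) can be checked numerically. Expanding (1 + X1)^2 + (2 - X2)^2 - 4 gives 1 + 2*X1 - 4*X2 + X1^2 + X2^2, which is linear in the four features X1, X2, X1^2, and X2^2. The sketch below, with hypothetical function names, confirms that the two forms agree at the points listed in (c) and applies the blue/red rule.

```python
def quadratic_form(x1, x2):
    """Left-hand side minus right-hand side of the circle equation."""
    return (1 + x1) ** 2 + (2 - x2) ** 2 - 4


def expanded_form(x1, x2):
    """The same quantity written as a linear function of x1, x2, x1^2, x2^2."""
    features = (x1, x2, x1 ** 2, x2 ** 2)
    coefficients = (2, -4, 1, 1)
    intercept = 1
    return intercept + sum(c * f for c, f in zip(coefficients, features))


def classify(x1, x2):
    """Blue if the point lies strictly outside the circle, red otherwise."""
    return "blue" if quadratic_form(x1, x2) > 0 else "red"


for point in [(0, 0), (-1, 1), (2, 2), (3, 8)]:
    # Both forms must give the same value at every point.
    assert quadratic_form(*point) == expanded_form(*point)
    print(point, classify(*point))
```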
3. Here we explore the maximal margin classifier on a toy data set.
(a) We are given n = 7 observations in p = 2 dimensions. For each observation, there is an associated class
label.
(b) Sketch the optimal separating hyperplane, and provide the equation for this hyperplane (of the form
(9.1)).
(c) Describe the classification rule for the maximal margin classifier.
It should be something along the lines of “Classify to Red if 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 > 0, and classify to Blue
otherwise.” Provide the values for 𝛽0, 𝛽1, and 𝛽2.
(d) On your sketch, indicate the margin for the maximal margin hyperplane.
(e) Indicate the support vectors for the maximal margin classifier.
(f) Argue that a slight movement of the seventh observation would not affect the maximal margin
hyperplane.
(g) Sketch a hyperplane that is not the optimal separating hyperplane, and provide the equation for this
hyperplane.
(h) Draw an additional observation on the plot so that the two classes are no longer separable by a
hyperplane.
Points: 5
4. Suppose that we have four observations, for which we compute a dissimilarity matrix, given by
For instance, the dissimilarity between the first and second observations is 0.3, and the dissimilarity
between the second and fourth observations is 0.8.
(a) On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering
these four observations using complete linkage. Be sure to indicate on the plot the height at which each
fusion occurs, as well as the observations corresponding to each leaf in the dendrogram.
(b) Repeat (a), this time using single linkage clustering.
(c) Suppose that we cut the dendrogram obtained in (a) such that two clusters result. Which observations
are in each cluster?
(d) Suppose that we cut the dendrogram obtained in (b) such that two clusters result. Which observations
are in each cluster?
(e) It is mentioned in the chapter that at each fusion in the dendrogram, the position of the two clusters
being fused can be swapped without changing the meaning of the dendrogram. Draw a dendrogram that is
equivalent to the dendrogram in (a), for which two or more of the leaves are repositioned, but for which the
meaning of the dendrogram is the same.
Points: 5
5. In this problem, you will perform K-means clustering manually, with K = 2, on a small example with n = 6
observations and p = 2 features. The observations are as follows.
(a) Plot the observations.
(b) Randomly assign a cluster label to each observation. Report the cluster labels for each observation.
(c) Compute the centroid for each cluster.
(d) Assign each observation to the centroid to which it is closest, in terms of Euclidean distance. Report the
cluster labels for each observation.
(e) Repeat (c) and (d) until the answers obtained stop changing.
(f) In your plot from (a), color the observations according to the cluster labels obtained.
Points: 5
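Steps (c) through (e) of problem 5 can be sketched in plain Python as follows. The six points and the starting labels below are hypothetical stand-ins, since the assignment's table of observations is not reproduced here; the starting labels are fixed rather than random so that the run is repeatable. The sketch also assumes that neither cluster ever becomes empty.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans_two_clusters(points, labels, max_iter=100):
    """Alternate steps (c) and (d) until the labels stop changing."""
    for _ in range(max_iter):
        # Step (c): one centroid per cluster label.
        centroids = [
            centroid([p for p, lab in zip(points, labels) if lab == k])
            for k in (0, 1)
        ]
        # Step (d): reassign each point to its nearest centroid.
        new_labels = [
            min((0, 1), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical observations and initial labels, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
initial_labels = [0, 1, 0, 1, 0, 1]

final_labels, final_centroids = kmeans_two_clusters(points, initial_labels)
print(final_labels)
print(final_centroids)
```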
6. Suppose that for a particular data set, we perform hierarchical clustering using single linkage and using
complete linkage. We obtain two dendrograms.
(a) At a certain point on the single linkage dendrogram, the clusters {1, 2, 3} and {4, 5} fuse. On the complete
linkage dendrogram, the clusters {1, 2, 3} and {4, 5} also fuse at a certain point. Which fusion will occur
higher on the tree, or will they fuse at the same height, or is there not enough information to tell?
(b) At a certain point on the single linkage dendrogram, the clusters {5} and {6} fuse. On the complete
linkage dendrogram, the clusters {5} and {6} also fuse at a certain point. Which fusion will occur higher on
the tree, or will they fuse at the same height, or is there not enough information to tell?
Points: 5
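For problem 6 it helps to recall how the two linkages measure the dissimilarity between clusters: single linkage takes the smallest pairwise dissimilarity and complete linkage takes the largest, so the single-linkage value can never exceed the complete-linkage value. The sketch below illustrates this with hypothetical coordinates and Euclidean distance (both are assumptions; the problem itself gives no data).

```python
import math
from itertools import product


def linkage_distance(cluster_a, cluster_b, how):
    """Inter-cluster dissimilarity: min (single) or max (complete) pairwise distance."""
    pairwise = [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if how == "single":
        return min(pairwise)
    if how == "complete":
        return max(pairwise)
    raise ValueError("how must be 'single' or 'complete'")


# Hypothetical coordinates for observations 1-3 and 4-5.
cluster_123 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cluster_45 = [(4.0, 0.0), (4.0, 3.0)]

single_height = linkage_distance(cluster_123, cluster_45, "single")
complete_height = linkage_distance(cluster_123, cluster_45, "complete")
print(single_height, complete_height)

# With two singleton clusters there is only one pairwise distance,
# so both linkages give the same value.
obs_5, obs_6 = [(4.0, 0.0)], [(7.0, 4.0)]
print(linkage_distance(obs_5, obs_6, "single"),
      linkage_distance(obs_5, obs_6, "complete"))
```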
7. In words, describe the results that you would expect if you performed K-means clustering of the eight
shoppers in Figure 10.14 – shown below, on the basis of their sock and computer purchases, with K = 2.
Give three answers, one for each of the variable scalings displayed. Explain.
Points: 5
Applied Questions
1. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a
non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by
performing logistic regression using non-linear transformations of the features.
(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a
quadratic decision boundary between them. For instance, you can do this as follows:
> import numpy as np
> X1 = np.random.uniform(size=500) - 0.5
> X2 = np.random.uniform(size=500) - 0.5
> y = 1 * (X1**2 - X2**2 > 0)
(see https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html)
(b) Plot the observations, colored according to their class labels. Your plot should display 𝑋1 on the x-axis,
and 𝑋2 on the y-axis.
(c) Fit a logistic regression model to the data, using 𝑋1 and 𝑋2 as predictors.
(d) Apply this model to the training data in order to obtain a predicted class label for each training
observation. Plot the observations, colored according to the predicted class labels. The decision boundary
should be linear.
(e) Now fit a logistic regression model to the data using non-linear functions of 𝑋1 and 𝑋2 as predictors (e.g.
𝑋1², 𝑋1 × 𝑋2, log(𝑋2), and so forth).
(f) Apply this model to the training data in order to obtain a predicted class label for each training
observation. Plot the observations, colored according to the predicted class labels. The decision boundary
should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which
the predicted class labels are obviously non-linear.
(g) Fit a support vector classifier to the data with 𝑋1 and 𝑋2 as predictors. Obtain a class prediction for each
training observation. Plot the observations, colored according to the predicted class labels.
(h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation.
Plot the observations, colored according to the predicted class labels.
(i) Comment on your results.
Hint:
– Check https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
Points: 15
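The workflow in applied question 1 might be sketched as follows with scikit-learn. The fixed seed, the particular quadratic features, the regularization settings, and the choice of an RBF kernel are all assumptions made for illustration; plotting is left out so that the snippet runs without a display.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
X1 = rng.uniform(size=500) - 0.5
X2 = rng.uniform(size=500) - 0.5
y = (X1**2 - X2**2 > 0).astype(int)  # quadratic decision boundary

# Raw predictors for parts (c)-(d), (g), and (h).
X_linear = np.column_stack([X1, X2])
# One possible set of non-linear transformations for parts (e)-(f).
X_quadratic = np.column_stack([X1, X2, X1**2, X2**2, X1 * X2])

linear_logit = LogisticRegression(max_iter=1000).fit(X_linear, y)
# Weak regularization (large C) is an assumed setting, not a requirement.
quadratic_logit = LogisticRegression(C=1e4, max_iter=5000).fit(X_quadratic, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_linear, y)

# Training accuracy of each model.
acc_linear = linear_logit.score(X_linear, y)
acc_quadratic = quadratic_logit.score(X_quadratic, y)
acc_rbf = rbf_svm.score(X_linear, y)
print(acc_linear, acc_quadratic, acc_rbf)
```

The predicted labels from each model's `predict` method can then be passed as the color argument of a scatter plot of X1 against X2.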
2. In this problem, you will use support vector approaches in order to predict whether a given car gets high
or low gas mileage based on the Auto data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars
with gas mileage below the median.
(b) Fit a support vector classifier to the data with the linear kernel, in order to predict whether a car gets
high or low gas mileage. Report the cross-validation error. Comment on your results.
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of
gamma and degree. Comment on your results.
(d) Make some plots to back up your assertions in (b) and (c).
Hints:
– Check https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
– Check https://scikit-learn.org/0.18/auto_examples/svm/plot_iris.html
Points: 15
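A possible shape for applied question 2 is sketched below. Because the Auto data set is not included here, the snippet uses clearly synthetic stand-in columns (`weight`, `horsepower`, `mpg`) generated from an assumed linear relationship; with the real data, those arrays would be replaced by the corresponding columns. The kernel settings are illustrative choices, not recommended values.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in for the Auto data (NOT the real values).
n = 200
weight = rng.normal(3000, 800, size=n)
horsepower = rng.normal(100, 35, size=n)
mpg = 45 - 0.006 * weight - 0.05 * horsepower + rng.normal(0, 2, size=n)

# Part (a): 1 for cars above the median mileage, 0 otherwise.
high_mpg = (mpg > np.median(mpg)).astype(int)
X = np.column_stack([weight, horsepower])

# Parts (b)-(c): a few kernels with assumed hyperparameters.
candidates = {
    "linear": SVC(kernel="linear", C=1.0),
    "rbf": SVC(kernel="rbf", C=1.0, gamma=0.1),
    "poly": SVC(kernel="poly", C=1.0, degree=2),
}

cv_error = {}
for name, model in candidates.items():
    # Standardize inside the pipeline so scaling is refit on each training fold.
    pipeline = make_pipeline(StandardScaler(), model)
    accuracy = cross_val_score(pipeline, X, high_mpg, cv=5).mean()
    cv_error[name] = 1.0 - accuracy
    print(name, round(cv_error[name], 3))
```

Looping over several values of `C`, `gamma`, and `degree` in the same way gives the comparison that parts (b) and (c) ask for.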
3. Consider the USArrests data. We will now perform hierarchical clustering on the states.
(a) Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.
(b) Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which
clusters?
(c) Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the
variables to have standard deviation one.
(d) What effect does scaling the variables have on the hierarchical clustering obtained? In your opinion,
should the variables be scaled before the inter-observation dissimilarities are computed? Provide a
justification for your answer.
Hints:
– Check https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html
Points: 15
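The clustering steps in applied question 3 could look roughly like this with SciPy. The small array below is a hypothetical stand-in with two variables on very different scales, not the USArrests values; the row names are placeholders as well.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

# Hypothetical stand-in rows (NOT the USArrests data).
names = ["A", "B", "C", "D", "E", "F"]
X = np.array([
    [10.0, 200.0],
    [11.0, 210.0],
    [2.0, 50.0],
    [3.0, 65.0],
    [6.0, 120.0],
    [7.0, 400.0],
])

# Part (a): complete linkage with Euclidean distance on the raw variables.
Z_raw = linkage(X, method="complete", metric="euclidean")
# Part (b): cut the tree so that at most three clusters result.
labels_raw = fcluster(Z_raw, t=3, criterion="maxclust")

# Part (c): scale each variable to standard deviation one, then recluster.
X_scaled = zscore(X, axis=0)
Z_scaled = linkage(X_scaled, method="complete", metric="euclidean")
labels_scaled = fcluster(Z_scaled, t=3, criterion="maxclust")

for name, raw, scaled in zip(names, labels_raw, labels_scaled):
    print(name, raw, scaled)
```

Passing `Z_raw` or `Z_scaled` to `scipy.cluster.hierarchy.dendrogram` draws the corresponding tree.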
4. In this problem, you will generate simulated data, and then perform PCA and K-means clustering on the
data.
(a) Generate a simulated data set with 20 observations in each of three classes (i.e. 60 observations total),
and 50 variables. Use uniformly or normally distributed samples.
(b) Perform PCA on the 60 observations and plot the first two principal component score vectors. Use a
different color to indicate the observations in each of the three classes. If the three classes appear
separated in this plot, then continue on to part (c). If not, then return to part (a) and modify the simulation
so that there is greater separation between the three classes. Do not continue to part (c) until the three
classes show at least some separation in the first two principal component score vectors. Hint: you can
assign different means to different classes to create separate clusters.
(c) Perform K-means clustering of the observations with K = 3. How well do the clusters that you obtained
in K-means clustering compare to the true class labels?
(d) Perform K-means clustering with K = 2. Describe your results.
(e) Now perform K-means clustering with K = 4, and describe your results.
(f) Now perform K-means clustering with K = 3 on the first two principal component score vectors, rather
than on the raw data. That is, perform K-means clustering on the 60 x 2 matrix of which the first column is
the first principal component score vector, and the second column is the second principal component score
vector. Comment on the results.
(g) Using the z-score function to scale your variables, perform K-means clustering with K = 3 on the data
after scaling each variable to have standard deviation one. How do these results compare to those obtained
in (b)? Explain.
Hints:
– Check https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
– Check https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
– Check https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html
Points: 25
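Parts (a), (b), (c), and (f) of applied question 4 might be sketched as follows. The simulation design (unit-variance normal noise plus a class-specific mean shift of 3 on half of the variables), the seeds, and the use of the adjusted Rand index to compare clusters with the true labels are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)

# Part (a): 20 observations per class, 50 variables.
true_labels = np.repeat([0, 1, 2], 20)
class_means = np.zeros((3, 50))
class_means[1, :25] = 3.0   # class 1 is shifted on the first 25 variables
class_means[2, 25:] = 3.0   # class 2 is shifted on the last 25 variables
X = rng.normal(size=(60, 50)) + class_means[true_labels]

# Part (b): first two principal component score vectors (60 x 2).
scores = PCA(n_components=2).fit_transform(X)

# Part (c): K-means with K = 3 on the raw data.
km_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Part (f): K-means with K = 3 on the two score vectors only.
km_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)

# Cluster numbering is arbitrary, so compare with a label-invariant measure.
ari_raw = adjusted_rand_score(true_labels, km_raw.labels_)
ari_pca = adjusted_rand_score(true_labels, km_pca.labels_)
print(ari_raw, ari_pca)
```

Changing `n_clusters` to 2 or 4 covers parts (d) and (e), and applying `scipy.stats.zscore` to `X` column-wise before clustering covers part (g).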