# CS422 Homework 4

## Description

### 1 Recitation Exercises

These exercises are from *Introduction to Data Mining*, 2nd Edition, by Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar.

#### 1.1 Chapter 7

Exercises: 4, 7, 11, 16, 17, 21, 22
### 2 Practicum Problems

These problems primarily reference the lecture materials and the Python examples given in class. A Jupyter/IPython notebook is suggested for the programmatic components.
#### 2.1 Problem 1

Load the auto-mpg sample dataset from the UCI Machine Learning Repository (auto-mpg.data) into Python using a Pandas dataframe. Using only the continuous fields as features, impute any missing values with the mean, and perform a Hierarchical Clustering (use `sklearn.cluster.AgglomerativeClustering`) with linkage set to average and the affinity left at its default, euclidean. Set the remaining parameters to obtain a shallow tree with 3 clusters as the target. Obtain the mean and variance of each feature within each cluster, and compare these values with those obtained for each class when origin is used as the class label. Is there a clear relationship between cluster assignment and class label?
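A minimal sketch of the Problem 1 pipeline, assuming a local copy of `auto-mpg.data` and a recent scikit-learn. The column names, the choice of five "continuous" fields, and the helper names `load_auto_mpg` and `cluster_auto_mpg` are illustrative assumptions, not part of the assignment. The loader treats the tab before the quoted car name as a comment marker so that only the numeric columns are read.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Assumed column order of auto-mpg.data; the trailing car name is
# discarded by treating the tab that precedes it as a comment marker.
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "model_year", "origin"]
# Assumption: the "continuous fields" are these five columns; cylinders,
# model_year and origin are treated as discrete.
CONTINUOUS = ["mpg", "displacement", "horsepower", "weight", "acceleration"]


def load_auto_mpg(path):
    """Read a local copy of auto-mpg.data, where '?' marks a missing value."""
    return pd.read_csv(path, names=COLUMNS, na_values="?", comment="\t",
                       sep=" ", skipinitialspace=True)


def cluster_auto_mpg(df, n_clusters=3):
    """Mean-impute the continuous fields, then run average-linkage clustering."""
    X = df[CONTINUOUS].fillna(df[CONTINUOUS].mean())
    # Euclidean distance is the default, so it is not passed explicitly
    # (`affinity` was deprecated in favour of `metric` in scikit-learn 1.2).
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="average")
    out = X.assign(cluster=model.fit_predict(X), origin=df["origin"].to_numpy())
    stats = ["mean", "var"]
    by_cluster = out.groupby("cluster")[CONTINUOUS].agg(stats)
    by_origin = out.groupby("origin")[CONTINUOUS].agg(stats)
    # Rows are clusters, columns are origin values: a clear relationship
    # shows up as each row concentrated in a single column.
    crosstab = pd.crosstab(out["cluster"], out["origin"])
    return out, by_cluster, by_origin, crosstab
```

Typical use would be `out, by_cluster, by_origin, table = cluster_auto_mpg(load_auto_mpg("auto-mpg.data"))`, after which `by_cluster` and `by_origin` can be compared side by side and `table` shows how the origins are spread over the clusters.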
#### 2.2 Problem 2

Load the Boston dataset (`sklearn.datasets.load_boston()`) into Python using a Pandas dataframe. Perform a K-Means analysis on scaled data, with the number of clusters ranging from 2 to 6. Report the Silhouette score for each k and use it to justify which value of k is optimal. Calculate the mean value of every feature in each cluster of the optimal clustering. How do these values differ from the centroid coordinates?
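A minimal sketch of the k sweep and the means-versus-centroids comparison. Because `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, the sketch assumes the features are already in a numeric DataFrame `df` however they were obtained; the helper names `kmeans_sweep` and `means_vs_centroids` are hypothetical. Standardization uses `StandardScaler`, which is one reasonable reading of "scaled data".

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def kmeans_sweep(df, k_values=range(2, 7), seed=0):
    """Fit K-Means on standardized features for each k and score it."""
    scaler = StandardScaler()
    X = scaler.fit_transform(df)
    scores, models = {}, {}
    for k in k_values:
        models[k] = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        scores[k] = silhouette_score(X, models[k].labels_)
    # A higher silhouette means tighter, better-separated clusters.
    best_k = max(scores, key=scores.get)
    return scaler, scores, best_k, models[best_k]


def means_vs_centroids(df, scaler, model):
    """Per-cluster feature means (original units) alongside the centroids."""
    means = df.groupby(model.labels_).mean()
    # cluster_centers_ live in the standardized space the model was fit on.
    centroids_scaled = pd.DataFrame(model.cluster_centers_, columns=df.columns)
    # Undoing the scaling expresses the centroids in the original units.
    centroids_original = pd.DataFrame(
        scaler.inverse_transform(model.cluster_centers_), columns=df.columns)
    return means, centroids_scaled, centroids_original
```

Comparing `means` with `centroids_scaled` and `centroids_original` makes the role of the scaling step visible: the raw centroids are coordinates in the standardized feature space, while the per-cluster means are in the dataset's own units.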
Prof. Panchal: Wed. 6:45PM-9:35PM. CS 422 – Data Mining, Spring 2021: All Sections.

Homework 4. Assigned: April 04, 2021. Due: April 18, 2021.

#### 2.3 Problem 3

Load the … dataset into Python using a Pandas dataframe. Perform a K-Means analysis on scaled data, with the number of clusters set to 3. Given the actual class labels, calculate the Homogeneity and Completeness scores for the optimal k. What information does each of these metrics provide?
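A minimal sketch for Problem 3. It takes any feature matrix `X` and label vector `y_true`, so it does not depend on which dataset is used; the helper name `kmeans_vs_classes` is hypothetical. It relies on scikit-learn's `homogeneity_score` and `completeness_score`, and the `__main__` block uses toy labels to show how the two metrics pull in opposite directions.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import completeness_score, homogeneity_score
from sklearn.preprocessing import StandardScaler


def kmeans_vs_classes(X, y_true, n_clusters=3, seed=0):
    """Cluster standardized X with K-Means and score it against known classes."""
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X_scaled)
    # Homogeneity is 1.0 when every cluster holds members of a single class.
    # Completeness is 1.0 when all members of a class share one cluster.
    return labels, homogeneity_score(y_true, labels), completeness_score(y_true, labels)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    # Over-splitting: every cluster is pure, but each class is split in two.
    print(homogeneity_score(y, [0, 1, 2, 3]), completeness_score(y, [0, 1, 2, 3]))
    # Under-splitting: one cluster holds both classes.
    print(homogeneity_score(y, [0, 0, 0, 0]), completeness_score(y, [0, 0, 0, 0]))
```

Neither score alone rewards a good clustering: putting every point in its own cluster maximizes homogeneity, and putting all points in one cluster maximizes completeness, so the two are usually read together.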