# Numerical Computing Project 4 – Spectral Graph Clustering

\$30.00

## Description

Spectral Graph Clustering
Clustering is a technique used for exploratory data analysis in an increasing number of fields, ranging from computer
science and biology to statistics and social sciences. The main motivation behind the use of clustering is the need
to extract information from a set of data, by grouping elements that are similar one to another according to a metric
of choice. This technique aims, in fact, to put together elements which display similar behaviours (e.g., costumers
purchasing the same set of goods if we think about an application in economics), while separating them from the
others.
In this project, you will be introduced to the family of spectral clustering algorithms, which present significant advantages over the traditional clustering algorithms (e.g., k-means or single linkage). The implementation of spectral
clustering is very straightforward and the results obtained usually outperform those obtained with the traditional techniques. For further details see e.g., [2]). The purpose of this project is to give you a gentle introduction to spectral
clustering through the implementation of a spectral graph clustering algorithm in Julia. In order to validate your
results, you will then have to test it against a variety of 2D graphs.
1. Spectral clustering of non-convex sets [50 points]
Activate the Julia environment located in the folder Sources. In the directory Datasets you will find a variety of
pointclouds and 2D graphs, that will serve as test cases for this assignment. Your first task consists in modifying the
Julia script cluster points.jl, by completing the following tasks:
1. Run the Julia function located in Tools/get points.jl. A graphical output of 6 different non-convex sets
will be produced. Use this function to output the coordinate list of the different cloudpoints and replace the
variable pts dummy with the set you wish to cluster. Consider, for example, the set named two spirals: can you
identify the two obvious clusters in this dataset? Describe them briefly and explain what difficulties a clustering
algorithm could eventually encounter in a scenario of this kind.
2. Use the function minspantree() located in the file Tools/min span tree.jl to find the minimal
spanning tree of the full graph. Use this information to determine the  factor for the -similarity graph.
3. Complete the Julia file epsilon sim graph.jl. It should generate the similarity matrix of the -similarity
graph.
4. Create the adjacency matrix for the  similarity graph and visualize the resulting graph using the function
draw graph().
5. Create the Laplacian matrix and implement spectral clustering. Your goal is to find the eigenvectors of the
Laplacian corresponding to the K = 2 smallest eigenvalues. Afterwards, use the function kmeans() to
cluster the rows of these eigenvectors.
6. Use the kmeans() function to perform k-means clustering on the input points. Visualize the two clustering
results using the function draw graph(). You can debug your implementation by using as a reference the
results reported in Figure 1, which are relative to the two spirals dataset.
1
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 1: Clustering results for the spirals dataset.
7. Cluster the datasets i) Two spirals, ii) Cluster in cluster, and iii) Crescent & full moon in K = 2 clusters, and
visualize the results obtained with spectral clustering and k-means directly on the input data. Do the same for
i) Corners, and ii) Outlier for K = 4 clusters.
HINT: The parameters σ, for the Gaussian similarity function
sij = e
−kxi−xj k
2
2σ2
, (1)
and , that controls the size of the neighborhood in the  similarity graph have to be chosen according to the position
of the points of each dataset in the 2D space, and the distances between them. Include the values that you selected in
2. Spectral clustering of real-world graphs [35 points]
We will now cluster 2D graphs emerging from structural engineering matrices of NASA, available in the SuiteSparse
points to the script cluster graphs.jl:
1. Construct the Laplacian matrix, and draw the graphs using the eigenvectors to supply coordinates. Locate vertex
i at position:
(xi
, yi) = (v2(i), v3(i)),
where v2, v3 are the eigenvectors associated with the 2nd and 3rd smallest eigenvalues. Figure 2 illustrates
these 2 ways of visualizing the airfoil1 graph.
2. Cluster each graph in K = 4 clusters with the spectral and the k-means method. Plot all of your results and
describe the differences that you can see in the output of the 2 different algorithms and where they might come
from. The clusters obtained by the 2 techniques in the case of the airfoil1 graph are presented in Figure 3.
3. Report in Table 1 the number of nodes per cluster and plot a histogram of the reported values in a way similar
to Figure 4. What can you observe?
2
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Figure 2: Visualization of the airfoil1 graph.
Figure 3: The airfoil1 clusters (left: spectal clusters, right: k-means clusters).
3
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
Table 1: Clustering results, K = 4
Case Spectral K-Means
grid2
barth
3elt
Figure 4: Histograms representing the number of nodes per cluster for the airfoil1 graph.
Quality of the code and of the report [15 Points]
The highest possible score of each project is 85 points and up to 15 additional points can be awarded based on the
quality of your report and code (maximum possible grade: 100 points). Your report should be a coherent document,
structured according to the template provided on iCorsi. If there are theoretical questions, provide a complete and
detailed answer. All figures must have a caption and must be of sufficient quality (include them either as .eps or .pdf).
If you made a particular choice in your implementation that might be out of the ordinary, clearly explain it in the
report. The code you submit must be self-contained and executable, and must include the set-up for all the results that
you obtained and listed in your report. It has to be readable and, if particularly complicated, well-commented.
Summarize your results and experiments for all exercises by writing an extended LaTeX report, by using the template provided on iCorsi (https://www.icorsi.ch/course/view.php?id=14666). Submit your gzipped
archive file (tar, zip, etc.) – named project number lastname firstname.zip/tgz – on iCorsi (see [NC
2022] Project 1 – Submission PageRank Algorithm) before the deadline. Submission by email or through any other
channel will not be considered. Late submissions will not be graded and will result in a score of 0 points for that
project. You are allowed to discuss all questions with anyone you like, but: (i) your submission must list anyone you
discussed problems with and (ii) you must write up your submission independently. Please remember that plagiarism
will result in a harsh penalization for all involved parties.
4
Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhauser ¨
In-class assistance
If you experience difficulties in solving the problems above, please join us in class either on Tuesdays 16:30-18:15 in
room C1.03 or on Wednesdays 14:30-16:15 in room D1.15.
References
[1] Timothy A. Davis and Yifan Hu. The university of florida sparse matrix collection. ACM Trans. Math. Softw.,
38(1):1:1–1:25, December 2011.
[2] Ulrike Luxburg. A Tutorial on Spectral Clustering. Statistics and Computing, 17(4):395–416, December 2007.
5