Description

Part 1
Basic Concepts in Linear Algebra and Calculus
1. We have two vectors, $x_1$ and $x_2$:
$$x_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} 10 \\ 18 \end{bmatrix}$$
What is the distance between $x_1$ and $x_2$?
(1) if the distance measure is based on L2 norm (a.k.a Euclidean norm)
(2) if the distance measure is based on L1 norm
(3) if the distance measure is based on L∞ norm (a.k.a infinity norm)
Assuming there are two feature components, $x = \begin{bmatrix} \text{income} \\ \text{score} \end{bmatrix}$, in an application, does the L∞ norm-based distance measure make sense for the application of customer segmentation?
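As a reference for the three norms named in question 1, the following is a minimal sketch in plain Python. The function name `norm_distances` and the example vectors `p` and `q` are hypothetical and deliberately differ from the vectors in the question; the same function can be applied to any pair of equal-length vectors.

```python
import math


def norm_distances(u, v):
    """Return the (L1, L2, L-infinity) distances between two equal-length vectors."""
    # Absolute component-wise differences |u_i - v_i|
    diffs = [abs(a - b) for a, b in zip(u, v)]
    l1 = sum(diffs)                             # sum of absolute differences
    l2 = math.sqrt(sum(d * d for d in diffs))   # Euclidean distance
    linf = max(diffs)                           # largest absolute difference
    return l1, l2, linf


# Hypothetical example vectors (not the ones from question 1)
p = [0.0, 0.0]
q = [3.0, 4.0]
l1, l2, linf = norm_distances(p, q)
print(l1, l2, linf)  # 7.0 5.0 4.0
```

Note that the L∞ distance depends only on the single component with the largest difference, which is worth keeping in mind when the features are measured on different scales.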
2. We define a scalar-valued function of a vector variable:
$$f(x) = x^T A x$$
Here, $x$ is a column vector, $x^T$ is the transpose of $x$, and $A$ is a symmetric matrix.
To simplify this question, let's assume $x$ has only two elements:
$$x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad A = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$$
The derivative of $f$ with respect to $x$ is a vector defined by
$$\frac{df}{dx} = \begin{bmatrix} \dfrac{df}{d\alpha} \\ \dfrac{df}{d\beta} \end{bmatrix}$$
Show that $\frac{df}{dx} = 2Ax$.
Hint: calculate $f(x)$, $2Ax$, $\frac{df}{d\alpha}$, and $\frac{df}{d\beta}$.
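A numerical sanity check can complement the algebraic proof asked for in question 2. The sketch below compares a central-difference approximation of the derivative with 2Ax for one arbitrary choice of values. The helper names and the numbers in `A_example` and `x_example` are hypothetical, and a numerical check of this kind does not replace the derivation.

```python
def quadratic_form(x, A):
    """f(x) = x^T A x for a 2-element list x and a 2x2 nested-list matrix A."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


def numerical_gradient(x, A, h=1e-6):
    """Central-difference approximation of df/dx, one component at a time."""
    grad = []
    for k in range(2):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((quadratic_form(x_plus, A) - quadratic_form(x_minus, A)) / (2 * h))
    return grad


def two_A_x(x, A):
    """The closed-form candidate 2Ax."""
    return [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]


# Hypothetical symmetric matrix and vector, chosen only for illustration
A_example = [[2.0, 1.0], [1.0, 3.0]]
x_example = [0.5, -1.5]

approx = numerical_gradient(x_example, A_example)
exact = two_A_x(x_example, A_example)
print(approx, exact)  # the two vectors should agree up to rounding error
```

The symmetry of A matters here: for a non-symmetric matrix the derivative would be (A + A^T)x rather than 2Ax.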
K-means clustering
3. Briefly describe the two key steps in one iteration of the k-means algorithm.
4. What is the distance measure used in k-means (as implemented in scikit-learn)?
5. The k-means algorithm can converge in a finite number of iterations. Why?
6. The clustering result of k-means could be random. Why?
7. The minimum value of the objective/loss function is zero for any dataset. What is the clustering result
when the objective function is zero?
Note: for questions 3 to 7, you only need to write a few words (bullet points) for each one.
You may write the answers on a piece of paper, take a photo with your cell phone, and upload the picture to Blackboard. Make sure that your handwriting is legible.
Alternatively, you may write the answers in MS Word, convert the file to PDF, and upload it to Blackboard.
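For orientation on the k-means questions above, here is a minimal plain-Python sketch of the standard algorithm (Lloyd's iteration) with squared Euclidean distance. It is an illustration under stated assumptions, not the scikit-learn implementation and not the expected notebook solution: the function names, the random choice of initial centers from the data, the handling of empty clusters, and the toy data in `points` are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_step(points, centers):
    """Assign each point to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def update_step(points, labels, centers):
    """Move each center to the mean of its assigned points.

    A center with no assigned points is left where it is (an assumption
    made here for simplicity).
    """
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append([sum(coord) / len(members) for coord in zip(*members)])
        else:
            new_centers.append(list(old_center))
    return new_centers


def kmeans(points, k, seed=0, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        new_labels = assign_step(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_step(points, labels, centers)
    return centers, labels


def objective(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two well-separated pairs of points
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(points, k=2, seed=0)
print(labels, objective(points, labels, centers))
```

Changing `seed` changes the initial centers. On this toy data every initialization leads to the same grouping, but on less well-separated data different seeds can give different results.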
Part 2: Programming
Complete the tasks in the files:
H1P2T1_kmeans.ipynb
If you want to get some bonus points, try this task:
H1P2T2_kmeans_compression.ipynb
Grading: the number of points

                                                Undergraduate Student   Graduate Student
Basic Concepts in Linear Algebra and Calculus   10                      10
K-means clustering                              10                      10
H1P2T1                                          30                      30
H1P2T2                                          10 (bonus)              10 (bonus)
Total number of points                          50 + 10                 50 + 10
Upload your files (*_your_name.ipynb) to Blackboard.
Do NOT convert the ipynb files to PDF.

CSC546 Homework 1
