Description
Part 1
Basic Concepts in Linear Algebra and Calculus
1. We have two vectors, $x_1$ and $x_2$:
$$x_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 10 \\ 18 \end{bmatrix}$$
What is the distance between $x_1$ and $x_2$?
(1) if the distance measure is based on the L2 norm (a.k.a. the Euclidean norm)
(2) if the distance measure is based on the L1 norm
(3) if the distance measure is based on the L∞ norm (a.k.a. the infinity norm)
Assuming there are two feature components in an application, $x = \begin{bmatrix} \text{feature 1} \\ \text{feature 2} \end{bmatrix}$, does the L∞ norm-based
distance measure make sense for the application of customer segmentation?
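As a rough illustration of how the three distance measures in question 1 differ, the sketch below computes them for a pair of made-up vectors. The function name `norm_distances` and the sample vectors are assumptions chosen for illustration; they are not the vectors from the question.

```python
import math


def norm_distances(u, v):
    """Return the (L1, L2, L-infinity) distances between two equal-length vectors."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    l1 = sum(diffs)                            # sum of absolute differences
    l2 = math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
    linf = max(diffs)                          # largest absolute difference
    return l1, l2, linf


# Hypothetical sample vectors, chosen so the results are easy to check by hand.
u = [0.0, 0.0]
v = [3.0, 4.0]
l1, l2, linf = norm_distances(u, v)
print(l1, l2, linf)  # prints: 7.0 5.0 4.0
```

Note that the L∞ distance depends only on the single component with the largest difference, which is worth keeping in mind when the components are measured on very different scales.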
2. We define a scalar valued function of a vector variable
$$f(x) = x^T A x$$
Here, $x$ is a column vector, $x^T$ is the transpose of $x$, and $A$ is a symmetric matrix.
To simplify this question, let’s assume $x$ has only two elements:
$$x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$
The derivative of $f$ with respect to $x$ is a vector defined by
$$\frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f}{\partial \alpha} \\ \dfrac{\partial f}{\partial \beta} \end{bmatrix}$$
Show that
$$\frac{\partial f}{\partial x} = 2Ax$$
Hint: calculate $f(x)$, $2Ax$, $\dfrac{\partial f}{\partial \alpha}$, and $\dfrac{\partial f}{\partial \beta}$.
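Before writing the proof for question 2, it can help to check the identity numerically. The sketch below compares the closed form 2Ax with a central-difference approximation of each partial derivative. The helper names and the sample values of A and x are assumptions made for illustration, and a numerical check does not replace the derivation the question asks for.

```python
def quadratic_form(A, x):
    """Return f(x) = x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """Return 2 A x, the claimed gradient when A is symmetric."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def numerical_gradient(A, x, h=1e-6):
    """Approximate each partial derivative of f with a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((quadratic_form(A, x_plus) - quadratic_form(A, x_minus)) / (2 * h))
    return grad


# Hypothetical symmetric matrix and input vector (not taken from the assignment).
A = [[2.0, -1.0], [-1.0, 3.0]]
x = [0.5, -1.5]
print(analytic_gradient(A, x))   # closed form 2 A x
print(numerical_gradient(A, x))  # should agree up to rounding error
```

Trying a non-symmetric A in the same script is a quick way to see why the symmetry assumption matters for the result.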
K-means clustering
3. Briefly describe the two key steps in one iteration of the k-means algorithm.
4. What is the distance measure used in k-means (as implemented in scikit-learn)?
5. The k-means algorithm can converge in a finite number of iterations. Why?
6. The clustering result of k-means could be random. Why?
7. The minimum value of the objective/loss function is zero for any dataset. What is the clustering result
when the objective function is zero?
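For readers who want something concrete to reason about while answering the k-means questions, here is a minimal pure-Python sketch of the algorithm. It assumes squared Euclidean distance, caller-supplied initial centroids, and the rule that an empty cluster keeps its old centroid; these choices and all function names are illustrative assumptions, and this is not the scikit-learn implementation.

```python
def assign_clusters(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
            continue
        dim = len(old)
        new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the centroids stop changing."""
    for _ in range(max_iter):
        labels = assign_clusters(points, centroids)
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data with two obvious groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
initial = [[0.0, 0.0], [10.0, 10.0]]
final_centroids, final_labels = kmeans(points, initial)
print(final_centroids, final_labels)
```

Passing different initial centroids to `kmeans` is an easy way to experiment with how the starting point affects the result.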
Note: for questions 3 to 7, you only need to write a few words (bullet points) for each one.
You may write the answers on a piece of paper, take a photo with your cell phone, and upload the picture
to Blackboard. Make sure that your handwriting is legible.
Alternatively, you may write the answers in MS Word, convert the file to PDF, and upload it to Blackboard.
Part 2: Programming
Complete the tasks in the file:
H1P2T1_kmeans.ipynb
If you want to get some bonus points, try this task:
H1P2T2_kmeans_compression.ipynb
Grading: the number of points
Item                                            Undergraduate Student   Graduate Student
Basic Concepts in Linear Algebra and Calculus   10                      10
K-means clustering                              10                      10
H1P2T1                                          30                      30
H1P2T2                                          10 (bonus)              10 (bonus)
Total number of points                          50 + 10                 50 + 10
Upload your files (*_your_name.ipynb) to Blackboard.
Do NOT convert the ipynb files to PDF.