CS 422/622 Project 2 Nearest Neighbors


1 Nearest Neighbors (30 points)
File name: nearest neighbors.py
Implement a function in Python:
KNN_test(X_train,Y_train,X_test,Y_test,K)
that takes training data, test data, and K as inputs. KNN_test(X_train,Y_train,X_test,Y_test,K) should
return the accuracy on the test data. The training data and test data should have the same format as
described earlier for the decision tree problems. Your function should be able to handle feature vectors of
any dimension, with real-valued features. Remember that your labels are binary, and they should be −1 for
the negative class and 1 for the positive class. Two CSV files with test data are provided.
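A minimal sketch of how KNN_test could look, assuming 2-D numpy-style inputs (one row per example), Euclidean distance, and a majority vote over the neighbors' ±1 labels; breaking ties toward +1 is an assumption the assignment does not specify:

import numpy as np

def KNN_test(X_train, Y_train, X_test, Y_test, K):
    # Sketch: classify each test point by the majority label (+1 / -1)
    # of its K nearest training points under Euclidean distance.
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train).ravel()
    X_test = np.asarray(X_test, dtype=float)
    Y_test = np.asarray(Y_test).ravel()
    correct = 0
    for x, y in zip(X_test, Y_test):
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = np.argsort(dists)[:K]               # indices of the K closest points
        vote = np.sum(Y_train[nearest])               # sum of the neighbors' +/-1 labels
        prediction = 1 if vote >= 0 else -1           # assumed tie rule: predict +1
        correct += int(prediction == y)
    return correct / len(Y_test)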
Write-Up: Using the training data provided in nearest neighbors 1.csv, how would your algorithm classify
the test points listed below with K=1, K=3, and K=5?
test1 = (1, 1, 1)
test2 = (2, 1, −1)
test3 = (0, 10, 1)
test4 = (10, 10, −1)
test5 = (5, 5, 1)
test6 = (3, 10, −1)
test7 = (9, 4, 1)
test8 = (6, 2, −1)
test9 = (2, 2, 1)
test10 = (8, 7, −1)
622: Implement the following function in Python:
choose_K(X_train,Y_train,X_val,Y_val)
that takes training data and validation data as inputs and returns a K value. This function must iterate
through all possible K values and choose the one that achieves the maximum accuracy on the given
validation data.
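A minimal sketch, assuming the KNN_test function from Part 1 is available in the same module and that "all possible K values" means K = 1 up to the number of training points:

def choose_K(X_train, Y_train, X_val, Y_val):
    # Sketch: evaluate every candidate K on the validation set and keep
    # the one with the highest validation accuracy.
    best_K, best_acc = 1, -1.0
    for K in range(1, len(Y_train) + 1):
        acc = KNN_test(X_train, Y_train, X_val, Y_val, K)
        if acc > best_acc:
            best_K, best_acc = K, acc
    return best_K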
Write-Up: What is the best K value for the training data above?
2 Clustering (30 points)
File name: clustering.py
Implement a function in Python:
K_Means(X,K,mu)
that takes feature vectors X, a K value, and initial cluster centers mu as input and returns a numpy array
of cluster centers C. Your function should be able to handle feature vectors of any dimension and any
K > 0. mu is an array of initial cluster centers, with either K or 0 rows. If mu is empty, then you must
initialize the cluster centers randomly; otherwise, start with the given cluster centers. The test script
provides an example mu, but for grading mu could have a different number of cluster centers.
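A minimal sketch of Lloyd's algorithm under these requirements, assuming X is a 2-D numpy array with one feature vector per row; keeping the previous center when a cluster becomes empty is an assumption, not something the assignment dictates:

import numpy as np

def K_Means(X, K, mu):
    # Sketch: assign each point to its nearest center, recompute each center
    # as the mean of its cluster, and repeat until the centers stop moving.
    X = np.asarray(X, dtype=float)
    if len(mu) == 0:
        # empty mu: pick K distinct data points as the initial centers
        C = X[np.random.choice(len(X), K, replace=False)]
    else:
        C = np.asarray(mu, dtype=float)
    while True:
        dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (n, K) distances
        labels = np.argmin(dists, axis=1)                              # nearest center per point
        new_C = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                          for k in range(len(C))])
        if np.allclose(new_C, C):
            return new_C
        C = new_C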
Write-Up: Test your function on the training data provided in clustering 2.csv (listed below), with K=2
and K=3. What changes do you notice when you change the K value?
Sample   x1   x2
1         1    0
2         7    4
3         9    6
4         2    1
5         4    8
6         0    3
7        13    5
8         6    8
9         7    3
10        3    6
11        2    1
12        8    3
13       10    2
14        3    5
15        5    1
16        1    9
17       10    3
18        4    1
19        6    6
20        2    2
622: Implement the following function in Python:
K_Means_better(X,K)
that takes feature vectors X and a K value as input and returns a numpy array of cluster centers C. Your
function should be able to handle feature vectors of any dimension and any K > 0. Your function will run
the above-implemented K_Means function many times until the same set of cluster centers is returned a
majority of the time. At that point, you will know that those cluster centers are likely the best ones.
K_Means_better(X,K) will return those cluster centers.
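A minimal sketch, assuming the K_Means function from above is available; the max_runs cap and the minimum of ten runs before checking for a majority are assumptions, not part of the assignment:

import numpy as np
from collections import Counter

def K_Means_better(X, K, max_runs=1000):
    # Sketch: rerun K_Means with random initialization until one set of
    # centers has been returned by a majority of the runs so far.
    counts = Counter()
    for run in range(1, max_runs + 1):
        C = np.round(K_Means(X, K, np.array([])), 6)       # round so equal results compare equal
        key = tuple(map(tuple, C[np.lexsort(C.T[::-1])]))  # sort rows into a canonical order
        counts[key] += 1
        best_key, best_count = counts.most_common(1)[0]
        if run >= 10 and best_count > run / 2:              # a clear majority of runs agrees
            break
    return np.array(best_key)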
Write-Up: Test your function with K=2 and K=3 on the above data. Plot your clusters in different colors
and label the cluster centers.
3 Perceptron (30 points)
File name: perceptron.py
Implement a function in Python:
perceptron_train(X,Y)
that takes training data as input and outputs the weights w and the bias b of the perceptron. Your function
should handle real-valued features, feature vectors of any dimension, and binary labels.
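A minimal sketch of the classic perceptron update rule, assuming ±1 labels as in the earlier parts; the cap on the number of passes over the data is an assumption, since the data may not be linearly separable:

import numpy as np

def perceptron_train(X, Y):
    # Sketch: sweep over the data and, whenever a point is misclassified
    # (y * (w.x + b) <= 0), move w and b toward that point.
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y).ravel()
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(100):          # assumed cap on the number of epochs
        mistakes = 0
        for x, y in zip(X, Y):
            if y * (np.dot(w, x) + b) <= 0:
                w = w + y * x
                b = b + y
                mistakes += 1
        if mistakes == 0:         # converged: a full pass with no updates
            break
    return w, b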
Implement a function in Python:
perceptron_test(X_test, Y_test, w, b)
that takes testing data, the perceptron weights, and the bias as input and returns the accuracy on the
testing data. I will use a script similar to the provided test script; note that I will test many different
scenarios.
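A minimal sketch of the matching test function, again assuming ±1 labels and breaking the tie at w·x + b = 0 toward +1:

import numpy as np

def perceptron_test(X_test, Y_test, w, b):
    # Sketch: predict sign(w.x + b) for every test point and report the
    # fraction of predictions that match the true labels.
    X_test = np.asarray(X_test, dtype=float)
    Y_test = np.asarray(Y_test).ravel()
    predictions = np.where(X_test @ w + b >= 0, 1, -1)
    return np.mean(predictions == Y_test)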
Write-Up: Train your perceptron on the following dataset provided in perceptron 2.csv. Using the w and
b you get, plot the decision boundary.
Sample   x1     x2     y
1        -2      1      1
2         1      1      1
3         1.5   -0.5    1
4        -2     -1     -1
5        -1     -1.5   -1
6         2     -2     -1
4 Report (10 points)
622 students are required to submit their report as a PDF written in LaTeX, along with the LaTeX source
file. 422 students may submit a README.txt, or a LaTeX report for extra credit.