Description
1. Support Vector Machines with Synthetic Data
a. The effect of the regularization parameter C
Plot:
Discussion:
Training Error: The training error decreases monotonically as C increases. It drops quickly at
first and then levels off.
Validation Error: The validation error decreases while C <= 1 and generally increases afterwards
(overfitting).
C trades off training error against the flatness of the decision function; it is usually called the
regularization (penalty) parameter. As C increases, margin violations are penalized more heavily,
so less slack is allowed and the model fits the training data more closely, misclassifying fewer
training points. Conversely, a small C gives a softer margin that tolerates more slack.
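A minimal sketch of the C sweep behind this plot, assuming the synthetic data is already split into X_train, y_train, X_val, y_val; the RBF kernel and the grid of C values are illustrative assumptions, not the exact assignment setup.

import numpy as np
from sklearn.svm import SVC

C_values = np.logspace(-3, 5, 9)           # illustrative grid, 10^-3 ... 10^5
train_err, val_err = [], []

for C in C_values:
    clf = SVC(C=C, kernel='rbf', gamma='scale')   # kernel choice is an assumption
    clf.fit(X_train, y_train)
    train_err.append(1.0 - clf.score(X_train, y_train))   # training error
    val_err.append(1.0 - clf.score(X_val, y_val))          # validation error

C_best = C_values[int(np.argmin(val_err))]  # C with the lowest validation error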
Final Model Selection:
The lowest validation error is obtained at Cbest = 1.
Console:
b. The effect of the RBF kernel parameter γ
Plot:
Discussion:
Training Error: The training error decreases monotonically as γ increases. It drops quickly at
first and then levels off.
Validation Error: The validation error decreases while γ <= 1 and generally increases afterwards
(overfitting).
γ defines how far the influence of a single training example reaches. A large γ produces a sharp,
narrow peak around each support vector, so most of its contribution stays near its center. With so
little constraint tying distant points together, the model loses the sense of the overall shape of
the data. When γ is large enough, the training accuracy is close to 1, but the model is useless for
classifying new points.
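A minimal sketch of the γ sweep for this part, under the same assumptions as the sketch in part a (X_train, y_train, X_val, y_val already exist; the γ grid and the fixed C are illustrative).

import numpy as np
from sklearn.svm import SVC

gamma_values = np.logspace(-3, 5, 9)        # illustrative grid of γ values
train_err, val_err = [], []

for gamma in gamma_values:
    clf = SVC(C=1.0, kernel='rbf', gamma=gamma)   # C fixed for the sweep
    clf.fit(X_train, y_train)
    train_err.append(1.0 - clf.score(X_train, y_train))
    val_err.append(1.0 - clf.score(X_val, y_val))

gamma_best = gamma_values[int(np.argmin(val_err))]   # γ with the lowest validation error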
Final Model Selection:
The lowest validation error is obtained at γbest = 1.
2. Breast Cancer Diagnosis with Support Vector Machines
Print Errors:
I apply a median filter to smooth the validation-error matrix before selecting the final "best" (C, γ) pair.
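A minimal sketch of the (C, γ) grid search with median smoothing of the validation-error matrix; scipy.ndimage.median_filter is my assumed choice for the median "blur", and the grids are illustrative, not the exact values used in the assignment.

import numpy as np
from scipy.ndimage import median_filter
from sklearn.svm import SVC

C_grid = np.logspace(-2, 4, 7)               # illustrative C grid
gamma_grid = np.logspace(-3, 2, 6)           # illustrative γ grid
val_err = np.zeros((len(C_grid), len(gamma_grid)))

for i, C in enumerate(C_grid):
    for j, gamma in enumerate(gamma_grid):
        clf = SVC(C=C, kernel='rbf', gamma=gamma).fit(X_train, y_train)
        val_err[i, j] = 1.0 - clf.score(X_val, y_val)

# Smooth the error surface so a single lucky cell does not win.
smoothed = median_filter(val_err, size=3)
i_best, j_best = np.unravel_index(np.argmin(smoothed), smoothed.shape)
C_best, gamma_best = C_grid[i_best], gamma_grid[j_best]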
Final Model Selection:
The lowest validation error is obtained at Cbest = 1000 and γbest = 0.01.
3. Breast Cancer Diagnosis with k-Nearest Neighbors
Plot:
Final Model Selection:
The lowest validation error is obtained at kbest = 5.
Discussion:
Based on the results I obtained, I would prefer kNN, since the test accuracy of kNN at kbest is
higher than that of the SVM at (Cbest, γbest). Both models are nevertheless good for breast cancer diagnosis:
Test Accuracy(SVM(Cbest = 1000, γbest = 0.01)) = 0.9478 < 0.9565 = Test Accuracy(kNN(kbest = 5))
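A minimal sketch of the k sweep and the final test-set comparison, assuming X_train, y_train, X_val, y_val, X_test, y_test already exist; the list of odd k values is an illustrative choice.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

k_values = [1, 3, 5, 7, 11, 15, 21]          # illustrative odd k values
val_err = []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_err.append(1.0 - knn.score(X_val, y_val))   # validation error for this k

k_best = k_values[int(np.argmin(val_err))]   # k with the lowest validation error
knn_best = KNeighborsClassifier(n_neighbors=k_best).fit(X_train, y_train)
print("kNN test accuracy:", knn_best.score(X_test, y_test))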