ECE523: Engineering Applications of Machine Learning and Data Analytics hw 1

Part A: Theory (20pts)
(3pts) Maximum Posterior vs Probability of Chance
Show/explain that $P(\omega_{\max}|x) \geq \frac{1}{c}$ when we are using the Bayes decision rule. Derive an expression for P(error). Let $\omega_{\max}$ be the state of nature for which $P(\omega_{\max}|x) \geq P(\omega_i|x)$ for $i = 1, \ldots, c$.
Show that $P(\text{error}) \leq (c-1)/c$ when we use the Bayes rule to make a decision. Hint: use the
results from the previous questions.
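A quick numerical sanity check can help before writing the argument down. The sketch below is not a proof: it draws arbitrary posterior vectors over c classes and confirms that the largest posterior never falls below 1/c, and that the conditional error of the Bayes rule, 1 − P(ω_max|x), never exceeds (c − 1)/c. The function names and the way posteriors are sampled are illustrative assumptions.

```python
import random

def random_posterior(c, rng):
    """Draw an arbitrary posterior vector over c classes (non-negative, sums to 1)."""
    weights = [rng.random() for _ in range(c)]
    total = sum(weights)
    return [w / total for w in weights]

def check_bounds(c, trials=1000, seed=0):
    """Check P(w_max|x) >= 1/c and P(error|x) <= (c-1)/c on random posteriors."""
    rng = random.Random(seed)
    tol = 1e-12  # tolerance for floating-point round-off
    for _ in range(trials):
        post = random_posterior(c, rng)
        p_max = max(post)
        p_err = 1.0 - p_max  # the Bayes rule picks w_max, so the error is the remaining mass
        if p_max < 1.0 / c - tol or p_err > (c - 1) / c + tol:
            return False
    return True

print(all(check_bounds(c) for c in range(2, 8)))
```

The check only looks at the error conditioned on a single x; the bound on P(error) follows by averaging over x.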
(3pts) Bayes Decision Rule Classifier
Let the elements of a vector $x = [x_1, \ldots, x_d]^T$ be binary valued. Let $P(\omega_j)$ be the prior probability
of the class $\omega_j$ ($j \in [c]$), and let
$p_{ij} = P(x_i = 1|\omega_j)$
with all elements in x being independent. If $P(\omega_1) = P(\omega_2) = \frac{1}{2}$, and $p_{i1} = p > \frac{1}{2}$ and $p_{i2} = 1 - p$,
show that the minimum error decision rule is
Choose $\omega_1$ if $\sum_{i=1}^{d} x_i > \frac{d}{2}$
Hint: Think back to ECE503 and types of random variables, then start out with
Choose $\omega_1$ if $P(\omega_1)P(x|\omega_1) > P(\omega_2)P(x|\omega_2)$
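For small d the claimed rule can be checked by brute force. The sketch below enumerates every binary vector, evaluates the rule from the hint with equal priors, and compares it with the "sum greater than d/2" rule. It uses exact fractions so that ties are not disturbed by round-off. The helper names are illustrative, and agreement on a few values of d and p is evidence, not a derivation.

```python
from fractions import Fraction
from itertools import product

def likelihood(x, p_one):
    """P(x | class) for independent Bernoulli features with P(x_i = 1) = p_one."""
    prob = Fraction(1)
    for xi in x:
        prob *= p_one if xi == 1 else 1 - p_one
    return prob

def rules_agree(d, p):
    """Compare the likelihood rule with the 'sum > d/2' rule on every binary vector."""
    prior = Fraction(1, 2)
    for x in product([0, 1], repeat=d):
        choose_1_likelihood = prior * likelihood(x, p) > prior * likelihood(x, 1 - p)
        choose_1_sum = 2 * sum(x) > d  # same as sum(x) > d/2, kept in integers
        if choose_1_likelihood != choose_1_sum:
            return False
    return True

print(rules_agree(4, Fraction(7, 10)), rules_agree(5, Fraction(3, 5)))
```

Trying a value of p below 1/2 shows where the assumption p > 1/2 enters: the two rules then disagree.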
(3pts) The Ditzler Household Growing Up
My parents have two kids now grown into adults. Obviously there is me, Greg. I was born on
Wednesday 13 November 1985. What is the probability that I have a brother? You can assume
that $P(\text{boy}) = P(\text{girl}) = \frac{1}{2}$.
(10pts) Linear Classifier with a Margin
Show that, regardless of the dimensionality of the feature vectors, a data set that has just two
data points, one from each class, is sufficient to determine the location of the maximum-margin
hyperplane. Hint #1: Consider a data set of two data points, x1 ∈ C1 (y1 = +1) and x2 ∈ C2 (y2 =
−1) and set up the minimization problem (for computing the hyperplane) with appropriate
constraints on $w^T x_1 + b$ and $w^T x_2 + b$ and solve it. Hint #2: This can be formed as a constrained
optimization problem.
$\arg\min_{w \in \mathbb{R}^p} \|w\|_2^2$
Subject to: (some constraint)
What is w? b? Hint: What are the constraints? How did we solve the constrained optimization problem in Fisher’s linear discriminant?
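Once you have a candidate for w and b, it is easy to test it numerically. The sketch below assumes one particular candidate, w = 2(x1 − x2)/‖x1 − x2‖² with b chosen so the hyperplane passes through the midpoint of the two points. It then checks that both margin constraints are active and that the margin width 2/‖w‖ equals the distance between the points. The candidate and the helper names are assumptions made for illustration; the check does not replace the derivation.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def two_point_hyperplane(x1, x2):
    """Candidate max-margin solution for x1 (y = +1) and x2 (y = -1)."""
    diff = [a - b for a, b in zip(x1, x2)]
    scale = 2.0 / dot(diff, diff)
    w = [scale * d for d in diff]
    midpoint = [(a + b) / 2.0 for a, b in zip(x1, x2)]
    b = -dot(w, midpoint)  # hyperplane passes through the midpoint
    return w, b

x1, x2 = [2.0, 1.0, 0.0], [0.0, -1.0, 1.0]
w, b = two_point_hyperplane(x1, x2)

# Both constraints should hold with equality: w.x1 + b = +1 and w.x2 + b = -1.
print(dot(w, x1) + b, dot(w, x2) + b)
# Margin width 2/||w|| compared with the distance between the two points.
print(2.0 / math.sqrt(dot(w, w)), math.dist(x1, x2))
```

The same check works in any dimension, since only inner products of x1 and x2 are used.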
arizona.edu 2 January 13, 2017
Gregory Ditzler Dept. of ECE University of Arizona
Figure 1: Example of the half moon data set.
(1pt) Decision Making with Bayes
The Bayes decision rule describes the approach we take to choosing a class ω for a data point
x. This can be achieved by modeling P(ω|x) or P(x|ω)P(ω)/P(x). Compare and contrast these two
approaches to modeling and discuss the advantages and disadvantages. For the latter model,
why might knowing P(x) be useful?
Part B: Practice (20pts)
You are free to use functions already implemented in Matlab, Python or R. I recommend using Python’s Scikit-learn (http://scikit-learn.org/stable/) as it implements most of the
methods we will be discussing in this course... as well as problems in this homework!
(15pts) Half Moon Data Generator and Linear Classifier
Write a script to generate the “half moon” data set shown in Figure 1. Implement a linear classifier (e.g., logistic regression or $\mathrm{sign}(w^T x)$) to discriminate between the two classes. Show the
decision boundary between the two classes. For example, one approach could be to plot the
posterior over a 2D grid where the data lie. Note that you must use a linear classifier. I have
posted example code on GitHub. Matlab also has many built-in functions to implement linear
classifiers and naïve Bayes.
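A minimal sketch of one possible approach follows. The geometry of the two moons (interleaving half circles with Gaussian noise) is an assumption, since the exact construction behind Figure 1 is not specified here, and the classifier is a plain logistic regression trained by batch gradient descent with NumPy. All function names and parameter values are illustrative.

```python
import numpy as np

def make_half_moons(n_per_class=200, noise=0.1, seed=0):
    """Two interleaving half circles; the geometry is assumed, not taken from Figure 1."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, np.pi, size=n_per_class)
    upper = np.column_stack([np.cos(theta), np.sin(theta)])              # class 0
    theta = rng.uniform(0.0, np.pi, size=n_per_class)
    lower = np.column_stack([1.0 - np.cos(theta), 0.5 - np.sin(theta)])  # class 1
    X = np.vstack([upper, lower]) + noise * rng.standard_normal((2 * n_per_class, 2))
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

def posterior(w, X):
    """P(class 1 | x) under the linear model; the last entry of w is the bias."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

X, y = make_half_moons()
w = fit_logistic(X, y)
accuracy = np.mean((posterior(w, X) > 0.5) == (y == 1))

# Posterior on a 2D grid covering the data; its 0.5 level set is the decision boundary.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
grid_post = posterior(w, np.column_stack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
print(f"training accuracy: {accuracy:.3f}")
```

To visualize the result, `grid_post` can be passed together with `xx` and `yy` to a filled contour plot (for example Matplotlib's `contourf`), with the data points scattered on top. Because the model is linear, the boundary is a straight line and some points from each moon will be misclassified.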
(5pts) Naïve Bayes Spam Filter
A Spam data set has been uploaded to the ECE523 GitHub page (use data/spambase_train.csv).
Using whatever library you wish, implement a naïve Bayes classifier and report the 5-fold
cross-validation error.
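One way to set this up with Scikit-learn is sketched below. It assumes a Gaussian naïve Bayes model, which is only one of several variants, and it assumes the CSV holds numeric rows with the class label in the last column; check the actual file layout before relying on `load_spambase`, which is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def naive_bayes_cv_error(X, y, n_folds=5):
    """Mean misclassification rate of Gaussian naive Bayes over k-fold cross-validation."""
    scores = cross_val_score(GaussianNB(), X, y, cv=n_folds, scoring="accuracy")
    return 1.0 - scores.mean()

def load_spambase(path="data/spambase_train.csv"):
    """Assumption: comma-separated numeric rows, class label in the last column."""
    data = np.loadtxt(path, delimiter=",")
    return data[:, :-1], data[:, -1]

# Usage, once the file is available locally:
# X, y = load_spambase()
# print(f"5-fold CV error: {naive_bayes_cv_error(X, y):.3f}")
```

With an integer `cv` and a classifier, `cross_val_score` uses stratified folds, so each fold keeps roughly the same class proportions as the full data set.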
(3pts) Bonus: Comparing Classifiers
A text file, hw1-scores.txt, containing classifier error measurements has been uploaded to
D2L. Each column represents a classifier and each row a data set that was evaluated.
Are all of the classifiers performing equally? Are one or more classifiers performing
better than the others? Your response should be backed by statistics. Suggested reading:
• Janez Demšar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” Journal of
Machine Learning Research, vol. 7, pp. 1–30, 2006.
Read the abstract to get an idea about the theme of comparisons. Sections 3.1.3 and 3.2.2
can be used to answer the question posed here.
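As a starting point, the sketch below runs a Friedman test, a non-parametric test for comparing several classifiers over multiple data sets, using SciPy. It reports the average rank of each classifier along with the test statistic and p-value. The function name is illustrative, and the commented loading step assumes hw1-scores.txt is a whitespace-delimited numeric table with no header, which should be verified against the actual file.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_summary(errors):
    """errors: (n_datasets, n_classifiers) array of error rates.

    Returns (average rank per classifier, Friedman statistic, p-value).
    """
    errors = np.asarray(errors, dtype=float)
    # Rank classifiers within each data set; rank 1 is the lowest error.
    ranks = np.apply_along_axis(rankdata, 1, errors)
    # friedmanchisquare expects one sample per classifier (needs at least three).
    stat, p_value = friedmanchisquare(*errors.T)
    return ranks.mean(axis=0), stat, p_value

# Usage, assuming a whitespace-delimited numeric file with no header row:
# errors = np.loadtxt("hw1-scores.txt")
# avg_ranks, stat, p_value = friedman_summary(errors)
# print(avg_ranks, stat, p_value)
```

A small p-value suggests that the classifiers do not all perform equally; identifying which ones differ requires a follow-up comparison such as a post-hoc test on the average ranks or pairwise Wilcoxon signed-rank tests.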