Description
Before you start, please download features.arff from IVLE. You will need it for this assignment.
1. Visualization
Before analyzing any data, a good starting point is to visualize it. What does your
data look like? To find out, plot the following pairs of features:
• ZCR MEAN TIME (x-axis) and PAR MEAN TIME (y-axis)
• ZCR STD TIME (x-axis) and PAR STD TIME (y-axis)
Each plot should have axis labels, distinguishable markers and a legend. Save these plots
as zcr-par-mean.png and zcr-par-std.png. Hint: you can use the Python library arff
to load an ARFF file. Could you use any of these features to distinguish between music and
speech? Why? Keep these questions in mind for the next section.
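The plotting step above can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes the arff library in the hint is the liac-arff package, that matplotlib is installed, that the attribute names in the file use underscores (e.g. ZCR_MEAN_TIME), and that the class label (music or speech) is the last attribute. Check the header of features.arff and adjust if any of these do not hold. The helper names are hypothetical.

```python
from collections import defaultdict


def load_arff(path):
    """Load an ARFF file; assumes the liac-arff package (imported as `arff`)."""
    import arff
    with open(path) as fp:
        dataset = arff.load(fp)
    # liac-arff stores attributes as (name, type) pairs and data as a list of rows.
    names = [name for name, _type in dataset["attributes"]]
    return names, dataset["data"]


def group_by_class(attribute_names, rows, x_name, y_name):
    """Split rows into per-class (xs, ys) lists.

    Assumption: the class label is the LAST value in each row.
    """
    x_idx = attribute_names.index(x_name)
    y_idx = attribute_names.index(y_name)
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        label = row[-1]
        groups[label][0].append(float(row[x_idx]))
        groups[label][1].append(float(row[y_idx]))
    return dict(groups)


def plot_pair(groups, x_name, y_name, out_path):
    """Scatter-plot one feature pair with a distinct marker per class."""
    import matplotlib.pyplot as plt

    markers = ["o", "x", "^", "s"]
    fig, ax = plt.subplots()
    for i, (label, (xs, ys)) in enumerate(sorted(groups.items())):
        ax.scatter(xs, ys, marker=markers[i % len(markers)], label=str(label))
    ax.set_xlabel(x_name)
    ax.set_ylabel(y_name)
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

To produce the first plot, one would call load_arff("features.arff"), pass the returned names and rows to group_by_class with the two feature names, and hand the result to plot_pair with "zcr-par-mean.png" as the output path. The second plot follows the same pattern with the STD features.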
2. Classification
In Weka, build a classifier on the features in the given ARFF file using trees.LMT
(Logistic Model Tree) with 10-fold cross-validation, and save the results.
Then choose at least one other classification algorithm, build a second classifier
with 10-fold cross-validation, and save its results as well.
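For intuition about what 10-fold cross-validation does, here is a minimal sketch of the splitting step in plain Python. It is an illustration only, not Weka's implementation: Weka additionally stratifies the folds by class, which this sketch does not. The function name k_fold_indices is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample lands in exactly one test fold; the other k - 1 folds
    form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A classifier is trained and evaluated once per fold, and the reported accuracy is aggregated over all k test folds, so every instance is used for testing exactly once.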
Compare the results of these two algorithms and save your findings to a file called
classification-results.txt. This file should answer at least these three questions:
• Which algorithm gives you the best results?
• Which features contribute most significantly in classification?
• Where do these features come from (time, spectral or perceptual domain) and why
do you think these features can contribute?
You are encouraged to write down any other findings. After answering these questions,
save the Weka results for the two algorithms and put them at the end of the text file.
3. Submission
Submit a zip file to IVLE containing the two plots (zcr-par-mean.png, zcr-par-std.png)
and classification-results.txt. Name the zip file using your student number
(e.g. A0123456H.zip). Late submissions will receive no marks.
4. Grading scheme:
• 3/7 marks: correct and well-labeled plots.
• 4/7 marks: results and discussion of Weka output.