## Description

Description: The aim of this homework is to get you acquainted with implementing a decision tree as

discussed in class. Your implementation should be able to run on a data with two types of features

(numeric and categorical).

Use the following data for testing your implementation: (ABALONE –

https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/. You can use Python libraries to

read this data file.

This project is expecting you to write two functions. The first will take the training data including the

type of features and returns a decision tree best modeling the classification problem. This should not

do any pruning of the tree. An optional part of the homework will require the pruning step. The second

function will just apply the decision tree to a given set of data points.

You should follow the following for this homework:

1. Load the data and performa a quick analysis of what it is and what features it has. You will

need to construct a vector indicating the type (values) of each of the features. In this case, you

can assume that you have numeric (real or integer) and categorical values.

2. Implement the function “build_dt(X, y, attribute_types, options)”.

a. X: is the matrix of features/attributes for the training data. Each row includes a data

sample.

b. y: The vector containing the class labels for each sample in the rows of X.

c. attribute_types: The vector containing (1: integer/real) or (2: categorical) indicating

the type of each attributes (the columns of X).

d. options: Any options you might want to pass to your decision tree builder.

e. Returns a decision tree of the structure of your choice.

3. Implement the function “predict_dt(dt, X, options)”.

a. dt: The decision tree modeled by “build_dt” function.

b. X: is the matrix of features/attributes for the test data.

c. Returns a vector for the predicted class labels.

4. Report the performance of your implementation using an appropriate k-fold cross validation

using confusion matrices on the given dataset.

[Optional] Implement the pruning strategy discussed in the class. Repeat the steps 4 above. Indicate

any assumptions you might have made.

What to hand in: You are expected to hand in one of the following

• HW2_lastname_firstname_studentnumber_code.ipynb (the Python notebook file containing

the code and report output).

Your notebook should include something like the following:

Implementation of Decision Tree Modeling Function:

Implementation of Decision Tree Testing Function:

Results of k-fold cross validation: