ROC and Precision Recall curves

Generate data: Simulate a binary classification problem by generating a vector of 100 class labels. Generate a matching vector of predictor estimates (scores) using a random number generator. (5 Points)
Calculate and plot the ROC and Precision-Recall curves. (20 Points)
Check that your curves match the ones generated with sklearn. (5 Points)
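
One way to approach these three steps is sketched below. It is only an illustration under stated assumptions: the seed 0, the helper names `cumulative_counts`, `roc_points` and `pr_points`, and the choice to compare areas for the Precision-Recall curve are not part of the assignment. The curves are built by sorting the scores in descending order and accumulating true and false positives at each distinct threshold.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve

rng = np.random.default_rng(0)           # seed 0 is an arbitrary choice
y_true = rng.integers(0, 2, size=100)    # simulated binary class labels
y_score = rng.random(100)                # random predictor estimates in [0, 1)

def cumulative_counts(y_true, y_score):
    """True/false positive counts at each distinct threshold, highest first."""
    order = np.argsort(-y_score, kind="stable")
    y_sorted, s_sorted = y_true[order], y_score[order]
    # Last index of every run of tied scores; the final index is always kept.
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), len(s_sorted) - 1]
    tp = np.cumsum(y_sorted)[last]
    fp = (last + 1) - tp
    return tp, fp

def roc_points(y_true, y_score):
    tp, fp = cumulative_counts(y_true, y_score)
    # Prepend the (0, 0) point for a threshold above every score.
    return np.r_[0.0, fp / fp[-1]], np.r_[0.0, tp / tp[-1]]

def pr_points(y_true, y_score):
    tp, fp = cumulative_counts(y_true, y_score)
    return tp / (tp + fp), tp / tp[-1]   # precision, recall (recall ascending)

fpr, tpr = roc_points(y_true, y_score)
precision, recall = pr_points(y_true, y_score)

# Areas: trapezoid rule for ROC, step-wise sum for average precision.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
ap = np.sum(np.diff(np.r_[0.0, recall]) * precision)

# Comparison with sklearn; drop_intermediate=False keeps every threshold.
sk_fpr, sk_tpr, _ = roc_curve(y_true, y_score, drop_intermediate=False)
print(np.allclose(fpr, sk_fpr), np.allclose(tpr, sk_tpr))
print(np.isclose(auc, roc_auc_score(y_true, y_score)))
print(np.isclose(ap, average_precision_score(y_true, y_score)))
```

Plotting `tpr` against `fpr` and `precision` against `recall` (for example with matplotlib) gives the two curves. The Precision-Recall comparison goes through average precision because the exact array layout returned by `precision_recall_curve` can differ between scikit-learn versions.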

Random Forest classifier

Load the iris data set.

from sklearn.ensemble import RandomForestClassifier
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

Investigate the following parameters of the Random Forest classifier and tune them using Randomized Search and Grid Search.

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
# Number of features to consider at every split
# ('auto' was removed in scikit-learn 1.3; for classifiers it was equivalent to 'sqrt')
max_features = ['auto', 'sqrt', 'log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 1000, 10)]
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 8, 11, 14]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4, 6, 8]
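
As a rough illustration of how the two search strategies can be wired up, the sketch below runs both on iris. The grid is deliberately reduced so that it finishes quickly, and the values in `small_grid`, the 3-fold inner cross validation and `n_iter=5` are assumptions made for this example, not values from the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
# 80-20 split with seed 1, as the assignment asks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Reduced, hypothetical grid: two values per parameter keep the run short.
small_grid = {
    "n_estimators": [10, 30],
    "max_features": ["sqrt", "log2"],
    "max_depth": [3, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Randomized search samples n_iter configurations from the grid.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions=small_grid,
    n_iter=5,
    cv=3,
    random_state=1,
)
random_search.fit(X_train, y_train)

# Grid search evaluates every one of the 2**5 = 32 configurations.
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=1), param_grid=small_grid, cv=3
)
grid_search.fit(X_train, y_train)

print("randomized:", random_search.best_params_, random_search.best_score_)
print("grid:      ", grid_search.best_params_, grid_search.best_score_)
print("test accuracy of grid winner:", grid_search.score(X_test, y_test))
```

Randomized search trades coverage for speed: it only sees a sample of the grid, so its best score can be lower than the exhaustive search's but never higher when both use the same folds and estimator seed.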

Use seed 1 to split the data in an 80-20 train-test configuration. Train a Random Forest classifier with each unique configuration and record the train/test accuracy, precision and recall in a results dataframe. This dataframe will have 5 columns (one per tuning parameter) in addition to the recorded metrics, and each row will correspond to one unique configuration (5x5x5x5x5 rows). Analyse the impact of each tuning parameter on predictor performance. (15 Points)
From the results above, find the best estimators, use them for classification once again, and evaluate the performance using 10-fold cross validation. (15 Points)
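
The two tasks above could be organised roughly as follows. This is a sketch under assumptions: `small_grid` is a reduced, hypothetical grid, precision and recall are macro-averaged over the three iris classes, and the "best" configuration is simply the one with the highest test accuracy.

```python
from itertools import product

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Reduced, hypothetical grid (2 values per parameter, 32 configurations).
small_grid = {
    "n_estimators": [10, 30],
    "max_features": ["sqrt", "log2"],
    "max_depth": [3, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

rows = []
for values in product(*small_grid.values()):
    params = dict(zip(small_grid, values))
    clf = RandomForestClassifier(random_state=1, **params).fit(X_train, y_train)
    pred_train, pred_test = clf.predict(X_train), clf.predict(X_test)
    rows.append({
        **params,
        "train_accuracy": accuracy_score(y_train, pred_train),
        "test_accuracy": accuracy_score(y_test, pred_test),
        "test_precision": precision_score(
            y_test, pred_test, average="macro", zero_division=0),
        "test_recall": recall_score(
            y_test, pred_test, average="macro", zero_division=0),
    })
results = pd.DataFrame(rows)  # one row per configuration

# Impact of a single parameter: average the metric over all other settings.
print(results.groupby("max_depth")["test_accuracy"].mean())

# Re-fit the best configuration and score it with 10-fold cross validation.
best_row = max(rows, key=lambda r: r["test_accuracy"])
best_params = {name: best_row[name] for name in small_grid}
scores = cross_val_score(
    RandomForestClassifier(random_state=1, **best_params), X, y, cv=10
)
print(best_params, scores.mean())
```

Grouping the results by one parameter at a time, as done for `max_depth`, is one simple way to read off each parameter's effect; choosing the winner by test accuracy is a shortcut for illustration and reuses the test set for selection.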

Hierarchical Agglomerative Clustering (HAC)

Load the iris dataset from sklearn.

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data

Implement the HAC algorithm, using the class skeleton provided below. (15 Points)

Test your code first with univariate data, as follows: (10 Points)

x = {'JAN':31.9, 'FEB':32.3, 'MAR':35, 'APR':52, 'MAY':60.8, 'JUN':68.7, 'JUL':73.3, 'AUG':72.1, 'SEP':65.2, 'OCT':54.8, 'NOV':40, 'DEC':38}
hac = HAC(param={'dist': 'eucl'})
hac.fit(x)
for c in hac.dendrogram:
 print(c)

Expected output:

(0, ['JAN', 'FEB'], 0.4)
(1, ['JUL', 'AUG'], 1.2)
(2, ['NOV', 'DEC'], 2.0)
(3, ['APR', 'OCT'], 2.8)
(4, ['JAN', 'FEB', 'MAR'], 2.9)
(5, ['JUN', 'SEP'], 3.5)
(6, ['APR', 'OCT', 'MAY'], 7.4)
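
To make the merge loop concrete, here is a minimal stand-alone sketch on the same monthly data. It assumes average linkage (the mean absolute difference between the members of two clusters) and lists the larger cluster first in each merge; the assignment does not state the linkage rule, and `average_linkage` is a hypothetical helper rather than part of the required class. Under this assumption the earliest merges agree with the listing above, while later merges depend on the linkage rule and may differ.

```python
from itertools import combinations

temps = {'JAN': 31.9, 'FEB': 32.3, 'MAR': 35, 'APR': 52, 'MAY': 60.8,
         'JUN': 68.7, 'JUL': 73.3, 'AUG': 72.1, 'SEP': 65.2, 'OCT': 54.8,
         'NOV': 40, 'DEC': 38}

def average_linkage(temps):
    """Naive agglomerative clustering; returns (step, members, distance)."""
    clusters = [[label] for label in temps]   # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Scan every pair of clusters for the smallest average distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(abs(temps[p] - temps[q])
                        for p in clusters[a] for q in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Larger cluster first; the sort is stable, so ties keep their order.
        first, second = sorted((clusters[a], clusters[b]), key=len, reverse=True)
        merged = first + second
        merges.append((len(merges), merged, round(d, 1)))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges

for merge in average_linkage(temps):
    print(merge)
```

The pairwise scan is recomputed at every step, which is simple but slow; caching the distance matrix and updating only the affected row and column is the usual refinement.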

Fit the HAC model to the iris dataset. Print the hierarchy of clusters creatively. It need not be a dendrogram, but you can use the sklearn implementation for comparison. (15 Points)
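
For the comparison mentioned above, a reference hierarchy can be obtained from sklearn roughly as follows. The sketch assumes average linkage with the default Euclidean metric, which may not be the linkage your own implementation uses; setting `distance_threshold=0` with `n_clusters=None` makes the estimator build the full tree and expose `children_` and `distances_`.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

X = load_iris().data
n_samples = X.shape[0]

# Build the complete merge tree instead of stopping at a fixed cluster count.
model = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0, linkage="average"
)
model.fit(X)

# children_[i] holds the two nodes merged at step i; node ids >= n_samples
# refer to the cluster created at step (id - n_samples).
members = {i: [i] for i in range(n_samples)}
for step, (a, b) in enumerate(model.children_):
    a, b = int(a), int(b)
    members[n_samples + step] = members[a] + members[b]
    if step >= n_samples - 6:                 # show only the last five merges
        print(f"step {step}: {len(members[a])} + {len(members[b])} points "
              f"at distance {model.distances_[step]:.3f}")
```

Printing cluster sizes per merge is only one possible text view; the same `members` mapping can be used to print the sample indices or species of each merged group.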

class HAC:
  def __init__(self, param):
    # Store the distance measure; the data are supplied later through fit().
    self.dist = param['dist']

  def __distances__(self, dist='eucl'):
    '''
    Implement __distances__ method to calculate pair-wise distances
    among the datapoints in X with respect to one of the distance measures
    - eucl : Euclidean
    - manh : Manhattan
    - misk : Minkowski
    '''
    if dist not in ['eucl', 'manh', 'misk']:
      raise ValueError('Not a valid dist measure. Choose among eucl, manh, misk')

    self.C = None  # placeholder for the pair-wise distance matrix

  def __merge__(self):
    '''
    Implement __merge__ method to recursively merge the nearest datapoints in X
    using the pair-wise distance matrix (self.C).
    Save the merge results at each iteration/'recursive call'
    in the dendrogram list of clusters.
    '''
    pass  # append each merge result to self.dendrogram

  def __display__(self):
    '''
    Implement __display__ method to creatively show the contents of dendrogram.
    '''
    pass

  def fit(self, X):
    # X is a dict mapping labels to values, as in the univariate test above.
    self.X = list(X.values())
    self.labels = list(X.keys())
    self.__distances__(self.dist)
    self.dendrogram = list()
    self.__merge__()
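
The three distance measures named in the `__distances__` docstring could be computed along these lines. This is a sketch only: `pairwise_matrix` is a hypothetical stand-alone helper, and the Minkowski order `p=3` is an assumed default because the skeleton does not specify one.

```python
import numpy as np

def pairwise_matrix(X, dist="eucl", p=3):
    """Return the symmetric matrix of distances between all rows of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                 # univariate data: one feature per point
        X = X[:, None]
    # diff[i, j, k] = |X[i, k] - X[j, k]| via broadcasting.
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if dist == "eucl":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if dist == "manh":
        return diff.sum(axis=-1)
    if dist == "misk":              # Minkowski distance of order p
        return (diff ** p).sum(axis=-1) ** (1.0 / p)
    raise ValueError("Not a valid dist measure. Choose among eucl, manh, misk")

points = [[0.0, 0.0], [3.0, 4.0]]
for name in ("eucl", "manh", "misk"):
    print(name, pairwise_matrix(points, name)[0, 1])
```

Broadcasting builds an n x n x d array, which is fine for iris (150 points) but grows quickly in memory for larger datasets.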


CS 6140 Assignment 3 ROC and Precision Recall curves
