CS 6140 Machine Learning: Assignment – 1


Q1. Decision Tree Classifier (50 points)

Q1.1 Growing Decision Trees from Scratch (40 points)

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal of this question is to build a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. You must also print the decision tree. Use information gain based on entropy as the splitting measure.

Use the data.csv dataset for this question; it is posted on Canvas with Assignment 1. Split the dataset into training and test sets and calculate the test accuracy.
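
As a starting point, here is a minimal sketch of the entropy and information-gain computations the splitting measure can be built on (the helper names, the use of pandas, and the 'label' target column are assumptions, not part of the assignment):

import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(data, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    total = entropy(data[target])
    weighted = sum(len(subset) / len(data) * entropy(subset[target])
                   for _, subset in data.groupby(feature))
    return total - weighted

# Hypothetical usage, assuming the class column in data.csv is named 'label':
# df = pd.read_csv('data.csv')
# gains = {c: information_gain(df, c, 'label') for c in df.columns if c != 'label'}
# best_feature = max(gains, key=gains.get)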

Q1.2 Decision Tree using Sklearn Library (10 points)

Use the Decision Tree Classifier from the Sklearn library with the Gini index as the splitting measure, again on the data.csv dataset. Calculate the accuracy of this model. Print the decision tree and compare the trees generated by your code and by Sklearn.
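
One possible outline for this part (a sketch only; the assumption that the last column of data.csv is the class label is mine, not the assignment's):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

df = pd.read_csv('data.csv')
X, y = df.iloc[:, :-1], df.iloc[:, -1]   # assumes the last column holds the class label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

clf = DecisionTreeClassifier(criterion='gini')
clf.fit(X_train, y_train)
print('test accuracy:', accuracy_score(y_test, clf.predict(X_test)))
print(export_text(clf, feature_names=[str(c) for c in X.columns]))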

Q2 Linear Regression (40 points)

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

Gradient descent algorithm

$$\theta \leftarrow \theta + \frac{\alpha}{m}\bigl(y_i - h(x_i)\bigr)\,\bar{x}$$

This minimizes the following cost function

$$J(x, \theta, y) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h(x_i) - y_i\bigr)^2$$

where

$$h(x_i) = \theta^{T}\bar{x}$$
import numpy as np
import matplotlib.pyplot as plt

# Do not change the code in this cell
true_slope = 15
true_intercept = 2.4
input_var = np.arange(0.0, 100.0)
output_var = true_slope * input_var + true_intercept + 300.0 * np.random.rand(len(input_var))
# Do not change the code in this cell
plt.figure()
plt.scatter(input_var, output_var)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
def compute_cost(ip, op, params):
    """
    Cost function in linear regression where the cost is calculated
    ip: input variables
    op: output variables
    params: corresponding parameters
    Returns cost
    """
    num_samples = len(ip)
    cost_sum = 0.0
    for x,y in zip(ip, op):
        y_hat = np.dot(params, np.array([1.0, x]))
        cost_sum += (y_hat - y) ** 2
    
    cost = cost_sum / (num_samples)
    
    return cost

Q2.1 Implement Linear Regression using Batch Gradient Descent from scratch. (15 points)

Batch gradient descent

The algorithm (one iteration over the full training set, accumulating the gradient before updating) can be given as follows:

    grad = 0
    for i in 0 -> m:
        grad += (y[i] - h(x[i])) * x_bar[i]
    theta += (alpha / m) * grad
def linear_regression_using_batch_gradient_descent(ip, op, params, alpha, max_iter):
    """
    Compute the params for linear regression using batch gradient descent
    ip: input variables
    op: output variables
    params: corresponding parameters
    alpha: learning rate
    max_iter: maximum number of iterations
    Returns parameters, cost, params_store
    """ 
    # initialize iteration, number of samples, cost and parameter array
    iteration = 0
    num_samples = len(ip)
    cost = np.zeros(max_iter)
    params_store = np.zeros([2, max_iter])
    
    # Compute the cost and store the params for the corresponding cost
    while iteration < max_iter:
        cost[iteration] = compute_cost(ip, op, params)
        params_store[:, iteration] = params
        
        print('--------------------------')
        print(f'iteration: {iteration}')
        print(f'cost: {cost[iteration]}')
        
        
        # Apply batch gradient descent
        None
    
    return params, cost, params_store
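
For reference, here is a minimal self-contained sketch of the batch update step (the helper name batch_gd_sketch and the [intercept, slope] parameter layout are assumptions; this is one possible way to fill in the "Apply batch gradient descent" placeholder above, not the official solution):

import numpy as np

def batch_gd_sketch(ip, op, params, alpha, max_iter):
    """Sketch only: repeatedly apply theta += (alpha/m) * sum_i (y_i - h(x_i)) * x_bar_i."""
    params = np.asarray(params, dtype=float).copy()
    num_samples = len(ip)
    for _ in range(max_iter):
        grad = np.zeros(2)
        for x, y in zip(ip, op):
            x_bar = np.array([1.0, x])          # augmented input [1, x]
            grad += (y - np.dot(params, x_bar)) * x_bar
        params += (alpha / num_samples) * grad
    return params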
# Do not change the code in this cell
# Training the model
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(input_var, output_var, test_size=0.20)

params_0 = np.array([20.0, 80.0])

alpha_batch = 1e-3
max_iter = 100
params_hat_batch, cost_batch, params_store_batch =\
    linear_regression_using_batch_gradient_descent(x_train, y_train, params_0, alpha_batch, max_iter)
--------------------------
iteration: 0
cost: 13192814.727851184
--------------------------
iteration: 1
cost: 22978.459683996098
--------------------------
iteration: 2
cost: 11885.25642780868
--------------------------
... (iterations 3-98 omitted; the cost decreases slowly from about 11964 to about 11776) ...
--------------------------
iteration: 99
cost: 11774.789849230787

Q2.2 Implement Stochastic Gradient Descent from scratch. (15 points)

Stochastic Gradient Descent

The algorithm (one pass over the training samples, updating after each sample) can be given as follows:

for i in 0 -> m:
    theta += (alpha / m) * (y[i] - h(x[i])) * x_bar[i]
def lin_reg_stoch_gradient_descent(ip, op, params, alpha):
    """
    Compute the params for linear regression using stochastic gradient descent
    ip: input variables
    op: output variables
    params: corresponding parameters
    alpha: learning rate
    Returns parameters, cost, params_store
    """
    
    # initialize iteration, number of samples, cost and parameter array
    num_samples = len(ip)
    cost = np.zeros(num_samples)
    params_store = np.zeros([2, num_samples])
    
    i = 0
    # Compute the cost and store the params for the corresponding cost
    for x, y in zip(ip, op):
        cost[i] = compute_cost(ip, op, params)
        params_store[:, i] = params
        
        print('--------------------------')
        print(f'iteration: {i}')
        print(f'cost: {cost[i]}')
        
        # Apply stochastic gradient descent
        None
            
    return params, cost, params_store
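
As with the batch case, a minimal self-contained sketch of the per-sample update (the helper name sgd_sketch and the [intercept, slope] layout are assumptions; one possible way to fill in the placeholder above, not the official solution):

import numpy as np

def sgd_sketch(ip, op, params, alpha):
    """Sketch only: one pass over the data, updating params after every sample."""
    params = np.asarray(params, dtype=float).copy()
    num_samples = len(ip)
    for x, y in zip(ip, op):
        x_bar = np.array([1.0, x])      # augmented input [1, x]
        params += (alpha / num_samples) * (y - np.dot(params, x_bar)) * x_bar
    return params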
# Do not change the code in this cell
alpha = 1e-3
params_0 = np.array([20.0, 80.0])
params_hat, cost, params_store =\
lin_reg_stoch_gradient_descent(x_train, y_train, params_0, alpha)
--------------------------
iteration: 0
cost: 13144888.680606477
--------------------------
iteration: 1
cost: 13144904.387611339
--------------------------
... (iterations 2-98 omitted; the cost falls steadily from about 13.1 million to about 30776) ...
--------------------------
iteration: 99
cost: 24597.82512646985

Q2.3 Calculate the root mean square error for the batch gradient descent and stochastic gradient descent algorithms (5 points)

# Calculate Root Mean Square error in batch gradient descent algorithm and stochastic gradient descent algorithm
Batch RMS:      93.7947330891927
Stochastic RMS: 125.24063958434985
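
One way these numbers could be computed (a sketch; the helper rmse and the use of the held-out x_test/y_test split are assumptions about how the values above were produced):

import numpy as np

def rmse(ip, op, params):
    """Root mean square error of the fitted line params = [intercept, slope] on (ip, op)."""
    preds = params[0] + params[1] * np.asarray(ip)
    return np.sqrt(np.mean((preds - np.asarray(op)) ** 2))

# Hypothetical usage with the variables defined earlier in the notebook:
# print('Batch RMS:     ', rmse(x_test, y_test, params_hat_batch))
# print('Stochastic RMS:', rmse(x_test, y_test, params_hat))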
# Do not change the code in this cell
plt.figure()
plt.plot(np.arange(max_iter), cost_batch, 'r', label='batch')
plt.plot(np.arange(len(cost)), cost, 'g', label='stochastic')
plt.xlabel('iteration')
plt.ylabel('normalized cost')
plt.legend()
plt.show()
print(f'min cost with BGD: {np.min(cost_batch)}')
print(f'min cost with SGD: {np.min(cost)}')
min cost with BGD: 11774.789849230787
min cost with SGD: 24597.82512646985

Q2.4 Which linear regression model do you think works best for this data? Explain briefly. (5 points)

Q3. Linear Regression Analytical Problem (10 points)

Consider the following training data.

X1 X2 Y
0 0 0
0 1 1.5
1 0 2
1 1 2.5
Suppose the data comes from a model $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ for unknown constants $\theta_0, \theta_1, \theta_2$. Use least squares linear regression to find an estimate of $\theta_0, \theta_1, \theta_2$.
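
A quick numerical check of a hand-derived answer (a sketch using numpy's least-squares solver; the analytical derivation is still what the question asks for):

import numpy as np

# Design matrix with a leading column of ones for the intercept theta_0
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.5, 2.0, 2.5])

theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimates [theta_0, theta_1, theta_2]
print(theta)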