[CSCI-GA 3033-090] Special Topics: Deep Reinforcement Learning Homework 4, Exploration Algorithms


Introduction
This homework follows up on the lecture about exploration algorithms, specifically the multi-armed bandit. For this assignment, you will need to know the basics of the bandit setting we discussed in class, as well as the epsilon-greedy, upper confidence bound (UCB), and Thompson sampling algorithms. If you have not already, we suggest you review the lecture notes and read up on these algorithms online.

You are allowed to discuss this homework with your classmates. However, any work that you submit must be your own: you may not share your code, and the write-up you submit must be written by you alone.
Code folder
The provided code is in the following Google Colab notebook: https://colab.research.google.com/drive/19ht5cd7CoEkotj3bWnBaGKHaSdWhObH4?usp=sharing

Make a copy of the colab, edit it, and once you are done, submit it with your homework write-up.
Environment
Since this homework runs entirely in Google Colab, no separate environment setup is required.
Submission
Please submit your homework using this Google Form: https://forms.gle/h7gZCK1FFoPd4z4d7

Deadline for submission: October 29th, 2021, 11:59 PM Eastern Time.
Points
5 points for plotting.
5 points for each solver working correctly: EpsilonGreedy, UCB, and Thompson Sampling.
Total: 20 points.

Assignment
1. Make a copy of the colab to your drive, and then go through the skeleton code in it. At the very end, you will find a function that is meant to run the bandit algorithms and plot their cumulative regret over time. (Cumulative regret at step T is the sum, over the first T pulls, of the gap between the expected reward of the optimal arm and that of the arm actually pulled.) Complete this function, and verify that it works by testing it with the two given environments and the FullyRandom solver; a rough sketch of such a plotting loop is given after the note below.

Note: For a full score on this problem, the following must be true: each solver must be drawn in a different color, and each environment (Bernoulli bandit and Gaussian bandit) must be shown on a separate plot. Give each of the two plots a title, and label each line with its associated algorithm. For formatting guidance, look at the given plot.
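For reference, here is a minimal sketch of what such a plotting loop could look like. The interface is assumed, not taken from the skeleton: `env.pull(arm)`, `env.means`, `env.best_mean`, `solver.select_arm()`, and `solver.update(arm, reward)` are all hypothetical names, so adapt them to the classes the colab actually provides.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_regret(env, solvers, n_steps=1000):
    """Run each solver on one environment and plot its cumulative regret.

    Hypothetical interface: env.pull(arm) returns a sampled reward,
    env.means holds each arm's expected reward, env.best_mean is the
    optimal arm's expected reward, and each solver exposes
    select_arm() / update(arm, reward).
    """
    for solver in solvers:
        regret = np.zeros(n_steps)
        gap = 0.0
        for t in range(n_steps):
            arm = solver.select_arm()               # choose an arm
            reward = env.pull(arm)                  # observe a reward
            solver.update(arm, reward)              # update solver statistics
            gap += env.best_mean - env.means[arm]   # per-step expected regret
            regret[t] = gap
        plt.plot(regret, label=type(solver).__name__)  # one color per solver
    plt.title(f"Cumulative regret on {type(env).__name__}")
    plt.xlabel("Timestep")
    plt.ylabel("Cumulative regret")
    plt.legend()
    plt.show()
```

Computing the per-step gap from the true arm means (pseudo-regret) gives smoother curves than subtracting realized rewards, since it averages out the reward noise.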

2. Once the plotting function works, implement the EpsilonGreedy, UCB, and Thompson Sampling solvers; a sketch of the standard selection rules appears below. Make sure that when you run the colab notebook, it generates the two associated plots: one for the Bernoulli bandit with all the algorithms, and another for the Gaussian bandit with all the algorithms.
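As a starting point, here is a minimal sketch of the three standard selection rules, written for a Bernoulli bandit with rewards in {0, 1}. The class and method names are illustrative, not the skeleton's. Note that for the Gaussian bandit, Thompson sampling needs a Gaussian posterior rather than the Beta posterior shown here.

```python
import numpy as np

class EpsilonGreedy:
    """With probability epsilon pick a random arm; otherwise the best so far."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = np.zeros(n_arms)   # number of pulls per arm
        self.values = np.zeros(n_arms)   # running mean reward per arm

    def select_arm(self):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.counts))   # explore
        return int(np.argmax(self.values))               # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

class UCB(EpsilonGreedy):
    """UCB1: mean estimate plus an exploration bonus that shrinks with pulls."""
    def select_arm(self):
        if np.any(self.counts == 0):       # pull every arm once before UCB1
            return int(np.argmin(self.counts))
        t = self.counts.sum()
        bonus = np.sqrt(2.0 * np.log(t) / self.counts)
        return int(np.argmax(self.values + bonus))

class ThompsonSampling:
    """Beta-Bernoulli Thompson sampling: sample each arm's posterior."""
    def __init__(self, n_arms):
        self.alpha = np.ones(n_arms)   # Beta prior: 1 + observed successes
        self.beta = np.ones(n_arms)    # Beta prior: 1 + observed failures

    def select_arm(self):
        return int(np.argmax(np.random.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        self.alpha[arm] += reward        # assumes reward in {0, 1}
        self.beta[arm] += 1 - reward
```

UCB inherits the count/value bookkeeping from EpsilonGreedy here purely to keep the sketch short; in the colab you should follow whatever base-class structure the skeleton defines.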

Submit the link to your completed colab via the submission form above. Before submitting, once again make sure that the following are in order:
a) plot titles,
b) axis labels,
c) line legends.

Also, make sure link sharing is enabled on the colab so that we can view and run your solution.