# CS 7642 Homework 2: TD(λ)


## Description

Recall that the TD(λ) estimator for an MDP can be thought of as a weighted combination of the
k-step estimators E_k for k ≥ 1.
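
For reference, one standard way to write that weighted combination is sketched below. The notation is an assumption on my part rather than something stated in the assignment: $E_k$ is the k-step estimator from the current state $s_t$, $V$ is the current value estimate, and $r_{t+i}$ is the reward on the $(i+1)$-th transition. The weights $(1-\lambda)\lambda^{k-1}$ sum to 1 for $0 \le \lambda < 1$.

$$
\mathrm{TD}(\lambda) \;=\; \sum_{k=1}^{\infty} (1-\lambda)\,\lambda^{k-1}\,E_k,
\qquad
E_k \;=\; \sum_{i=0}^{k-1} \gamma^{i}\, r_{t+i} \;+\; \gamma^{k}\, V(s_{t+k})
$$
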
Consider the MDP described by the following state diagram. (Assume the discount factor is γ = 1.)

## Procedure

● Find a value of λ, strictly less than 1, such that the TD estimate for λ equals that of the TD(1) estimate.
● This HW is designed to help solidify your understanding of the Temporal Difference
algorithms and k-step estimators. You will be given the probability of transitioning to State 1,
a vector of value estimates, and a vector of rewards {r0, r1, r2, r3, r4, r5, r6}.
● You will be given 10 test cases and will return the best lambda value for each.
You may use any programming language and libraries you wish.

## Examples

The following examples can be used to verify your calculation is correct.
● Input: probToState=0.81, valueEstimates={0.0,4.0,25.7,0.0,20.1,12.2,0.0},
rewards={7.9,-5.1,2.5,-7.2,9.0,0.0,1.6}, Output: 0.6226326309908364
● Input: probToState=0.22, valueEstimates={0.0,-5.2,0.0,25.4,10.6,9.2,12.3},
rewards={-2.4,0.8,4.0,2.5,8.6,-6.4,6.1}, Output: 0.49567093118984556
● Input: probToState=0.64, valueEstimates={0.0,4.9,7.8,-2.3,25.5,-10.2,-6.5},
rewards={-2.4,9.6,-7.8,0.1,3.4,-2.1,7.9}, Output: 0.20550275877409016
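
One possible way to check these examples is sketched below in plain Python. The state diagram is not reproduced in this text, so the transition structure is an assumption: state 0 moves to state 1 with probability `probToState` (reward r0) or to state 2 otherwise (reward r1), states 1 and 2 both move to state 3 (rewards r2 and r3), and the chain then runs 3 → 4 → 5 → 6 with rewards r4, r5, r6. The sketch also assumes that E_5 bootstraps from `valueEstimates[6]` and that every E_k with k ≥ 6 equals the full return, which is taken as the TD(1) estimate. Because the weights sum to 1, TD(λ) − TD(1) = (1 − λ) Σ λ^(k−1) (E_k − TD(1)), so for λ < 1 it is enough to find a root of that polynomial in [0, 1). The names `k_step_differences` and `find_lambda` are hypothetical.

```python
from typing import List, Optional, Sequence


def k_step_differences(p: float, v: Sequence[float], r: Sequence[float]) -> List[float]:
    """Return E_k - TD(1) for k = 1..5 under the ASSUMED chain MDP (gamma = 1)."""
    # Expected reward over the first two transitions: 0 -> (1 or 2) -> 3.
    first_two = p * (r[0] + r[2]) + (1 - p) * (r[1] + r[3])
    # Full undiscounted return from state 0: the TD(1) estimate (and E_k, k >= 6).
    full_return = first_two + r[4] + r[5] + r[6]
    estimators = [
        p * (r[0] + v[1]) + (1 - p) * (r[1] + v[2]),  # E_1
        first_two + v[3],                             # E_2
        first_two + r[4] + v[4],                      # E_3
        first_two + r[4] + r[5] + v[5],               # E_4
        full_return + v[6],                           # E_5 (assumed to use v[6])
    ]
    return [e - full_return for e in estimators]


def find_lambda(p: float, v: Sequence[float], r: Sequence[float],
                grid: int = 1000) -> Optional[float]:
    """Return the smallest root in [0, 1) found by a grid scan plus bisection."""
    d = k_step_differences(p, v, r)

    def f(lam: float) -> float:
        # Polynomial sum_k d_k * lam^(k-1); d[0] holds d_1.
        return sum(dk * lam ** k for k, dk in enumerate(d))

    lo, f_lo = 0.0, f(0.0)
    for i in range(1, grid):          # hi stays strictly below 1
        hi = i / grid
        f_hi = f(hi)
        if f_lo * f_hi <= 0:          # sign change (or exact zero) in [lo, hi]
            for _ in range(60):       # bisection refinement
                mid = (lo + hi) / 2
                f_mid = f(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return (lo + hi) / 2
        lo, f_lo = hi, f_hi
    return None                       # no root found on the grid


if __name__ == "__main__":
    print(find_lambda(0.64,
                      [0.0, 4.9, 7.8, -2.3, 25.5, -10.2, -6.5],
                      [-2.4, 9.6, -7.8, 0.1, 3.4, -2.1, 7.9]))
```

Under these assumptions the sketch should land within 1e-3 of each of the three outputs listed above. The grid scan only detects roots where the polynomial changes sign between neighbouring grid points, so a polynomial root finder may be a better fit if several roots can fall inside [0, 1).
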

## Resources

The concepts explored in this homework are covered by:
● Lectures
○ Lesson 3: TD and Friends