The random walk problem is modeled as a Markov Reward Process (MRP) with n states. The goal is to estimate the value V(s) of each state, i.e., the probability of eventually reaching the rightmost terminal state, using Monte Carlo (MC), n-step TD, and TD(λ) methods.
Each episode starts from the center state and moves left or right with equal probability, terminating when the walk reaches either the extreme left (state 1) or the extreme right (state n). A reward of +1 is given on reaching the right terminal state; all other transitions, including absorption on the left, yield a reward of 0. The task is undiscounted (γ = 1).
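To make the setup concrete, here is a minimal sketch of the episode dynamics in Python. The function name run_episode, the 1-based state numbering, and the assumption that n is odd (so the center state is unambiguous) are illustrative choices, not part of the original specification.

```python
import random

def run_episode(n):
    """One episode of the n-state random walk MRP.

    States are numbered 1..n; states 1 and n are terminal. The walk
    starts at the center state and moves left or right with equal
    probability. Reward is +1 on entering state n and 0 otherwise.
    Returns the visited non-terminal states and the terminal reward.
    """
    state = (n + 1) // 2              # center state (assumes n odd)
    trajectory = [state]
    while True:
        state += random.choice((-1, 1))
        if state == 1:
            return trajectory, 0.0    # absorbed on the left, no reward
        if state == n:
            return trajectory, 1.0    # absorbed on the right, reward +1
        trajectory.append(state)
```

Because this is an undiscounted gambler's-ruin walk, the learned estimates can be checked against the closed-form solution V(s) = (s − 1)/(n − 1), the probability of absorption on the right starting from state s.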
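As one example of the prediction methods named above, the following is a sketch of online TD(λ) with accumulating eligibility traces on this MRP. The step size alpha, trace decay lam, and episode count are arbitrary illustrative values; in this scheme λ = 0 reduces to one-step TD, while λ = 1 gives an MC-like update.

```python
import random

def td_lambda(n, episodes=1000, alpha=0.1, lam=0.8):
    """Online TD(lambda) with accumulating eligibility traces.

    V[s] estimates the probability of terminating on the right.
    gamma = 1 because the task is undiscounted.
    """
    V = [0.0] * (n + 1)               # 1-indexed; V[1] and V[n] stay 0 (terminal)
    for _ in range(episodes):
        z = [0.0] * (n + 1)           # eligibility traces, reset each episode
        state = (n + 1) // 2          # center state (assumes n odd)
        while state not in (1, n):
            next_state = state + random.choice((-1, 1))
            reward = 1.0 if next_state == n else 0.0
            # Terminal states have value 0 by convention.
            target = reward + (0.0 if next_state in (1, n) else V[next_state])
            delta = target - V[state]                 # TD error
            z[state] += 1.0                           # accumulate trace
            for s in range(2, n):                     # update all non-terminal states
                V[s] += alpha * delta * z[s]
                z[s] *= lam                           # decay traces (gamma = 1)
            state = next_state
    return V
```

Sweeping alpha and lam in this sketch reproduces the usual comparison between the three methods: intermediate values of λ typically give lower error than either the pure MC or pure one-step TD extremes.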