GridWorld with Value Iteration and Policy Iteration

Problem

Setup:

The agent navigates a grid, starting from any non-terminal state and moving up, down, left, or right. Each move incurs a reward of -1 until the agent reaches the terminal state, which has a reward of 0. The goal is to compute the optimal policy or value function using either value iteration or policy iteration. To evaluate the two algorithms, we plot the delta (the maximum change in the value function per sweep) and the learning rate over the iterations.
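Below is a minimal sketch of the value iteration side of this setup. The 4x4 grid size, the single terminal state at (0, 0), the discount factor, and the convergence threshold are illustrative assumptions, not necessarily the settings used in this repository; it only records the per-sweep delta described above.

```python
import numpy as np

# Assumed settings for illustration; the repository may use different values.
N = 4                      # grid is N x N
TERMINAL = (0, 0)          # terminal state (reward 0, episode ends)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 1.0                # undiscounted episodic task
THETA = 1e-6               # convergence threshold on the delta

def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    if state == TERMINAL:
        return state, 0.0
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < N and 0 <= c < N):
        r, c = state           # stay in place when moving off the grid
    return (r, c), -1.0        # every non-terminal move costs -1

def value_iteration():
    V = np.zeros((N, N))
    deltas = []                # max change per sweep, for the delta plot
    while True:
        delta = 0.0
        for r in range(N):
            for c in range(N):
                if (r, c) == TERMINAL:
                    continue
                # Bellman optimality backup: best one-step lookahead value
                best = max(
                    reward + GAMMA * V[nxt]
                    for nxt, reward in (step((r, c), a) for a in ACTIONS)
                )
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        deltas.append(delta)
        if delta < THETA:
            return V, deltas

V, deltas = value_iteration()
print(V)          # optimal state values (negative of steps-to-goal)
print(deltas)     # per-sweep deltas, the quantity plotted over iterations
```

Policy iteration would follow the same skeleton but alternate two phases: evaluate the current policy until its value estimate stabilizes, then improve the policy greedily with respect to that estimate, stopping once the policy no longer changes.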

Running the code

More to come here