Week 9: Value iteration
What you see
The example shows the value iteration algorithm on a simple (deterministic) gridworld with a living reward of
Every time you move Pacman, the game executes a single update of the value iteration algorithm. You can switch between displaying the value function and the action-value function by pressing m.
The algorithm converges after about 20 steps, at which point it has computed both the optimal value function and the optimal action-value function.
How it works
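When computing e.g. the value function $V(s)$, each step of the algorithm applies the following update to every state $s$:

$$
V(s) \leftarrow \max_a \mathbb{E}\left[ r + \gamma V(s') \mid s, a \right]
$$

where the expectation is with respect to the next state $s'$.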
If the problem were not deterministic, we would need to compute the average over the possible next states, weighted by the transition probabilities given by the MDP.
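To make the update concrete, here is a minimal sketch of value iteration in Python. It is not the code behind the Pacman demo: the four-cell corridor, the reward of +1 for entering the last cell, the living reward of 0, the discount factor 0.9, and the names `transitions`, `actions`, `q_from_v`, and `value_iteration_sweep` are all assumptions chosen for illustration. The transition function returns a list of (probability, next state, reward) triples, so the same code covers the deterministic case (a single triple with probability 1) and the stochastic case (an average over several triples).

```python
# Illustrative sketch only: a 4-cell corridor, not the gridworld from the demo.
GAMMA = 0.9          # assumed discount factor
STATES = [0, 1, 2, 3]
TERMINAL = 3         # entering this cell ends the episode


def actions(s):
    """Available actions: move left (-1) or right (+1); none in the terminal cell."""
    return [] if s == TERMINAL else [-1, +1]


def transitions(s, a):
    """Return (probability, next_state, reward) triples for taking a in s.

    The corridor is deterministic, so there is exactly one triple.
    A stochastic MDP would return several triples whose probabilities sum to 1.
    """
    sp = min(max(s + a, 0), TERMINAL)          # walls keep the agent inside
    reward = 1.0 if sp == TERMINAL else 0.0    # assumed living reward of 0
    return [(1.0, sp, reward)]


def q_from_v(V, s, a):
    """Expected value of r + gamma * V(s'), averaged over the next state s'."""
    return sum(p * (r + GAMMA * V[sp]) for p, sp, r in transitions(s, a))


def value_iteration_sweep(V):
    """Apply one update V(s) <- max_a E[r + gamma * V(s')] to every state."""
    return {s: max((q_from_v(V, s, a) for a in actions(s)), default=0.0)
            for s in STATES}


# Repeat sweeps until the values stop changing.
V = {s: 0.0 for s in STATES}
for sweep in range(1, 101):
    V_new = value_iteration_sweep(V)
    delta = max(abs(V_new[s] - V[s]) for s in STATES)
    V = V_new
    if delta < 1e-9:
        break

print(sweep, V)
```

In this small example the values settle after a few sweeps, because the reward has to propagate back one cell per sweep. The action-value function follows from the converged values via `q_from_v(V, s, a)`.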