
Week 3: Frozen lake and dynamic programming#

What you see#

The example shows dynamic programming applied to the frozen-lake game using \(N=40\) planning steps.

When the game is run, the first 40 steps will show each of the \(N=40\) iterations of the DP algorithm. Specifically, recall that DP iteratively updates

\[J_k(x_k) = \min_{u_k} \mathbb{E}\left[ g_k(x_k, u_k, w_k) + J_{k+1}( f_k(x_k, u_k, w_k) ) \right]\]

Each square corresponds to a state \(x_k\), and I show the corresponding value of \(J_k(x_k)\) – note that \(k\) counts down because the DP algorithm starts at \(k=N-1\) and works backwards to \(k=0\).
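The backward recursion above can be sketched in a few lines of Python. This is a minimal, illustrative implementation over a finite state/action model, not the course's actual code; the transition model `P[(x, u)]` (a list of `(probability, cost, next_state)` tuples) and the terminal-cost table `gN` are assumed names.

```python
def dp_backward(states, actions, P, gN, N):
    """Backward DP sketch: P[(x, u)] -> list of (prob, cost, x_next);
    gN[x] is the terminal cost g_N(x_N). Returns cost-to-go tables J[k][x]
    and a greedy policy pi[k][x]."""
    J = [dict() for _ in range(N + 1)]
    pi = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = gN[x]                      # initialize with terminal cost
    for k in range(N - 1, -1, -1):           # k = N-1, ..., 0 (counts down)
        for x in states:
            best_u, best_q = None, float("inf")
            for u in actions(x):
                # Expected stage cost plus cost-to-go over stochastic outcomes
                q = sum(p * (c + J[k + 1][xn]) for p, c, xn in P[(x, u)])
                if q < best_q:
                    best_u, best_q = u, q
            J[k][x], pi[k][x] = best_q, best_u
    return J, pi
```

The policy is stored per planning round \(k\), matching the time-varying optimal policy of finite-horizon DP.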

After the planning steps, you can press space to select actions from the optimal policy. The small arrows show which action(s) are considered optimal, but since the rules are not deterministic, the resulting behavior may not look very smart.

How it works#

Similar to the pacman game, the main difficulty is to implement a dynamic programming model. For this game the following choices are made:

- If the player enters a hole or the won square (i.e., those with the white numbers on them), the player is ‘stuck’ at that square until planning round \(k=N\). This means the player has a single action which just keeps the player in the square.
- The terminal cost is \(g_N(x_N) = -1\) if the player is in the won square (lower-right corner) and otherwise 0.
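These two choices can be sketched as follows. The grid coordinates, hole positions, and function names below are hypothetical (based on the standard 4x4 frozen-lake layout), not the course's actual model code:

```python
# Assumed 4x4 layout; hole positions are those of the standard frozen-lake map.
HOLES = {(1, 1), (1, 3), (2, 3), (3, 0)}
GOAL = (3, 3)  # the won square, lower-right corner

def available_actions(x):
    # Holes and the won square are absorbing: a single action that
    # keeps the player in place until round k = N.
    if x in HOLES or x == GOAL:
        return ["stay"]
    return ["up", "down", "left", "right"]

def terminal_cost(x):
    # g_N(x_N) = -1 on the won square, 0 otherwise.
    return -1.0 if x == GOAL else 0.0
```

Making holes and the goal absorbing means the negative terminal cost is only collected through the won square, so the DP recursion pulls the policy toward it while steering around holes.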