Week 1: The Inventory-control game
Purpose
The inventory-control problem is one we will study in depth during this course, and I invite you to compare it to Pacman. Note in particular:
In Pacman, a state is the full configuration on the screen (i.e., Pacman's location and which pellets have been eaten). In the inventory-control game, the state is just a number (0, 1, or 2) denoting the size of the inventory.
Note that the visualization shows all the states that have been computed (by contrast, Pacman just shows a single state).
Try to manually compute the cost function \(g_k\) and verify that it is implemented correctly. The simulation shown above works exactly like the one you have seen in the lectures!
In this case, a policy is a function which accepts an integer (0, 1, or 2) and returns another integer (0, 1, or 2). Try a couple of policies multiple times (for instance, compare a policy which always presses 1 with one which always presses 2). Note that the average reward, computed over multiple episodes, differs between the two. Your task is to find the policy with the highest average reward.
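To make the comparison of policies concrete, here is a minimal sketch of how the experiment could be reproduced offline. It is not the code behind the game. The model details are assumptions taken from the standard textbook version of the inventory-control problem, not from this page: episodes of length 3, demand \(w\) equal to 0, 1, or 2 with probabilities 0.1, 0.7, and 0.2, cost \(g_k(x, u, w) = u + (x + u - w)^2\), inventory clipped to the range 0 to 2, and reward equal to the negative cost. The names `step`, `always_one`, `always_two`, and `average_reward` are hypothetical helpers, as is the starting inventory `x0`.

```python
import random

# Assumed model (standard textbook setup; NOT confirmed by this page).
N = 3                        # assumed number of steps per episode
DEMAND = [0, 1, 2]           # possible demands w
DEMAND_P = [0.1, 0.7, 0.2]   # assumed demand probabilities


def step(x, u, w):
    """One transition: return (next_state, reward) for state x, action u, demand w."""
    cost = u + (x + u - w) ** 2          # assumed cost g_k(x, u, w)
    x_next = max(0, min(2, x + u - w))   # inventory is clipped to {0, 1, 2}
    return x_next, -cost                 # reward is the negative cost


def always_one(x):
    """Policy that orders 1 unit regardless of the inventory x."""
    return 1


def always_two(x):
    """Policy that orders 2 units regardless of the inventory x."""
    return 2


def average_reward(policy, episodes=10_000, x0=0, seed=0):
    """Estimate the average total reward per episode by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        x = x0                           # assumed starting inventory
        for _ in range(N):
            u = policy(x)
            w = rng.choices(DEMAND, weights=DEMAND_P)[0]
            x, reward = step(x, u, w)
            total += reward
    return total / episodes


if __name__ == "__main__":
    print("always 1:", average_reward(always_one))
    print("always 2:", average_reward(always_two))
```

As a check on the manual computation of \(g_k\) under these assumptions: with inventory 1, order 1, and demand 1, the cost is \(1 + (1 + 1 - 1)^2 = 2\), which is what `step(1, 1, 1)` returns (as a reward of -2). A policy that depends on the state is written the same way, for example a function that returns 1 when `x == 0` and 0 otherwise.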