{% import macros as m with context %}
.. _inventory_game:

Week 1: The Inventory-control game
=============================================================================================================

{{ m.embed_game('week1_inventory') }}

.. topic:: Controls
    :class: margin

    :kbd:`0`, :kbd:`1`, :kbd:`2`
        Buy the given number of items.
    :kbd:`Space`
        Take a random action.
    :kbd:`p`
        Automatically take random actions.
    :kbd:`r`
        Reset the game.

    .. rubric:: Run locally

    :gitref:`../irlc/lectures/lec08/demo_bandit.py`

{#
.. raw:: html

.. topic:: Controls

    :kbd:`0`, :kbd:`1`, :kbd:`2`
        Buy the given number of items.
    :kbd:`Space`
        Take a random action
    :kbd:`p`
        Automatically take random actions
    :kbd:`r`
        Reset the game
#}

.. topic:: What you see

    This example showcases the inventory-control environment which you implemented in week 1.
    The cars are the items delivered (i.e. your actions) and the noise terms are the customers.

    In the example, you can select actions yourself and the game will display the reward
    (recall that the reward is minus the cost, i.e. :math:`r_k = -g_k(x_k, u_k, w_k)`).
    Your task is to obtain as high an average reward as possible.
    If you press space, the game will buy random amounts of inventory and eventually compute the
    average reward for this random policy. Can you do better than that?

    Here are the rules:

    - The inventory can hold 0, 1, or 2 items.
    - You can order 0, 1, or 2 items.
    - Customers buy 0, 1, or 2 items.
    - Excess inventory is discarded at the end of the day!

    The cost at each step is :math:`g_k(x_k, u_k, w_k) = u_k + (x_k + u_k - w_k)^2`.
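
The rules and the cost function above can be sketched in a few lines of Python. This is a minimal illustration and not the course implementation: the function names, the starting inventory of 0, the clipping transition in ``next_inventory``, and the uniform demand distribution are all assumptions made for this sketch, since the page does not state how often customers buy 0, 1, or 2 items.

```python
import random


def cost(x, u, w):
    """Step cost g_k(x_k, u_k, w_k) = u_k + (x_k + u_k - w_k)^2 from the rules."""
    return u + (x + u - w) ** 2


def next_inventory(x, u, w, capacity=2):
    """Assumed transition: unmet demand is lost, excess above capacity is discarded."""
    return max(0, min(capacity, x + u - w))


def random_policy(x, rng):
    """Order 0, 1, or 2 items uniformly at random (what pressing Space does)."""
    return rng.choice((0, 1, 2))


def order_one_when_empty(x, rng):
    """Hypothetical hand-written policy: order one item only if the shelf is empty."""
    return 1 if x == 0 else 0


def average_reward(policy, steps=10_000, demand_weights=(1, 1, 1), seed=0):
    """Simulate `steps` days and return the average reward r_k = -g_k.

    demand_weights are relative probabilities of customers buying 0, 1, or 2
    items. The uniform default is an assumption, not the game's distribution.
    """
    rng = random.Random(seed)
    x = 0  # assumed starting inventory
    total = 0.0
    for _ in range(steps):
        u = policy(x, rng)
        w = rng.choices((0, 1, 2), weights=demand_weights)[0]
        total += -cost(x, u, w)
        x = next_inventory(x, u, w)
    return total / steps


if __name__ == "__main__":
    print("random policy:       ", average_reward(random_policy))
    print("order one when empty:", average_reward(order_one_when_empty))
```

Comparing the two printed averages gives a feel for how much a simple rule can improve on random ordering. Passing different ``demand_weights`` shows how strongly the answer depends on the assumed customer behaviour.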