.. _models_dp:


Models used in DP
=============================================================================================================
Below is an overview of the models and environments we use in the DP section of the course.

.. list-table:: Environments
   :header-rows: 1

   * - Name
     - Environment
     - File
     - Actions
     - States
   * - :ref:`pacman`
     - :class:`~irlc.pacman.gamestate.GameState`
     - :gitref:`../irlc/pacman/gamestate.py`. Note that you don't have to read the file.
     - State-dependent: :python:`"North"`, :python:`"East"`, :python:`"South"`, :python:`"West"`, :python:`"Stop"`
     - Each state is a :class:`~irlc.pacman.gamestate.GameState` object
   * - :ref:`inventory_environment`
     - :class:`~irlc.ex01.inventory_environment.InventoryEnvironment`
     - See :gitref:`../irlc/ex01/inventory_environment.py`
     - :python:`Discrete(3)`
     - :python:`Discrete(3)`
   * - Bob's friend environment
     - :class:`~irlc.ex01.bobs_friend.BobFriendEnvironment`
     - See :gitref:`../irlc/ex01/bobs_friend.py`
     - :python:`Discrete(2)`
     - All positive numbers

This is a list of the notable models in this section of the course:

.. list-table:: Models
   :header-rows: 1

   * - Name
     - Class and file
     - Comments
   * - Pacman DP Model
     - - :class:`~irlc.project1.pacman.DPPacmanModel`
       - :gitref:`../irlc/project1/pacman.py`
     - This model corresponds to the Pacman game. You will implement it as part of project 1.

.. _pacman:

Pacman
--------------------------------------------------------------------------------------------------------
.. warning::

    When you use the Pacman-environment, I strongly recommend that you stick to the functions that are documented here (see :class:`~irlc.pacman.gamestate.GameState`) and mentioned in the project description.
    The :class:`~irlc.pacman.gamestate.GameState` object gives you access to other, internal, game-specific functions and data-structures, but their behavior may differ from what you expect and I recommend you don't use them.

Pacman is among the most complex environments considered in this course. Each state needs to keep track of:

- Pacman's position
- The ghosts' positions
- The maze layout
- Remaining food pellets

To accomplish this, each state :math:`x_k` is an instance of a small class, :class:`~irlc.pacman.gamestate.GameState`.
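
As a rough illustration of what such a class bundles together, the sketch below defines a hypothetical :python:`ToyPacmanState` holding the four items listed above. The class and its field names are made up for this example and do not describe the internals of :class:`~irlc.pacman.gamestate.GameState`; the point is only that a single immutable object can hold the whole game configuration.

.. code-block:: python

    from dataclasses import dataclass, replace

    # Hypothetical toy state; NOT the internal representation used by GameState.
    @dataclass(frozen=True)
    class ToyPacmanState:
        pacman: tuple[int, int]              # Pacman's (x, y) position
        ghosts: tuple[tuple[int, int], ...]  # one (x, y) position per ghost
        walls: frozenset[tuple[int, int]]    # maze layout, stored as wall cells
        food: frozenset[tuple[int, int]]     # remaining food pellets

    s = ToyPacmanState(pacman=(1, 1), ghosts=((3, 1),),
                       walls=frozenset({(0, 1), (4, 1)}), food=frozenset({(2, 1)}))
    # Moving onto the pellet at (2, 1) creates a new state; s itself is unchanged.
    s_next = replace(s, pacman=(2, 1), food=s.food - {(2, 1)})
    print(len(s.food), len(s_next.food))  # 1 0

Because the toy class is frozen, its instances are hashable and can be used as dictionary keys, which is convenient when a DP algorithm stores a value per state.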

Let's create an environment and use :python:`print(state)` to get a convenient representation of the game configuration that the current state represents:

.. margin::

    .. plot::
        :caption: The small maze without ghosts used in the example
        :width: 300

        from irlc import plotenv
        from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_maze
        env = PacmanEnvironment(layout_str=very_small_maze, render_mode='human')
        env.reset()
        plotenv(env)
        env.close()

.. runblock:: pycon

    >>> from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_maze
    >>> env = PacmanEnvironment(very_small_maze)
    >>> s, _ = env.reset() # Works just like any other environment.
    >>> s # Confirm the state is an object
    >>> print(s)

The state has a few functions that tell Pacman what he can do. For instance, we can check which actions are available and whether we have won or lost as follows:

.. runblock:: pycon

    >>> from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_maze
    >>> env = PacmanEnvironment(very_small_maze)
    >>> s, _ = env.reset() # Works just like any other environment.
    >>> print("Available actions are", s.A())
    >>> print("Have we won?", s.is_won(), "have we lost?", s.is_lost())


We can move around using the function :python:`s.f(action)`, which returns the next state:

.. runblock:: pycon

    >>> from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_maze
    >>> env = PacmanEnvironment(very_small_maze)
    >>> s0, _ = env.reset() # Works just like any other environment.
    >>> s1 = s0.f("East")
    >>> s2 = s1.f("East")
    >>> s3 = s2.f("East")
    >>> print("Is s2 won?", s2.is_won(), "is s3 won?", s3.is_won())
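
Since :python:`s.A()` lists the legal actions and :python:`s.f(action)` returns the next state, the two can be combined to enumerate every state reachable from a start state, for instance with a breadth-first search. The sketch below is self-contained and therefore uses a hypothetical :python:`CorridorState` (Pacman in a one-dimensional corridor with food at the right end) whose methods merely mimic the names :python:`A`, :python:`f` and :python:`is_won`. Running the same search on real Pacman states assumes they can be stored in a set, which this page does not state.

.. code-block:: python

    from collections import deque
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CorridorState:
        """Hypothetical toy: Pacman in cells 0..length-1, food in the rightmost cell."""
        pos: int
        length: int = 4

        def A(self):
            # As in Pacman, the legal actions depend on the state.
            actions = ["Stop"]
            if self.pos > 0:
                actions.append("West")
            if self.pos < self.length - 1:
                actions.append("East")
            return actions

        def f(self, action):
            step = {"West": -1, "East": 1, "Stop": 0}[action]
            return CorridorState(self.pos + step, self.length)

        def is_won(self):
            return self.pos == self.length - 1

    def reachable_states(s0):
        """Breadth-first search over all states reachable from s0 via A() and f()."""
        seen = {s0}
        queue = deque([s0])
        while queue:
            s = queue.popleft()
            if s.is_won():
                continue  # treat won states as terminal and do not expand them
            for a in s.A():
                s_next = s.f(a)
                if s_next not in seen:
                    seen.add(s_next)
                    queue.append(s_next)
        return seen

    states = reachable_states(CorridorState(pos=0))
    print(sorted(s.pos for s in states))  # [0, 1, 2, 3]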


Pacman with ghosts
--------------------------------------------------------------------------------------------------------

.. margin::

    .. plot::
        :caption: Initial state of the 1-ghost example.
        :width: 300

        from irlc import plotenv
        from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_haunted_maze
        env = PacmanEnvironment(layout_str=very_small_haunted_maze, render_mode='human')
        env.reset()
        plotenv(env)
        env.close()


When there are :math:`G` ghosts, Pacman becomes a multi-player game. Pacman is player number 0, and the ghosts are players number :math:`1, 2, \dots, G`.

The game then proceeds in turns, starting with Pacman. He makes a move, then each of the ghosts makes a move (in order), and finally it is Pacman's turn again.
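
In other words, the index of the player to move cycles through :math:`0, 1, \dots, G` and then wraps around to 0. The hypothetical helper below is only a sketch of that rule; on an actual state you should simply call :python:`s.player()`.

.. code-block:: python

    def next_player(player: int, num_players: int) -> int:
        """Hypothetical helper: index of the player who moves after `player`.

        Player 0 is Pacman and players 1..G are the ghosts, so num_players = G + 1.
        """
        return (player + 1) % num_players

    # With G = 2 ghosts there are 3 players: Pacman, ghost 1, ghost 2, then Pacman again.
    order = [0]
    for _ in range(3):
        order.append(next_player(order[-1], num_players=3))
    print(order)  # [0, 1, 2, 0]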


.. margin::

    .. plot::
        :caption: State after Pacman and the ghost have moved (see code to the left)
        :width: 300

        from irlc import plotenv
        from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_haunted_maze
        env = PacmanEnvironment(layout_str=very_small_haunted_maze, render_mode='human')
        s0, _ = env.reset() # Works just like any other environment.
        s0 = s0.f("East") # Pacman has now moved east
        env.game.state = s0.f("West") # Small hack for making the visualization.
        plotenv(env)
        env.close()


The following example illustrates the effect of Pacman and the ghost taking one step each:


.. runblock:: pycon

    >>> from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_haunted_maze
    >>> print("The maze layout\n", very_small_haunted_maze)
    >>> env = PacmanEnvironment(layout_str=very_small_haunted_maze)
    >>> s0, _ = env.reset() # Works just like any other environment.
    >>> s0.players() # Get the number of players (Pacman and the ghost)
    >>> s0.player() # Get the current player (0 is Pacman)
    >>> s1 = s0.f("East") # Pacman has now moved east
    >>> s1.player() # It is now the ghost's turn
    >>> s2 = s1.f("West") # The ghost moves west and it is Pacman's turn again
    >>> s2.player()
    >>> env.close()


Hint about the win-probability questions
--------------------------------------------------------------------------------------------------------
.. warning::
    :class: margin

    This hint does not add or change anything about the problem. It simply points out that the ghosts do not always have 3 actions,
    i.e., the list returned by :func:`~irlc.pacman.gamestate.GameState.A` may contain fewer than 3 elements,
    and therefore code that relies on this assumption may give wrong results.


The win-probability problems, in particular the one with two ghosts, are typically what cause the most trouble. If you passed all the other tests,
the problem will most likely come down to the implementation of the
:python:`p_next` function, and specifically how you computed the probabilities.

Let's say there is one ghost. In that case the ghost can have 1, 2 or 3 actions available.
As an example, let's take the maze which was plotted above:

.. runblock:: pycon

    >>> from irlc.pacman.pacman_environment import PacmanEnvironment, very_small_haunted_maze
    >>> env = PacmanEnvironment(layout_str=very_small_haunted_maze)
    >>> s0, _ = env.reset() # Get starting state
    >>> s0 = s0.f("East") # Pacman has now moved east
    >>> print(f"It is the ghost's turn since {s0.player()=} and the actions are {s0.A()=}")
    >>> env.close()

Another important case where this occurs is when the game is lost, in which case both Pacman and the ghost have only a single action available
(i.e., :python:`len(s.A()) == 1`).
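
Since a ghost chooses uniformly among the actions available to it, the probability of a given ghost action is one over the number of actions available in that particular state, so the denominator has to be recomputed every time rather than fixed at 3. A minimal sketch, where the lists of action names are hypothetical stand-ins for what :python:`s.A()` returns on a state where it is the ghost's turn:

.. code-block:: python

    from fractions import Fraction

    def ghost_move_distribution(ghost_actions):
        """Uniform distribution over the actions available to one ghost.

        `ghost_actions` is a stand-in for s.A() on a state where it is the ghost's turn.
        """
        n = len(ghost_actions)  # may be 1, 2 or 3, so never hard-code 3
        return {a: Fraction(1, n) for a in ghost_actions}

    print(ghost_move_distribution(["North", "South", "West"]))  # 1/3 for each action
    print(ghost_move_distribution(["West", "East"]))            # 1/2 for each action
    print(ghost_move_distribution(["Stop"]))                    # a single action with probability 1

Using :python:`Fraction` instead of floats is a choice made for this sketch: it keeps the probabilities exact, so a check such as :python:`sum(dist.values()) == 1` cannot fail because of rounding.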

If you compute the probabilities using :math:`p(w | x_k, u) = \frac{1}{| \mathcal{A} |}`, where :math:`\mathcal{A}` is the set of actions
available to the ghost in its current position (see :func:`~irlc.pacman.gamestate.GameState.A`), this will not be a problem. However,
for two ghosts you need to be a little more careful. In this case,
one round of the game consists of the following steps:

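#. Pacman makes a move
#. Ghost 1 takes one of the actions in :math:`\mathcal{A}'`, the set of actions available to ghost 1 after Pacman's move
#. Ghost 2 takes one of the actions in :math:`\mathcal{A}''`, the set of actions available to ghost 2 after ghost 1's move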

The probability of a particular outcome :math:`w` of the two ghost moves is therefore the product of the probabilities of the last two events:

.. math::

    p(w | x_k, u) = \frac{1}{ | \mathcal{A}'| }\frac{1}{ | \mathcal{A}''| }

In most cases both ghosts have 3 actions available, and these probabilities will be :math:`\frac{1}{9}`. However, if, e.g., ghost 1 eats Pacman,
these probabilities will be correspondingly higher, since then :math:`| \mathcal{A}''| = 1`.
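
The sketch below enumerates all pairs of ghost moves and multiplies the two factors. It uses a hypothetical toy in which ghost 1 has three actions, one of which (the made-up action :python:`"catch"`) eats Pacman, after which ghost 2 is left with a single action. None of these functions are part of the course code; they only show where :math:`\mathcal{A}''` has to be recomputed.

.. code-block:: python

    from fractions import Fraction

    def ghost1_actions():
        # Hypothetical: ghost 1 has three actions, one of which catches Pacman.
        return ["North", "South", "catch"]

    def ghost2_actions(ghost1_action):
        # Hypothetical: once Pacman has been eaten, ghost 2 has only one action left.
        if ghost1_action == "catch":
            return ["Stop"]
        return ["North", "South", "East"]

    def joint_ghost_distribution():
        """p(w | x_k, u) = 1/|A'| * 1/|A''| for every pair w = (a1, a2) of ghost moves."""
        dist = {}
        A1 = ghost1_actions()
        for a1 in A1:
            A2 = ghost2_actions(a1)  # A'' depends on the state after ghost 1 has moved
            for a2 in A2:
                dist[(a1, a2)] = Fraction(1, len(A1)) * Fraction(1, len(A2))
        return dist

    dist = joint_ghost_distribution()
    print(dist[("North", "East")])  # 1/9
    print(dist[("catch", "Stop")])  # 1/3
    print(sum(dist.values()))       # 1

Six outcomes have probability :math:`\frac{1}{9}` and one has probability :math:`\frac{1}{3}`, so the total is exactly 1 without any normalization.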

.. tip::
    :class: margin

    Note that when you implement the one-ghost case you will have to compute :math:`\frac{1}{ | \mathcal{A}'| }`.
    Thus the two-ghost case is just\ :sup:`TM` a matter of computing :math:`\frac{1}{ | \mathcal{A}'| }` as in the one-ghost case and multiplying it by the inverse of the number of actions available to the second ghost (i.e., :math:`\frac{1}{ | \mathcal{A}''| }`).

The takeaways are:

- Check that your probabilities sum to 1.
- Don't normalize your probabilities at the end of :python:`p_next`. If they don't sum to 1, you have a bug.
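
One way to follow these takeaways is to assert that a distribution sums to 1 instead of rescaling it. A minimal sketch, assuming (hypothetically) that your distribution is a dictionary mapping outcomes to probabilities; adapt it to whatever your :python:`p_next` actually returns:

.. code-block:: python

    import math

    def check_distribution(dist, tol=1e-9):
        """Raise ValueError if the probabilities in `dist` do not sum to 1 (within tol).

        The values are deliberately not rescaled: a sum different from 1 signals a bug.
        """
        total = sum(dist.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(f"probabilities sum to {total}, not 1")
        return dist

    check_distribution({"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})  # passes
    try:
        # Assuming 3 actions when only 2 are available gives a total of 2/3.
        check_distribution({"a": 1 / 3, "b": 1 / 3})
    except ValueError as e:
        print(e)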


.. autoclass:: irlc.pacman.gamestate.GameState
  :members: