Week 1: The Pacman game

What you see

This example shows the Pacman game. If you press p, Pacman will be controlled by a random agent (i.e., one which takes random actions).

The score is accumulated during an episode (i.e., in each step you get a reward, which can be positive or negative, and which is added to your score), and your goal is to maximize the score you obtain on average over many episodes. A policy which obtains the highest reward possible, i.e. plays perfectly, is called an optimal policy.
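The idea of summing per-step rewards into an episode score can be sketched as a small loop. This is only an illustration: the environment interface (reset, step) and the ToyEnv stand-in below are assumptions for the example, not the actual course API.

```python
import random

def run_episode(env, policy):
    """Accumulate per-step rewards into a total episode score."""
    state = env.reset()
    score = 0
    done = False
    while not done:
        action = policy(state)                   # e.g. a random agent
        state, reward, done = env.step(action)   # reward can be positive or negative
        score += reward                          # the score is the running sum of rewards
    return score

# Toy stand-in environment: each step costs -1, and the episode ends
# after 3 steps with a terminal bonus of +10.
class ToyEnv:
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t == 3
        reward = -1 + (10 if done else 0)
        return self.t, reward, done

random_policy = lambda state: random.choice([0, 1])
print(run_episode(ToyEnv(), random_policy))  # -1 - 1 + (-1 + 10) = 7
```

Note that the random agent still collects a well-defined score; an optimal policy is simply one whose average score over many episodes is as high as possible.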

How it works

You can read much more about the game, and specifically how to use it in project 1, in the section The Pacman Game.

Purpose

The point of the game is to illustrate states (i.e., what you see on the screen), actions (your inputs on the keyboard) and rewards (the score you get in each step). Notice how the score (reward) can be positive or negative, and how the game keeps track of the total score – the total score over an episode is what we try to maximize in control theory and reinforcement learning!

In this example the reward function is -1 per step plus a bonus of 10 points for eating a pellet. Finally, you get a reward of 500 for winning (eating all pellets) and -500 if you are eaten by a ghost. Note that this is an arbitrary choice – you can think about other reward functions and how they would affect Pacman's behavior. For instance, what happens if you get a reward of 1 whenever you win? What does the average reward correspond to in that case?
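The reward function described above can be written out directly. The function signature (which events are passed in as flags) is an illustrative assumption; only the numerical values come from the description.

```python
def pacman_reward(ate_pellet, won, eaten_by_ghost):
    """Per-step reward as described in the text: -1 living cost,
    +10 bonus for a pellet, +500 for winning, -500 for being eaten.
    The signature is an assumption for illustration, not the course API."""
    reward = -1                 # cost of living: every step costs 1 point
    if ate_pellet:
        reward += 10            # bonus for eating a pellet
    if won:
        reward += 500           # all pellets eaten
    if eaten_by_ghost:
        reward -= 500           # caught by a ghost
    return reward

print(pacman_reward(ate_pellet=True, won=False, eaten_by_ghost=False))  # -1 + 10 = 9
```

The -1 per step encourages Pacman to finish quickly; changing these constants (say, dropping the step cost, or rewarding only a win) changes which behavior counts as optimal.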

In Project 1 you will implement an optimal Pacman-playing robot using dynamic programming, and part of the task will be to define an appropriate reward function.
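To give a flavor of the kind of dynamic programming involved, here is a minimal value-iteration sketch on a made-up deterministic 3-state chain (move right at cost -1, collect a +10 pellet, reach a terminal state). The MDP, the transition tables, and the discount factor are all assumptions for the example, not part of the project.

```python
def value_iteration(P, R, gamma=0.9, iters=100):
    """Toy value iteration for a deterministic MDP.
    P[s][a] is the next state and R[s][a] the reward for action a in state s."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        # Bellman update: best action value from each state
        V = [max(R[s][a] + gamma * V[P[s][a]] for a in range(len(P[s])))
             for s in range(n)]
    return V

# Chain: state 0 -> 1 (reward -1), state 1 -> 2 (pellet, reward +10),
# state 2 is terminal (self-loop, reward 0).
P = [[1], [2], [2]]
R = [[-1], [10], [0]]
print(value_iteration(P, R))  # V ≈ [8.0, 10.0, 0.0]
```

The resulting values tell you how much score an optimal policy can still collect from each state, which is exactly the quantity dynamic programming computes for Pacman.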