Exercise 11: Model-Free Control with tabular and linear methods

Note

  • This page contains background information which may be useful in future exercises or projects. You can download this week's exercise instructions from here:

  • Slides: [1x] ([6x]). Reading: Sections 6.4-6.5, 7-7.2, 9-9.3 and 10.1 of [SB18].

  • You are encouraged to prepare homework problem 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.

  • To get the newest version of the course material, please see Making sure your files are up to date.

Linear function approximators

The idea behind linear function approximation of \(Q\)-values is that

  • We initialize (and eventually learn) a \(d\)-dimensional weight vector \(w \in \mathbb{R}^d\)

  • We assume there exists a function to compute a \(d\)-dimensional feature vector \(x(s,a) \in \mathbb{R}^d\)

  • The \(Q\)-values are then represented as

    \[Q(s,a) = x(s,a)^\top w\]

Learning is therefore entirely about updating \(w\). We will use the class LinearQEncoder, which implements the tile-coding procedure described in (Sutton and Barto [SB18]), to define \(x(s,a)\).
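To give an idea of what a tile coder computes, the sketch below implements a simplified version for a box-shaped continuous state such as MountainCar's (position, velocity). It is an illustration under stated assumptions, not the LinearQEncoder implementation: the function name tile_features, the uniform grid with tiles_per_dim tiles per dimension, and the uniform offsets between tilings are hypothetical choices ((Sutton and Barto [SB18]) recommend asymmetric offsets). Each tiling contributes exactly one active binary feature, so \(x(s,a)\) has exactly tilings non-zero entries.

```python
import math


def tile_features(s, a, low, high, tilings=8, tiles_per_dim=8):
    """Hypothetical tile coder returning the indices of the active features of x(s, a).

    s: state as a sequence of floats; a: integer action; low/high: state bounds.
    Every (action, tiling) pair owns a separate block of features, and tiling t
    is a uniform grid shifted by t / tilings of a tile width (an assumption).
    """
    dims = len(s)
    cells = tiles_per_dim + 1  # one extra tile per dimension absorbs the shift
    block = cells ** dims      # number of features per (action, tiling) pair
    active = []
    for t in range(tilings):
        index = 0
        for i in range(dims):
            width = (high[i] - low[i]) / tiles_per_dim
            shift = t * width / tilings
            coord = math.floor((s[i] - low[i] + shift) / width)
            coord = min(max(coord, 0), cells - 1)  # clip states outside the bounds
            index = index * cells + coord          # row-major index inside the grid
        active.append((a * tilings + t) * block + index)
    return active


# MountainCar-like bounds on (position, velocity); with 3 actions, d = 3 * 8 * 81.
low, high = (-1.2, -0.07), (0.6, 0.07)
print(tile_features((-0.5, 0.0), a=1, low=low, high=high))
```

Giving every action its own block of features is one simple way to turn state features into state-action features \(x(s,a)\).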

The following example shows how you initialize the linear \(Q\)-values and compute them in a given state:
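As a library-independent illustration of these two steps, the sketch below initializes \(w\) and evaluates \(Q(s,a) = x(s,a)^\top w\) for every action in a state. The feature map feature_vector is hypothetical and not part of irlc: for a scalar state \(s\) it places the features \([1, s]\) in a separate block for each action, so \(d = 2 \cdot\) n_actions.

```python
def feature_vector(s, a, n_actions=2):
    """Hypothetical x(s, a): copy [1, s] into the block that belongs to action a."""
    x = [0.0] * (2 * n_actions)  # d = 2 * n_actions
    x[2 * a] = 1.0               # bias feature for action a
    x[2 * a + 1] = s             # the scalar state itself
    return x


def q_value(s, a, w):
    """Q(s, a) = x(s, a)^T w, written out as a dot product."""
    return sum(xi * wi for xi, wi in zip(feature_vector(s, a), w))


w = [0.0] * 4  # initialize all d = 4 weights to zero
print([q_value(0.3, a, w) for a in range(2)])  # [0.0, 0.0]

w = [1.0, 2.0, -1.0, 4.0]  # pretend these weights have been learned
print([q_value(0.5, a, w) for a in range(2)])  # [2.0, 1.0]
```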


For learning, you can simply update \(w\) like any other variable, and the convenience method get_optimal_action returns the optimal action. The following example illustrates basic usage:
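One common way to update \(w\) is the semi-gradient Sarsa rule of (Sutton and Barto [SB18], Section 10.1), \(w \leftarrow w + \alpha\,[r + \gamma\, x(s',a')^\top w - x(s,a)^\top w]\, x(s,a)\), where the term \(\gamma\, x(s',a')^\top w\) is dropped when \(s'\) is terminal. Because \(Q\) is linear in \(w\), the gradient \(\nabla_w Q(s,a)\) is simply \(x(s,a)\). The sketch below assumes features are plain Python lists and uses a hypothetical function name; it shows the arithmetic, not how the course's agents are structured.

```python
def dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))


def semi_gradient_sarsa_update(w, x_sa, r, x_next, alpha, gamma, terminal=False):
    """Return the updated weights after one semi-gradient Sarsa step.

    x_sa = x(s, a) for the current pair and x_next = x(s', a') for the next pair;
    x_next is ignored when s' is terminal (its value is taken to be zero).
    """
    target = r if terminal else r + gamma * dot(x_next, w)
    delta = target - dot(x_sa, w)  # TD error
    # The gradient of x(s, a)^T w with respect to w is x(s, a).
    return [wi + alpha * delta * xi for wi, xi in zip(w, x_sa)]


w = [0.0, 0.0, 0.0]
w = semi_gradient_sarsa_update(w, x_sa=[1.0, 0.0, 1.0], r=1.0,
                               x_next=[0.0, 1.0, 0.0], alpha=0.5, gamma=0.9)
print(w)  # [0.5, 0.0, 0.5]
```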


Note

Depending on how \(x(s,a)\) is defined, the linear encoder can behave very differently. I have therefore included a few different classes in irlc.ex11.feature_encoder which differ only in how \(x(s,a)\) is computed. I have chosen to focus this guide on the linear tile-encoder, which is used for the MountainCar environment and is the main example in (Sutton and Barto [SB18]). The other classes have the same API.
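The design described in the note, several encoders that share everything except \(x(s,a)\), can be sketched with an abstract base class. The class names LinearQSketch and OneHotSketch below are hypothetical and only mirror the idea; they are not the library's classes. The one-hot subclass also shows that tabular \(Q\)-values are the special case of a linear encoder with one feature per state-action pair.

```python
from abc import ABC, abstractmethod


class LinearQSketch(ABC):
    """Hypothetical base class: only d and x(s, a) differ between subclasses."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.w = [0.0] * self.d  # the weights are shared machinery

    @property
    @abstractmethod
    def d(self):
        """Number of dimensions of w (and of x(s, a))."""

    @abstractmethod
    def x(self, s, a):
        """Feature vector x(s, a) as a list of length d."""

    def q(self, s, a):
        # Q(s, a) = x(s, a)^T w is computed the same way for every subclass.
        return sum(xi * wi for xi, wi in zip(self.x(s, a), self.w))


class OneHotSketch(LinearQSketch):
    """Tabular Q-values written as a linear encoder with one-hot features."""

    def __init__(self, n_states, n_actions):
        self.n_states = n_states  # must be set before the base class reads self.d
        super().__init__(n_actions)

    @property
    def d(self):
        return self.n_states * self.n_actions

    def x(self, s, a):
        features = [0.0] * self.d
        features[s * self.n_actions + a] = 1.0
        return features


enc = OneHotSketch(n_states=3, n_actions=2)
enc.w[2 * 2 + 1] = 1.5  # this weight is exactly the table entry Q(2, 1)
print(enc.d, enc.q(2, 1), enc.q(2, 0))  # 6 1.5 0.0
```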

Classes and functions

class irlc.ex11.feature_encoder.FeatureEncoder(env)

Bases: object

Base class for the linear \(Q\)-value encoders described above: a subclass defines the feature vector \(x(s,a)\), and the \(Q\)-values are computed as \(Q(s,a) = x(s,a)^\top w\).

__init__(env)

Initialize the feature encoder. The environment is needed to determine the number of actions and the dimension of the state space.

Parameters:

env – An OpenAI Gym Env.

property d

Get the number of dimensions of \(w\)

x(s, a)

Computes the \(d\)-dimensional feature vector \(x(s,a)\)

Parameters:
  • s – A state \(s\)

  • a – An action \(a\)

Returns:

Feature vector \(x(s,a)\)

get_Qs(state, info_s=None)

This is a helper function for internal use only.

Parameters:
  • state

  • info_s

Returns:

get_optimal_action(state, info=None)

Given a state \(s\) (passed as state), this function returns the optimal action in that state:

\[a^* = \arg\max_a Q(s,a)\]

An example:

Parameters:
  • state – The state \(s\) in which to find the optimal action

  • info – The info-dictionary corresponding to this state

Returns:

The optimal action \(a^*\) according to the \(Q\)-values
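Once the \(Q\)-values of all actions in a state have been computed, the arg max above takes only a few lines. Breaking ties uniformly at random is an assumption of this sketch (ties are common early on, when \(w\) is all zeros); the documented method may break ties differently. The function name greedy_action is hypothetical.

```python
import random


def greedy_action(q_values, rng=random):
    """Return a* = argmax_a Q(s, a), with ties broken uniformly at random.

    q_values[a] is assumed to hold Q(s, a) for the actions a = 0, ..., n_actions - 1.
    """
    best = max(q_values)
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


print(greedy_action([0.1, 0.7, 0.3]))  # 1
print(greedy_action([0.0, 0.0, 0.0]))  # 0, 1 or 2, chosen at random
```

Passing rng=random.Random(seed) makes the tie-breaking reproducible.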

class irlc.ex11.feature_encoder.LinearQEncoder(env, tilings=8, max_size=2048)

Bases: FeatureEncoder

__init__(env, tilings=8, max_size=2048)

Implements the tile-encoder described in (Sutton and Barto [SB18]).

Parameters:
  • env – The OpenAI Gym environment we wish to solve.

  • tilings – Number of tilings (translations). Typically 8.

  • max_size – Maximum number of dimensions.
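In Richard Sutton's tiles3 tile-coding software, which accompanies (Sutton and Barto [SB18]), a maximum size bounds an index hash table that maps each visited tile to a feature index; once the table is full, new tiles are hashed onto existing indices and may share weights. Whether LinearQEncoder uses max_size in exactly this way is an assumption here. The class below is a simplified, hypothetical version of that idea.

```python
class IndexHashTable:
    """Hypothetical index table mapping tile coordinates to indices in [0, max_size)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.indices = {}

    def index(self, coords):
        coords = tuple(coords)
        if coords in self.indices:
            return self.indices[coords]
        if len(self.indices) < self.max_size:
            # Give a previously unseen tile the next free index.
            self.indices[coords] = len(self.indices)
            return self.indices[coords]
        # Table is full: hash collisions mean distinct tiles may share a weight.
        return hash(coords) % self.max_size


table = IndexHashTable(max_size=2048)
print(table.index((0, 3, 5)), table.index((1, 3, 5)), table.index((0, 3, 5)))  # 0 1 0
```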

x(s, a)

Computes the \(d\)-dimensional feature vector \(x(s,a)\)

Parameters:
  • s – A state \(s\)

  • a – An action \(a\)

Returns:

Feature vector \(x(s,a)\)
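For a tile coder, \(x(s,a)\) is a binary vector with one active entry per tiling, so it is often stored as a list of active indices. Assuming that representation (the helper names below are hypothetical), the dense vector and the \(Q\)-value can be recovered as follows; the inner product \(x(s,a)^\top w\) reduces to summing the weights of the active tiles.

```python
def dense_features(active, d):
    """Expand the active tile indices into the binary feature vector x(s, a) of length d."""
    x = [0.0] * d
    for i in active:
        x[i] = 1.0
    return x


def q_from_active(active, w):
    """For binary features, x(s, a)^T w is the sum of the weights of the active tiles."""
    return sum(w[i] for i in active)


w = [0.5, 1.0, 2.0, 4.0]
active = [1, 3]
x = dense_features(active, d=len(w))
print(x)                                     # [0.0, 1.0, 0.0, 1.0]
print(sum(xi * wi for xi, wi in zip(x, w)))  # 5.0
print(q_from_active(active, w))              # 5.0
```

Summing only the active weights costs time proportional to the number of tilings rather than to \(d\).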

property d

Get the number of dimensions of \(w\)


Solutions to selected exercises