{% import macros as m with context  %}

.. _mountaincar_game:


Week 11: Mountain-car with linear feature approximators
=============================================================================================================

{{ m.embed_game('week11_mountaincar_sarsa') }}


.. topic:: Controls
    :class: margin

    :kbd:`Space`
        Take a random action
    :kbd:`p`
        Start training

    .. rubric:: Run locally

    :gitref:`../irlc/lectures/lec11/lecture_11_mountaincar_feature_space.py`

What you see
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The example shows the Sarsa algorithm with linear function approximators applied to the MountainCar environment.
The state :math:`s` is two-dimensional (position, velocity), and the right-hand pane visualizes the estimate of the value function :math:`V(s)` for all states :math:`s`.
The function approximators use tile coding, which is what gives rise to the grid-like pattern.

How it works
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Sarsa algorithm approximates the Q-values using :math:`q(s, a, \mathbf{w}) = \mathbf{x}(s, a)^\top \mathbf{w}`. In this case the
feature vector :math:`\mathbf{x}(s,a)` is a very high-dimensional vector (about 4000 dimensions) constructed using tile coding. You can find the details
in :cite:`sutton`, but to greatly simplify the construction: the state space is divided into a fairly fine grid, and the entry of :math:`\mathbf{x}(s,a)` that corresponds to the grid cell containing :math:`s` (for the action :math:`a`) is set to 1, while all other entries are zero.
Since a Sarsa update only changes the weights belonging to the nonzero entries, each update only affects the estimate close to the visited state. This is why the updates appear to be local.
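The simplified construction above can be sketched in Python as follows. This is a minimal sketch and not the code used in ``irlc``: it assumes the standard MountainCar bounds (position in :math:`[-1.2, 0.6]`, velocity in :math:`[-0.07, 0.07]`), three actions, and the hypothetical parameters ``N_TILINGS`` and ``N_BINS``. Full tile coding uses several offset grids (tilings) rather than one, so exactly one entry of :math:`\mathbf{x}(s,a)` per tiling is 1. The hypothetical function ``sarsa_update`` performs one semi-gradient Sarsa step; since the gradient of :math:`q` with respect to :math:`\mathbf{w}` is :math:`\mathbf{x}(s,a)`, only the weights of the active tiles change.

.. code-block:: python

    import numpy as np

    # Assumed bounds of the MountainCar state space (position, velocity).
    LOW = np.array([-1.2, -0.07])
    HIGH = np.array([0.6, 0.07])
    N_ACTIONS = 3                          # push left, no push, push right
    N_TILINGS = 8                          # hypothetical: number of offset grids
    N_BINS = 8                             # hypothetical: cells per dimension in a tiling
    TILES_PER_TILING = (N_BINS + 1) ** 2   # one extra row/column covers the offset
    D = N_TILINGS * TILES_PER_TILING * N_ACTIONS  # dimension of x(s, a)

    def active_features(s, a):
        """Indices of the entries of x(s, a) that equal 1 (all others are 0)."""
        scaled = (np.asarray(s, dtype=float) - LOW) / (HIGH - LOW) * N_BINS
        scaled = np.clip(scaled, 0, N_BINS)
        idx = []
        for t in range(N_TILINGS):
            offset = t / N_TILINGS  # each tiling is shifted by a fraction of a cell
            i, j = np.floor(scaled + offset).astype(int)
            tile = i * (N_BINS + 1) + j
            idx.append((a * N_TILINGS + t) * TILES_PER_TILING + tile)
        return np.array(idx)

    def q(s, a, w):
        # q(s, a, w) = x(s, a)^T w; with binary features this is a sum of active weights.
        return w[active_features(s, a)].sum()

    def sarsa_update(w, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, done=False):
        """One semi-gradient Sarsa step; only the weights of active tiles change."""
        target = r if done else r + gamma * q(s_next, a_next, w)
        delta = target - q(s, a, w)
        # The gradient of q w.r.t. w is x(s, a); dividing alpha by the number
        # of tilings is a common convention so q(s, a, w) moves by alpha * delta.
        w[active_features(s, a)] += alpha / N_TILINGS * delta
        return w

    # Usage: a single update from all-zero weights.
    w = np.zeros(D)
    s, a = np.array([-0.5, 0.0]), 2
    w = sarsa_update(w, s, a, r=-1.0, s_next=np.array([-0.49, 0.001]), a_next=2)
    print(np.count_nonzero(w))             # 8: one weight per tiling changed
    print(q(s, a, w))                      # approximately -0.1 (= alpha * delta)
    print(q(np.array([0.5, 0.06]), a, w))  # 0.0: a distant state is unaffected

With these hypothetical choices the feature vector has 1944 entries; finer grids or more tilings increase this number.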

For visualization it is not convenient to plot :math:`q(s, a, \mathbf{w})`, since it also depends on the action, so instead we plot the corresponding estimate of the value function
:math:`v(s) = \max_a q(s, a, \mathbf{w})`.
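To produce a picture like the right-hand pane, :math:`v(s)` can be evaluated on a dense grid of states. The sketch below is self-contained and therefore uses only the simplified single-grid features described above; the bounds, the grid resolution ``N_BINS`` and the weight layout ``W`` (one weight per action and grid cell) are hypothetical choices, not taken from ``irlc``. Because :math:`q` is constant inside each grid cell, the resulting image is piecewise constant, which produces the grid-like pattern.

.. code-block:: python

    import numpy as np

    LOW, HIGH = np.array([-1.2, -0.07]), np.array([0.6, 0.07])  # assumed MountainCar bounds
    N_BINS, N_ACTIONS = 20, 3  # hypothetical grid resolution; three actions

    def q_all_actions(s, W):
        """q(s, a, w) for every action a, using one-hot grid features.

        W has shape (N_ACTIONS, N_BINS, N_BINS): the weight vector w reshaped
        so that each action and grid cell has exactly one weight."""
        cell = ((np.asarray(s, dtype=float) - LOW) / (HIGH - LOW) * N_BINS).astype(int)
        cell = np.clip(cell, 0, N_BINS - 1)  # states on the upper boundary go in the last cell
        return W[:, cell[0], cell[1]]

    def value_image(W, resolution=100):
        """Evaluate v(s) = max_a q(s, a, w) on a resolution x resolution grid of states."""
        positions = np.linspace(LOW[0], HIGH[0], resolution)
        velocities = np.linspace(LOW[1], HIGH[1], resolution)
        V = np.empty((resolution, resolution))
        for i, v in enumerate(velocities):
            for j, p in enumerate(positions):
                V[i, j] = q_all_actions((p, v), W).max()
        return V  # rows: velocity, columns: position

    # Usage with random stand-in weights.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(N_ACTIONS, N_BINS, N_BINS))
    V = value_image(W)
    print(V.shape)  # (100, 100)

The array ``V`` can then be displayed as an image, for example with matplotlib's ``imshow``.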