{% import macros as m with context %}

.. _mountaincar_game:

Week 11: Mountain-car with linear feature approximators
=============================================================================================================

{{ m.embed_game('week11_mountaincar_sarsa') }}

.. topic:: Controls
    :class: margin

    :kbd:`Space`
        Take a random action
    :kbd:`p`
        Start training

.. rubric:: Run locally

:gitref:`../irlc/lectures/lec11/lecture_11_mountaincar_feature_space.py`

What you see
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The example shows the Sarsa algorithm with linear function approximators applied to the MountainCar problem.
The state :math:`s` is two-dimensional (position, velocity), and the right-hand pane visualizes the estimate of the value function :math:`v(s)` for all states :math:`s`.
The function approximators use tile coding, which is what gives rise to the grid-like pattern.

How it works
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Sarsa algorithm approximates the Q-values using :math:`q(s, a, \mathbf{w}) = \mathbf{x}(s, a) \mathbf{w}^\top`.
In this case the feature vector :math:`\mathbf{x}(s,a)` is very high-dimensional (about 4000 dimensions) and is constructed using tile coding.
You can find the details in :cite:`sutton`, but to greatly simplify the construction: the state space is divided into a fairly fine grid, and the coordinate of :math:`\mathbf{x}(s,a)` that corresponds to the grid cell containing :math:`s` is set to 1 (all others are zero).
An update therefore only changes the weights of the active cell, which is why the updates appear to be local.

For visualization it is not convenient to plot :math:`q(s, a, \mathbf{w})` directly because it also depends on the action, so we instead plot the corresponding estimate of the value function :math:`v(s) = \max_a q(s, a, \mathbf{w})`.
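
The simplified construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used by the lecture script: the state bounds, the grid size ``N_BINS``, the number of actions, the step size ``alpha``, the discount ``gamma``, and all function names are hypothetical choices made for this example. It also uses a single grid with one active feature, whereas proper tile coding (see :cite:`sutton`) combines several offset tilings.

```python
import numpy as np

# Assumed state bounds and sizes (illustrative, not taken from the lecture code).
POS_RANGE = (-1.2, 0.6)
VEL_RANGE = (-0.07, 0.07)
N_BINS = 8          # grid cells per state dimension
N_ACTIONS = 3       # e.g. push left, no push, push right


def cell_index(s):
    """Map a state (position, velocity) to the index of its grid cell."""
    pos, vel = s
    i = int((pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0]) * N_BINS)
    j = int((vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0]) * N_BINS)
    i = min(max(i, 0), N_BINS - 1)  # clip states that lie on the boundary
    j = min(max(j, 0), N_BINS - 1)
    return i * N_BINS + j


def x(s, a):
    """One-hot feature vector x(s, a): a single 1 at the (action, cell) entry."""
    features = np.zeros(N_BINS * N_BINS * N_ACTIONS)
    features[a * N_BINS * N_BINS + cell_index(s)] = 1.0
    return features


def q(s, a, w):
    """Linear action-value estimate q(s, a, w) = x(s, a) w^T."""
    return float(x(s, a) @ w)


def v(s, w):
    """Value estimate used for plotting: v(s) = max_a q(s, a, w)."""
    return max(q(s, a, w) for a in range(N_ACTIONS))


def sarsa_update(w, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0, done=False):
    """One semi-gradient Sarsa step; the gradient of q w.r.t. w is x(s, a)."""
    target = r if done else r + gamma * q(s_next, a_next, w)
    return w + alpha * (target - q(s, a, w)) * x(s, a)


w = np.zeros(N_BINS * N_BINS * N_ACTIONS)
s, s_next = (-0.5, 0.0), (-0.49, 0.01)
w_new = sarsa_update(w, s, a=2, r=-1.0, s_next=s_next, a_next=2)
print(np.count_nonzero(w_new - w))  # only one weight changes
```

Because :math:`\mathbf{x}(s,a)` has a single non-zero entry in this sketch, one update changes exactly one weight, so the estimate for states in other grid cells is unaffected. This is the locality visible in the right-hand pane.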