{% import macros as m with context %}
{% set week = 4 %}
{{ m.exercise_head(week) }}

Last week we saw how we could specify a basic control model using the :class:`irlc.ex03.control_model.ControlModel` class. This model implements a continuous-time environment of the form

.. math::
    \mathbf{\dot{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}, t)

This week we will consider discretization, which is the process of turning the environment into the form:

.. math::
    \mathbf{x}_{k+1} = f_k(\mathbf{x}_k, \mathbf{u}_k) = \mathbf{x}_{k} + \Delta \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k, t_k)

Discretization is an automatic procedure, which means that you only need to understand this part of the code from a user's perspective. To give an overview, we end up with three classes:

The :class:`~irlc.ex03.control_model.ControlModel` class
    This is the main class. It implements and represents everything that has to do with the continuous-time control problem. This includes:

    - The differential equation

      .. math::
          \frac{ dx(t)}{dt} = f(x(t), u(t), t)

      which you implement as :func:`~irlc.ex03.control_model.ControlModel.sym_f`

    - The cost function

      .. math::
          g(x(t), u(t), t) = \frac{1}{2} x(t)^\top Q x(t) + \frac{1}{2} u(t)^\top R u(t) + \cdots

      which you implement as :func:`~irlc.ex03.control_model.ControlModel.get_cost`

The :class:`~irlc.ex04.discrete_control_model.DiscreteControlModel`
    This class represents a discretized version of a continuous control problem. You call it with a :class:`~irlc.ex03.control_model.ControlModel` object and a discretization time :math:`\Delta`, which it will then discretize for you. It gives you:

    - The (Euler-)discretized dynamics :math:`x_{k+1} = f_k(x_k, u_k)` (as :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.f`)
    - The ability to evaluate the discretized cost function (as :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.c` and :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.cN`, the latter being the terminal cost)
    - The ability to compute Jacobians (gradients) of the dynamics (as :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.f_jacobian`)

The :class:`~irlc.ex04.control_environment.ControlEnvironment`
    This class represents a control environment -- i.e. this is the class you pass on to the :func:`~irlc.ex01.agent.train`-method, and it is an instance of the base environment class we saw in weeks 1-3, :class:`gymnasium.Env`. To create an environment, you simply give it a :class:`~irlc.ex04.discrete_control_model.DiscreteControlModel` object and a maximum time :python:`Tmax` (in seconds) until the environment terminates.

    - It has a :func:`~irlc.ex04.control_environment.ControlEnvironment.step` and a :func:`~irlc.ex04.control_environment.ControlEnvironment.reset` function, just like any other environment.

The bottom line is that your main work will be in specifying the continuous-time model; the discrete model and the environment can then be derived automatically in potentially very few lines.
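To see how little the Euler update itself involves, here is a minimal standalone sketch of the rule :math:`\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k, t_k)`. The helper ``euler_step`` and the toy dynamics below are made up purely for illustration and are not part of the course software; the :class:`~irlc.ex04.discrete_control_model.DiscreteControlModel` class described above performs this step (and more) for you.

.. code-block:: python

    import numpy as np

    def euler_step(f, x, u, t, Delta):
        # One Euler step: x_{k+1} = x_k + Delta * f(x_k, u_k, t_k).
        return np.asarray(x) + Delta * np.asarray(f(x, u, t))

    # A toy pendulum-like system (illustrative only): d theta/dt = omega, d omega/dt = -sin(theta) + u.
    f = lambda x, u, t: np.array([x[1], -np.sin(x[0]) + u[0]])
    x0 = np.array([np.pi, 0.0])                      # hanging downwards, at rest
    x1 = euler_step(f, x0, u=[0.1], t=0, Delta=0.5)  # state after one step of 0.5 seconds
    print(x1)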
Example: Creating a discrete pendulum environment
*******************************************************************************
As an example, let's create a discrete version of the basic pendulum environment we saw last week.

Starting point
    Our starting point is the :class:`~irlc.ex03.control_model.ControlModel` object we saw last week:

    .. literalinclude:: ../../shared/output/basic_pendulum_a.py

Discretize the control model
    The following two lines will discretize the pendulum model and display information about it:

    .. runblock:: pycon

        >>> from irlc.ex04.discrete_control_model import DiscreteControlModel
        >>> from irlc.ex03.basic_pendulum import BasicPendulumModel
        >>> model = BasicPendulumModel()
        >>> discrete_pendulum = DiscreteControlModel(model, dt=0.5)  # Using a discretization time step: 0.5 seconds.
        >>> print(discrete_pendulum)

Computing the discrete dynamics
    To compute the discrete dynamics, use :func:`irlc.ex04.discrete_control_model.DiscreteControlModel.f`. As an example:

    .. runblock:: pycon

        >>> from irlc.ex04.discrete_control_model import DiscreteControlModel
        >>> from irlc.ex03.basic_pendulum import BasicPendulumModel
        >>> model = BasicPendulumModel()
        >>> discrete_pendulum = DiscreteControlModel(model, dt=0.5)  # Using a discretization time step: 0.5 seconds.
        >>> x0 = model.x0_bound().low  # Get the initial state: x0 = [np.pi, 0].
        >>> u0 = [0.1]  # Action of u=0.1. Note that the action must be a list.
        >>> x1 = discrete_pendulum.f(x0, u0)
        >>> print(x1)
        >>> print("Now, let's compute the Euler step manually to confirm")
        >>> x1_manual = x0 + 0.5 * model.f(x0, u0)
        >>> print(x1_manual)

    The default discretization procedure is Euler discretization, which the last two lines verify.

Pendulum continued: Coordinate transformations
***********************************************************************
It is common to apply a variable transformation to angles :math:`\theta` when the environment is discretized. As an example, we will consider the discrete model :class:`irlc.ex04.model_pendulum.GymSinCosPendulumModel`. Recall that this environment has undergone a variable transformation so that the **discretized coordinates** :math:`x_k` are related to the continuous-time coordinates as (:cite:t:`herlau`, Subsection 13.1.4):

.. math::
    x_k = \phi_x(x(t_k))

The state and action spaces are defined as before, but the state space has an extra dimension to account for the change of variables:

.. math::
    \begin{bmatrix} \theta \\ \dot \theta \end{bmatrix} & \mapsto \begin{bmatrix} \sin(\theta) \\ \cos(\theta) \\ \dot \theta \end{bmatrix} \\
    \begin{bmatrix} u \end{bmatrix} & \mapsto \begin{bmatrix} \tanh^{-1} \frac{u}{U} \end{bmatrix}.

The following example illustrates how we can manually map the states and actions from the initial state through the variable transformations.

.. literalinclude:: ../../shared/output/model_pendulum_vartransform.py

The variable transformation for the state, :math:`x_k = \phi_x(x(t_k))`, is specified as the function :func:`~irlc.ex03.control_model.ControlModel.phi_x`. Similar to the dynamics, it accepts a list of symbolic variables as input and should return a list of symbolic expressions. Note that you also have to specify the inverse transformation as :func:`~irlc.ex03.control_model.ControlModel.phi_x_inv`. When these are specified, you can transform variables in the discretized model as follows:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> dmodel = DiscreteSinCosPendulumModel()
    >>> x0 = dmodel.continuous_model.x0_bound().low
    >>> x0_discrete = dmodel.phi_x(x0)
    >>> print(x0, "->", x0_discrete)
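To build some intuition for what this transformation pair does numerically, here is a small standalone numpy sketch of :math:`\phi_x` and its inverse, written directly from the equations above. It is independent of the symbolic implementation in the course software, and the function names are only used for illustration.

.. code-block:: python

    import numpy as np

    def phi_x(x):
        # Map [theta, d theta/dt] to the transformed coordinates [sin(theta), cos(theta), d theta/dt].
        theta, dtheta = x
        return np.array([np.sin(theta), np.cos(theta), dtheta])

    def phi_x_inv(x):
        # Map [sin(theta), cos(theta), d theta/dt] back to [theta, d theta/dt].
        sin_theta, cos_theta, dtheta = x
        return np.array([np.arctan2(sin_theta, cos_theta), dtheta])

    x_continuous = np.array([np.pi / 2, 0.0])   # theta = pi/2, at rest
    x_transformed = phi_x(x_continuous)
    print(x_transformed)                        # [1., 0., 0.]
    print(phi_x_inv(x_transformed))             # recovers theta (up to multiples of 2*pi)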
The dynamics :math:`f_k(x_k, u_k)` can be evaluated as:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> import numpy as np
    >>> model = DiscreteSinCosPendulumModel()
    >>> theta = np.pi/3
    >>> x = [np.sin(theta), np.cos(theta), 0.5]  # [sin(theta), cos(theta), d theta/dt]
    >>> u = [1]  # An arbitrary action
    >>> print(model.f(x, u))  # Computes x_{k+1} = f_k(x_k, u_k)

Shortcuts
*******************************************************************************
.. tip::
    :class: margin

    These shortcuts are defined for convenience in the exercises. If you wish to create a model during an exam or as part of the project, I recommend following the discretization procedure outlined above, as I consider it to be simpler and less prone to errors.

To avoid repeating the same lines many times, I have included shortcuts for discretized versions of the most common models, for instance the pendulum model. As an example, this is a shortcut to create an instance of the discretized pendulum model with the :math:`\sin(\theta)`, :math:`\cos(\theta)` coordinate transformation (hence the name):

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> model = DiscreteSinCosPendulumModel()
    >>> print(model)

You can check the code to see how the shortcuts are defined.

Computing derivatives (Jacobians) of the discretized dynamics
******************************************************************
.. tip::
    :class: margin

    While the Cartpole and Pendulum environments require coordinate transformations, you won't have to implement them yourself in this course.

Derivatives of the dynamics can be computed using the function :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.f_jacobian`. The function returns two numpy ``ndarray`` objects corresponding to the Jacobians with respect to :math:`x` and :math:`u`:

.. math::
    J_x f_k(x,u), \quad J_u f_k(x, u)

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> import numpy as np
    >>> model = DiscreteSinCosPendulumModel()
    >>> theta = np.pi/3
    >>> x = [np.sin(theta), np.cos(theta), 0.5]  # [sin(theta), cos(theta), d theta/dt]
    >>> u = [1]  # An arbitrary action
    >>> Jx, Ju = model.f_jacobian(x, u)
    >>> print("Jacobian Jx = df/dx is\n", Jx)
    >>> print("Jacobian Ju = df/du is\n", Ju)
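As a sanity check, the Jacobian returned by :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.f_jacobian` can be compared against a finite-difference approximation built from :func:`~irlc.ex04.discrete_control_model.DiscreteControlModel.f`. The following is only a sketch, assuming the call signatures behave as in the examples above (in particular that ``model.f`` accepts numpy arrays); it is not part of the exercises.

.. code-block:: python

    import numpy as np
    from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel

    model = DiscreteSinCosPendulumModel()
    theta = np.pi / 3
    x = np.array([np.sin(theta), np.cos(theta), 0.5])  # [sin(theta), cos(theta), d theta/dt]
    u = np.array([1.0])

    Jx, Ju = model.f_jacobian(x, u)

    # Central finite-difference approximation of J_x f_k(x, u), one state coordinate at a time.
    eps = 1e-6
    Jx_fd = np.zeros_like(Jx)
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        Jx_fd[:, i] = (np.asarray(model.f(x + dx, u)) - np.asarray(model.f(x - dx, u))) / (2 * eps)

    print(np.allclose(Jx, Jx_fd, atol=1e-4))  # True if the analytic and numerical Jacobians agree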
The discrete cost
*********************************************************
The cost is automatically discretized. You can access the cost matrices as follows:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> model = DiscreteSinCosPendulumModel()
    >>> model.cost.Q  # Get access to a matrix in the cost function
    >>> print(model.cost)  # Print the cost function in a nicer format.

The control environment
------------------------------------------------------------------------------------
A control environment behaves just like a regular gymnasium ``Env``. Here is an example interaction:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import GymSinCosPendulumEnvironment
    >>> env = GymSinCosPendulumEnvironment()
    >>> s0, _ = env.reset()  # Reset both the time and state variable
    >>> print("Initial state", s0)
    >>> u = [1]  # Just an example action.
    >>> next_state, cost, done, truncated, info = env.step(u)
    >>> print("Current state: ", next_state)
    >>> print("Current time", env.time)

Defining a control environment
***************************************************************************************
Transforming a discrete model into an environment is accomplished using the class :class:`irlc.ex04.control_environment.ControlEnvironment`. The following example illustrates the full procedure:

.. runblock:: pycon

    >>> from irlc.ex04.discrete_control_model import DiscreteControlModel
    >>> from irlc.ex03.basic_pendulum import BasicPendulumModel
    >>> from irlc.ex04.control_environment import ControlEnvironment
    >>> model = BasicPendulumModel()
    >>> discrete_pendulum = DiscreteControlModel(model, dt=0.5)  # Using a discretization time step: 0.5 seconds.
    >>> env = ControlEnvironment(discrete_pendulum, Tmax=10)
    >>> s0, _ = env.reset()
    >>> s0

In this code, ``Tmax`` controls the maximum time for the episode (measured in seconds).

.. tip::
    In control, we often want to know the actual (continuous) time :math:`t` in seconds. You can access it as ``info['time_seconds']`` (see below).

Control agents
-------------------------------------------------------------------------------
The agent interface does not change. The following example shows how you can implement a very basic agent and train it for a single episode. In this case the agent is given a ``model`` as part of the constructor -- if you are doing model-based control, this model should be either a :class:`DiscreteControlModel` or a :class:`ControlModel`.

.. runblock:: pycon

    >>> from irlc import Agent, train
    >>> class RandomAgent(Agent):
    ...     def __init__(self, env, model=None):  # do initialization stuff here.
    ...         # You can use the model to plan. This is probably a good place to do that.
    ...         super().__init__(env)
    ...     def pi(self, x, k, info=None):
    ...         print(f"At {k=} the time is {info['time_seconds']=} and {x=}")
    ...         # compute some sort of action here, probably using the model.
    ...         return self.env.action_space.sample()
    >>>
    >>> from irlc.ex04.model_pendulum import GymSinCosPendulumEnvironment
    >>> env = GymSinCosPendulumEnvironment(Tmax=0.1)
    >>> stats, _ = train(env, RandomAgent(env), verbose=False)
    >>> print("Total reward (-cost)", stats[0]['Accumulated Reward'])

Visualizing trajectories
-------------------------------------------------------------------------------------------
It is easy to visualize trajectories (i.e., the states :math:`x_k` and actions :math:`u_k`) during an episode. The following example shows a very bare-bones approach using the :class:`irlc.ex01.agent.Agent` class:

.. plot::
    :include-source:

    import matplotlib.pyplot as plt
    import numpy as np
    from irlc import Agent, train
    from irlc.ex04.model_pendulum import GymSinCosPendulumEnvironment
    env = GymSinCosPendulumEnvironment()
    stats, trajectories = train(env, Agent(env), num_episodes=1, return_trajectory=True)
    plt.plot(trajectories[0].time, trajectories[0].state[:,0], label="$\\sin(\\theta)$")
    plt.plot(trajectories[0].time, trajectories[0].state[:,1], label="$\\cos(\\theta)$")
    plt.legend()  # Show the labels defined above.

We can simplify this, and get a slightly nicer plot in the process, by using the :func:`~irlc.utils.irlc_plot.plot_trajectory` function. It will plot all the coordinates, and you can set the labels in the environment class. The plot shows that the random actions tend to cancel out, and the system moves very little.

.. plot::
    :include-source:

    import matplotlib.pyplot as plt
    import numpy as np
    from irlc import Agent, plot_trajectory, train
    from irlc.ex04.model_pendulum import GymSinCosPendulumEnvironment
    env = GymSinCosPendulumEnvironment()
    stats, trajectories = train(env, Agent(env), num_episodes=1, return_trajectory=True)
    plot_trajectory(trajectories[0], env)

.. tip::
    The ``plot_trajectory`` function extracts the labels from the environment, which gets them from the model. You can see how they are defined in :class:`~irlc.ex04.model_pendulum.GymSinCosPendulumModel`. If you only want to plot a subset of the state or action coordinates, you can specify this as extra arguments.
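If you prefer to work with the raw arrays rather than the plotting helpers, the trajectory objects returned by ``train`` expose the fields used in the first plotting example (only ``.time`` and ``.state`` are used below). As a small illustrative sketch, you can quantify the claim that the random agent barely moves the system:

.. code-block:: python

    from irlc import Agent, train
    from irlc.ex04.model_pendulum import GymSinCosPendulumEnvironment

    env = GymSinCosPendulumEnvironment()
    stats, trajectories = train(env, Agent(env), num_episodes=1, return_trajectory=True)
    traj = trajectories[0]

    print("Number of time points:", len(traj.time))
    print("First state x_0:", traj.state[0])
    # The range of sin(theta) over the episode; a small value means the pendulum hardly moves.
    print("Range of sin(theta):", traj.state[:, 0].max() - traj.state[:, 0].min())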
Classes and functions
-------------------------------------------------------------------------
.. autoclass:: irlc.ex04.discrete_control_model.DiscreteControlModel
    :members:

.. autoclass:: irlc.ex04.control_environment.ControlEnvironment
    :members:

.. autoclass:: irlc.ex04.discrete_control_cost.DiscreteQRCost
    :members:

.. autofunction:: irlc.utils.irlc_plot.plot_trajectory

Solutions to selected exercises
-------------------------------------------------------------------------------------------------------
{% if show_solution[week] %}
.. admonition:: Solution to problem 1
    :class: dropdown

    **Part a:** At time :math:`\Delta` the angle is :math:`w(\Delta) = 0` because it is the first coordinate of :math:`\mathbf x_1`:

    .. math::
        \mathbf x_1 = \mathbf x_0 + \Delta f(\mathbf x_0, 0) = \begin{bmatrix} 0 \\ \Delta \end{bmatrix}

    **Part b:** We just do one more Euler update to get:

    .. math::
        \mathbf x_2 = \mathbf x_1 + \Delta f(\mathbf x_1, 0) = \Delta \begin{bmatrix} \Delta \\ 1 + \cos( 0 ) \end{bmatrix}

    So the answer is :math:`w(2\Delta) = \Delta^2`.

    **Part c:** Similarly to part **a** we get:

    .. math::
        \mathbf x_1 & = \mathbf x_0 + \Delta f(\mathbf x_0, u_0) \\
        & = \begin{bmatrix} \frac{\pi}{2} \\ 0 \end{bmatrix} + \Delta \begin{bmatrix} 0 \\ \cos(u_0 + \frac{\pi}{2} ) \end{bmatrix} \\
        & = \begin{bmatrix} \frac{\pi}{2} \\ -\Delta \sin(u_0) \end{bmatrix}

    So the answer is :math:`w(\Delta) = \frac{\pi}{2}`. This means that regardless of how much force we apply to the system, Euler discretization predicts that the system does not change position during the first time interval of length :math:`\Delta` -- which is obviously not true. The bottom line is that Euler discretization is not accurate unless :math:`\Delta` is much smaller than the time interval we wish to simulate the system over.
{% endif %}
{% if show_solution[week] %}
.. admonition:: Solution to problem 2: PID
    :class: dropdown

    The answer is **c**. At the very first time step, the angle is :math:`x = 6`, the control output is :math:`u=-6`, and the target is :math:`x^* = 4`. This means that the control output satisfies :math:`u = K_p (x^* - x) = -2K_p`. Solving, we get :math:`K_p = 3`.
{% endif %}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=0eaf7cc0-0eed-4c39-be41-b11c01731b98', 'Problem 4.1 and 4.2: Discretization', True) }}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=7bc3b6a8-7f00-4790-9518-b11e00a9db0e', 'Problem 4.3: PID', True) }}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=1ed06f41-c34d-49eb-8878-b11e00a9db0e', 'Problem 4.4: PID Agent', True) }}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=9734c5c6-3949-4c3d-86fc-b11e00a9db0e', 'Problem 4.5 and 4.6: Pendulum balancer', True) }}