{% import macros as m with context  %}
{% set week = 7 %}
{{ m.exercise_head(week) }}

This week is concerned with linarization, that is, how to (approximately!) turn a non-linear discete dynamical system of the form:

.. math::

    \mathbf x_{k+1}  = \mathbf f_k(\mathbf x_k, \mathbf u_k)

into a linear dynamical system suitable for LQR:

.. math::

    \mathbf x_{k+1} = A \mathbf x_k + B \mathbf u_k + \mathbf d_k.

The procedure described in the notes assume we are interested in states and actions that are close to an
expansion point :math:`\bar{ \mathbf{x} }_k` and :math:`\bar{ \mathbf{u} }_k`
and then approximate the dynamics using a Taylor expansion, see (:cite:t:`herlau`, Section 17.1).

.. math::

    \mathbf f_k(\mathbf x_k, \mathbf u_k) \approx \mathbf f_k(\bar {\mathbf{x} }, \bar {\mathbf{u} }) + \underbrace{\frac{\partial {\mathbf f}_k}{\partial \mathbf x} (\bar {\mathbf x}, \bar {\mathbf u})}_{=A} ( {\mathbf x}_k - \bar {\mathbf x} ) + \underbrace{\frac{\partial \mathbf f_k}{\partial \mathbf u} (\bar {\mathbf x}, \bar {\mathbf u} )}_{=B} (\mathbf u_k - \bar {\mathbf u})

Thus we can re-arrange this expression to yield:

.. math::

    \mathbf x_{k+1} & = A \mathbf x_k + B \mathbf u_k + \mathbf d_k \\
            \mathbf d_k & =  \mathbf f_k(\bar {\mathbf x}, \bar {\mathbf u}) - A \bar {\mathbf x}  - B \bar {\mathbf u}

To implement this, we need to be able to compute the terms above. As an example, we will use the Pendulum-model we have seen
several times during the course. In our example we will assume that :math:`\bar{\mathbf{x} } = \begin{bmatrix} 0 \\ 1 \\ 0.5\end{bmatrix}`
and :math:`\bar{\mathbf{u}} = \begin{bmatrix} 0 \end{bmatrix}`

We can compute :math:`f_k(\bar{\mathbf x}_{k}, \bar{\mathbf u}_{k})` as follows:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> import numpy as np
    >>> model = DiscreteSinCosPendulumModel()
    >>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
    >>> u_bar = [1] # Get an arbitrary action
    >>> print("Compute f_k(xbar, ubar):", model.f(x_bar,u_bar))

Meanwhile the two Jacobians can be computed as follows:

.. runblock:: pycon

    >>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
    >>> import numpy as np
    >>> model = DiscreteSinCosPendulumModel()
    >>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
    >>> u_bar = [1] # Get an arbitrary action
    >>> A, B = model.f_jacobian(x_bar,u_bar)
    >>> print(A)
    >>> print(B)

See also :ref:`ex4`.

Iterative LQR
-------------------------------------------------------------------------------------------------------
The iterative LQR exercise closely follow the pseudo code in ... In the following is an overview of the
various functions that are provided as part of the exercise:

The cost and dynamic approximation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In iterative LQR the cost-function will be approximated using a second-order Taylor expansion.
In other words, we assume a cost-function of the standard form:

.. math::

    c_N(\mathbf x_N) + \sum_{k=0}^{N-1} c_k(\mathbf x_k, \mathbf u_k)

.. note::
    :class: margin

    See (:cite:t:`herlau`, Subequation 17.8) for a definition of these expressions.

This cost-function can then be taylor expanded around expansion points :math:`\bar {\mathbf x}_k, \bar {\mathbf u}_k`
to get a quadratic approximation of the form:


.. math::
    c_k(\delta \mathbf x_k, \delta \mathbf u_k) & = \frac{1}{2} \delta \mathbf x_k^\top c_{\mathbf x\mathbf x,k} \delta \mathbf x_k
    + c_{\mathbf x,k}^\top \delta \mathbf x_k
    + \frac{1}{2} \delta \mathbf u_k^\top c_{\mathbf u\mathbf u,k} \delta \mathbf u_k
    + c_{\mathbf u,k}^\top \delta \mathbf u_k
    + \delta \mathbf u_k^\top c_{\mathbf u\mathbf x,k} \delta \mathbf x_k
    + c_k \\
    c_N(\delta \mathbf x_N) & = \frac{1}{2} \delta \mathbf x_N^\top c_{\mathbf x\mathbf x,N} \delta \mathbf x_N
        + c_{\mathbf x,N}^\top \delta \mathbf x_N
        + c_k

This should be compared to the form of the cost-function the :func:`~irlc.ex06.dlqr.LQR` algorithm assumes which is:

.. math::

     \frac{1}{2}\mathbf x_N^\top Q_N \mathbf{x}_N + \frac{1}{2}\mathbf q_N^\top \mathbf{x}_N + q_{N} +  \sum_{k=0}^{N-1} \left[ \frac{1}{2}  \mathbf{x}_k^\top Q_k  \mathbf{x}_k + \frac{1}{2} \mathbf{u}_k^\top R_k \mathbf{u}_k + \cdots \right]

So we can identify these forms by assuming that e.g. :math:`Q_N \equiv c_{\mathbf x \mathbf x, N}` and so on. Within the code
all of these cost terms will be represented as follows:

- :python:`c` = :math:`\begin{bmatrix} c_0, c_1, \dots, c_{N-1} c_N\end{bmatrix} = \begin{bmatrix} q_0 & q_1 & \dots & q_N \end{bmatrix}`
- :python:`c_x` = :math:`\begin{bmatrix} c_{\mathbf x, 0} & c_{\mathbf x, 1} & \dots & c_{\mathbb x, N-1} & c_{\mathbf x, N} \end{bmatrix}`
- :python:`c_xx` = :math:`\begin{bmatrix} c_{ {\mathbf xx}, 0} & c_{ {\mathbf xx}, 1} & \dots & c_{ {\mathbf xx}, N-1} & c_{ {\mathbf xx}, N} \end{bmatrix}`

Meanwhile, the dynamics is of the form :math:`\mathbf x_{k+1} = A_k \mathbf x_k + B_k \mathbf u_k` where:

- :python:`A` = :math:`\begin{bmatrix} A_0 & A_1 & \dots &  A_{N-1} \end{bmatrix}`
- :python:`B` = :math:`\begin{bmatrix} B_0 & B_1 & \dots &  B_{N-1} \end{bmatrix}`


Classes and functions
----------------------------------------------------------------------------------

.. autofunction:: irlc.ex07.ilqr.backward_pass

.. autofunction:: irlc.ex07.ilqr.cost_of_trajectory

.. autofunction:: irlc.ex07.ilqr.get_derivatives

.. autofunction:: irlc.ex07.ilqr.forward_pass


Solutions to selected exercises
-------------------------------------------------------------------------------------------------------

{% if show_solution[week] %}

.. admonition:: Solution to problem 1
    :class: dropdown

    **Part a:** The dynamics is:

    .. math::

        x_{k+1} = x_k + \Delta f(x_k, u_k) = x_k(1 + \Delta 4u_k)  = f_k(x_k, u_k)

    **Part b:** The derivative give us

    .. math::

        A & = \frac{\partial f}{\partial x} = 1 + 4\Delta  = 3\\
        B & = \frac{\partial f}{\partial u} = 4\Delta \bar x = 2\bar x

    Finally we compute :math:`d` to be:

    .. math::

        d = f_k(\bar x, \bar u) - A \bar x - B \bar u = \bar x(1+4\Delta) - 3 \bar x - 2\bar x = -2 \bar x

    So therefore:

    .. math::

        x_{k+1} \approx 3x_k  + 2\bar x u_k - 2\bar x

    Note that we just applied the procedure outlined at the top of this page.
{% endif %}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=72924f5b-6617-4f9b-8362-b1330169acbc', 'Problem 7.1', True) }}