{% import macros as m with context %}
{% set week = 6 %}
{{ m.exercise_head(week) }}

This week's exercise is about the LQR (linear-quadratic regulator) algorithm. The starting point of the LQR algorithm is a problem of the form (:math:`k=0,\dots,N-1`):

.. math::
    \mathbf x_{k+1} & = A_k \mathbf x_k + B_k \mathbf u_k + \mathbf d_k. \\
    \text{cost} & = \frac{1}{2} \mathbf x_N^\top Q_N \mathbf x_N + \frac{1}{2} \mathbf q_N^\top \mathbf x_N + q_N + \sum_{k=0}^{N-1} c(\mathbf x_k, \mathbf u_k). \\
    c(\mathbf x_k, \mathbf u_k) & = \frac{1}{2} \mathbf x_k^\top Q_k \mathbf x_k + \frac{1}{2} \mathbf u_k^\top R_k \mathbf u_k + \mathbf u_k^\top H_k \mathbf x_k + \mathbf{q}_k^\top \mathbf x_k + \mathbf{r}^\top_k \mathbf u_k + q_k.

This means there are :math:`N` matrices :math:`A_k` and so on. To implement this in practice, we will assume the matrices are stored as lists: :math:`A_0, A_1, \dots, A_{N-1}` = :python:`A = [A0, A1, ...]` of length :math:`N`, so that :python:`A[k]` corresponds to :math:`A_k`. The same goes for all the other terms.

.. hint::
    :class: margin

    Only the terms :python:`A` and :python:`B` are required. The rest of the terms will default to 0-matrices.

The LQR algorithm takes a problem of this form and ultimately computes a control law:

.. math::
    \mathbf u_k = L_k \mathbf x_k + \mathbf l_k

and a cost-to-go function:

.. math::
    J_k(\mathbf x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + \mathbf v_k^\top \mathbf x_k + v_k

The control law has :math:`N` terms :math:`L_0, \dots, L_{N-1}`, and the cost-to-go has :math:`N+1` terms :math:`V_0, \dots, V_N` (the last corresponding to the terminal cost). The function we implement will therefore end with :python:`return (L, l), (V, v, vc)`, so that :python:`L[k]` corresponds to :math:`L_k` and :python:`vc[k]` corresponds to :math:`v_k`.

Example: Calling the LQR function
*************************************************************************************************************
Consider a discrete system with dynamics:

.. math::
    \mathbf x_{k+1} = \begin{bmatrix}1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \mathbf u_k

and cost:

.. math::
    \frac{1}{2} \mathbf x_N^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_N + \sum_{k=0}^{N-1} \left( \frac{1}{2} \mathbf x_k^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_k + \frac{1}{2} u_k^2 \right)

The following code will run LQR on this problem using a horizon of :math:`N=20` and then compute the first action:

.. math::
    u_0 = L_0 \mathbf x_0 + \mathbf l_0

using :math:`\mathbf x_0 = \begin{bmatrix} 1 \\ 0\end{bmatrix}`:

.. runblock:: pycon

    >>> import numpy as np
    >>> from irlc.ex06.dlqr import LQR
    >>> N = 20  # Planning horizon.
    >>> A = np.asarray([[1, 1], [0, 1]])
    >>> B = np.asarray([[0], [1]])
    >>> Q = np.asarray([[1, 0], [0, 0]])
    >>> R = np.asarray([[1]])
    >>> (L, l), (V, v, vc) = LQR(A=[A]*N, B=[B]*N, Q=[Q]*N, R=[R]*N, QN=Q)
    >>> x = np.asarray([1, 0])
    >>> u = L[0] @ x + l[0]
    >>> print(f"Action taken at time k=0 in state {x=} is {u=}")

Note that the code uses the syntax :python:`[A]*N` to create a list containing :python:`A` repeated :math:`N` times. An example:

.. runblock:: pycon

    >>> [42]*3
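To see the controller in action, the sketch below simulates the closed-loop system from the example by applying :math:`\mathbf u_k = L_k \mathbf x_k + \mathbf l_k` at every step. It is only meant as an illustration of how the output of :python:`LQR` can be used, and it assumes the returned :python:`L[k]` and :python:`l[k]` are array-like and compatible with 1-D state vectors, as in the example above.

.. code-block:: python

    # Sketch: simulate the double-integrator example under the LQR control law.
    import numpy as np
    from irlc.ex06.dlqr import LQR

    N = 20                                    # planning horizon
    A = np.asarray([[1, 1], [0, 1]])          # dynamics matrix A_k (same for all k)
    B = np.asarray([[0], [1]])                # control matrix B_k (same for all k)
    Q = np.asarray([[1, 0], [0, 0]])          # state cost matrix Q_k (and Q_N)
    R = np.asarray([[1]])                     # control cost matrix R_k

    (L, l), (V, v, vc) = LQR(A=[A] * N, B=[B] * N, Q=[Q] * N, R=[R] * N, QN=Q)

    x = np.asarray([1, 0])                    # initial state x_0
    states, actions = [x], []
    for k in range(N):
        u = np.ravel(L[k] @ x + l[k])         # control law u_k = L_k x_k + l_k
        x = A @ x + B @ u                     # dynamics x_{k+1} = A x_k + B u_k
        actions.append(u)
        states.append(x)

    # The position (first state coordinate) should be driven towards 0.
    print("Final state x_N =", states[-1])

You can plot :python:`states` to see how quickly the first coordinate is driven to zero.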
Classes and functions
-------------------------------------------------------------------------

.. autofunction:: irlc.ex06.dlqr.LQR

Solutions to selected exercises
-------------------------------------------------------------------------------------------------------

{% if show_solution[week] %}

.. admonition:: Solution to problem 1
    :class: dropdown

    **Part a:** Let's do the calculation for all values of :math:`x_{N-1}` and :math:`u_{N-1}`. Simply plug in the expressions for :math:`g_N` and :math:`f_k` and expand the square:

    .. math::
        \mathbb{E}[g_N(x_N) | x_{N-1} = x, u_{N-1} = u] & = \mathbb{E}_w[ (a x + u - \lambda w )^2 ] \\
        & = \mathbb{E}_w[ (ax + u)^2 - 2(ax+u)\lambda w + \lambda^2 w^2 ]

    Then use that :math:`w` has mean 0 and variance 1 to get:

    .. math::
        \mathbb{E}[g_N(x_N) | x_{N-1} = x, u_{N-1} = u] = (ax + u)^2 + \lambda^2

    Plugging in :math:`\lambda=3, u=1, x=0` gives the result :math:`10`.

    **Part b:** According to DP, the optimal policy can be found as :math:`\mu_{N-1}(x_{N-1}) = \arg\min_u Q(x_{N-1}, u)` where

    .. math::
        Q(x_{N-1}, u) = \mathbb{E}[ g_{N-1}(x_{N-1}, u_{N-1}) + g_N( x_N) | x_{N-1}, u_{N-1}=u]

    The term involving :math:`g_{N-1}` does not depend on the random noise :math:`w`, and so:

    .. math::
        Q(x_{N-1}, u) = u^2 + \mathbb{E}[g_N( x_N) | x_{N-1}, u_{N-1}=u] = u^2 + (a x_{N-1} + u)^2 + \lambda^2

    Minimizing the right-hand side with respect to :math:`u` can be done by setting the derivative equal to zero:

    .. math::
        0 = 2u^* + 2(a x_{N-1} + u^*)

    and therefore we get :math:`\mu_{N-1}(x_{N-1}) = \frac{-a x_{N-1}}{2}`. In other words, the optimal policy is independent of :math:`\lambda`.

    **Comments**: This derivation is a one-dimensional equivalent of a single step of the LQR derivation. The result we found above mirrors the result in the lecture notes that the LQR control law is not affected by the noise. By using the constraint equations we could potentially recover :math:`u_1` and :math:`x_1`, but that is left as an exercise.

{% endif %}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=c04eca0b-44b4-464f-b16a-b12c016f87ae', 'Problem 6.1-6.3', True) }}

{{ m.embed('https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=e01a42d2-304a-4b10-9859-b12c016f87ae', 'Problem 6.4', True) }}
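As a final note, the cost-to-go terms returned by :python:`LQR` in the example near the top of this page can be evaluated directly using the definition of :math:`J_k` above. The sketch below computes :math:`J_0(\mathbf x_0)`, the expected total cost of following the LQR controller from the initial state; it assumes the returned :python:`v[0]` is a vector and :python:`vc[0]` a scalar, which may differ slightly from your implementation.

.. code-block:: python

    # Sketch: evaluate the cost-to-go J_0(x_0) = 1/2 x_0' V_0 x_0 + v_0' x_0 + v_0
    # for the example from the top of this page.
    import numpy as np
    from irlc.ex06.dlqr import LQR

    N = 20
    A = np.asarray([[1, 1], [0, 1]])
    B = np.asarray([[0], [1]])
    Q = np.asarray([[1, 0], [0, 0]])
    R = np.asarray([[1]])

    (L, l), (V, v, vc) = LQR(A=[A] * N, B=[B] * N, Q=[Q] * N, R=[R] * N, QN=Q)

    x0 = np.asarray([1, 0])
    # Quadratic term + linear term + constant term of the cost-to-go at k=0.
    J0 = 0.5 * x0 @ V[0] @ x0 + np.ravel(v[0]) @ x0 + vc[0]
    print("Cost-to-go from x_0 over the horizon:", J0)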