Exercise 7: Linearization and iterative LQR#

Note

  • The exercises material is divided into general information (found on this page) and the actual exercise instructions. You can download this weeks exercise instructions from here:

  • You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.

  • To get the newest version of the course material, please see Making sure your files are up to date

This week is concerned with linarization, that is, how to (approximately!) turn a non-linear discete dynamical system of the form:

\[\mathbf x_{k+1} = \mathbf f_k(\mathbf x_k, \mathbf u_k)\]

into a linear dynamical system suitable for LQR:

\[\mathbf x_{k+1} = A \mathbf x_k + B \mathbf u_k + \mathbf d_k.\]

The procedure described in the notes assume we are interested in states and actions that are close to an expansion point \(\bar{ \mathbf{x} }_k\) and \(\bar{ \mathbf{u} }_k\) and then approximate the dynamics using a Taylor expansion, see (Herlau [Her24], Section 17.1).

\[\mathbf f_k(\mathbf x_k, \mathbf u_k) \approx \mathbf f_k(\bar {\mathbf{x} }, \bar {\mathbf{u} }) + \underbrace{\frac{\partial {\mathbf f}_k}{\partial \mathbf x} (\bar {\mathbf x}, \bar {\mathbf u})}_{=A} ( {\mathbf x}_k - \bar {\mathbf x} ) + \underbrace{\frac{\partial \mathbf f_k}{\partial \mathbf u} (\bar {\mathbf x}, \bar {\mathbf u} )}_{=B} (\mathbf u_k - \bar {\mathbf u})\]

Thus we can re-arrange this expression to yield:

\[\begin{split}\mathbf x_{k+1} & = A \mathbf x_k + B \mathbf u_k + \mathbf d_k \\ \mathbf d_k & = \mathbf f_k(\bar {\mathbf x}, \bar {\mathbf u}) - A \bar {\mathbf x} - B \bar {\mathbf u}\end{split}\]

To implement this, we need to be able to compute the terms above. As an example, we will use the Pendulum-model we have seen several times during the course. In our example we will assume that \(\bar{\mathbf{x} } = \begin{bmatrix} 0 \\ 1 \\ 0.5\end{bmatrix}\) and \(\bar{\mathbf{u}} = \begin{bmatrix} 0 \end{bmatrix}\)

We can compute \(f_k(\bar{\mathbf x}_{k}, \bar{\mathbf u}_{k})\) as follows:

>>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
>>> import numpy as np
>>> model = DiscreteSinCosPendulumModel()
>>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
>>> u_bar = [1] # Get an arbitrary action
>>> print("Compute f_k(xbar, ubar):", model.f(x_bar,u_bar))
Compute f_k(xbar, ubar): [0.00999983 0.99995    0.61423912]

Meanwhile the two Jacobians can be computed as follows:

>>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
>>> import numpy as np
>>> model = DiscreteSinCosPendulumModel()
>>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
>>> u_bar = [1] # Get an arbitrary action
>>> A, B = model.f_jacobian(x_bar,u_bar)
>>> print(A)
[[ 9.99950000e-01  0.00000000e+00  1.99990000e-02]
 [-9.99983333e-03  0.00000000e+00 -1.99996667e-04]
 [ 1.96400000e-01 -0.00000000e+00  1.00000000e+00]]
>>> print(B)
[[0.        ]
 [0.        ]
 [0.06299615]]

See also Exercise 4: Discretization and PID control.

Iterative LQR#

The iterative LQR exercise closely follow the pseudo code in … In the following is an overview of the various functions that are provided as part of the exercise:

The cost and dynamic approximation#

In iterative LQR the cost-function will be approximated using a second-order Taylor expansion. In other words, we assume a cost-function of the standard form:

\[c_N(\mathbf x_N) + \sum_{k=0}^{N-1} c_k(\mathbf x_k, \mathbf u_k)\]

Note

See (Herlau [Her24], Subequation 17.8) for a definition of these expressions.

This cost-function can then be taylor expanded around expansion points \(\bar {\mathbf x}_k, \bar {\mathbf u}_k\) to get a quadratic approximation of the form:

\[\begin{split}c_k(\delta \mathbf x_k, \delta \mathbf u_k) & = \frac{1}{2} \delta \mathbf x_k^\top c_{\mathbf x\mathbf x,k} \delta \mathbf x_k + c_{\mathbf x,k}^\top \delta \mathbf x_k + \frac{1}{2} \delta \mathbf u_k^\top c_{\mathbf u\mathbf u,k} \delta \mathbf u_k + c_{\mathbf u,k}^\top \delta \mathbf u_k + \delta \mathbf u_k^\top c_{\mathbf u\mathbf x,k} \delta \mathbf x_k + c_k \\ c_N(\delta \mathbf x_N) & = \frac{1}{2} \delta \mathbf x_N^\top c_{\mathbf x\mathbf x,N} \delta \mathbf x_N + c_{\mathbf x,N}^\top \delta \mathbf x_N + c_k\end{split}\]

This should be compared to the form of the cost-function the LQR() algorithm assumes which is:

\[\frac{1}{2}\mathbf x_N^\top Q_N \mathbf{x}_N + \frac{1}{2}\mathbf q_N^\top \mathbf{x}_N + q_{N} + \sum_{k=0}^{N-1} \left[ \frac{1}{2} \mathbf{x}_k^\top Q_k \mathbf{x}_k + \frac{1}{2} \mathbf{u}_k^\top R_k \mathbf{u}_k + \cdots \right]\]

So we can identify these forms by assuming that e.g. \(Q_N \equiv c_{\mathbf x \mathbf x, N}\) and so on. Within the code all of these cost terms will be represented as follows:

  • c = \(\begin{bmatrix} c_0, c_1, \dots, c_{N-1} c_N\end{bmatrix} = \begin{bmatrix} q_0 & q_1 & \dots & q_N \end{bmatrix}\)

  • c_x = \(\begin{bmatrix} c_{\mathbf x, 0} & c_{\mathbf x, 1} & \dots & c_{\mathbb x, N-1} & c_{\mathbf x, N} \end{bmatrix}\)

  • c_xx = \(\begin{bmatrix} c_{ {\mathbf xx}, 0} & c_{ {\mathbf xx}, 1} & \dots & c_{ {\mathbf xx}, N-1} & c_{ {\mathbf xx}, N} \end{bmatrix}\)

Meanwhile, the dynamics is of the form \(\mathbf x_{k+1} = A_k \mathbf x_k + B_k \mathbf u_k\) where:

  • A = \(\begin{bmatrix} A_0 & A_1 & \dots & A_{N-1} \end{bmatrix}\)

  • B = \(\begin{bmatrix} B_0 & B_1 & \dots & B_{N-1} \end{bmatrix}\)

Classes and functions#

irlc.ex07.ilqr.backward_pass(A, B, c_x, c_u, c_xx, c_ux, c_uu, mu=1)[source]#

Given all derivatives, apply the LQR algorithm to get the control law.

The input arguments are described in the online documentation and the lecture notes. You should use them to call your implementation of the LQR() method. Note that you should give a value of all inputs except for the d-term.

Parameters:
  • A (list) – linearization of the dynamics matrices \(A_k\).

  • B (list) – linearization of the dynamics matrices \(B_k\).

  • c_x (list) – Cost terms corresponding to \(\mathbf{q}_k\)

  • c_u (list) – Cost terms corresponding to \(\mathbf{r}_k\)

  • c_xx (list) – Cost terms corresponding to \(Q_k\)

  • c_ux (list) – Cost terms corresponding to \(H_k\)

  • c_uu (list) – Cost terms corresponding to \(R_k\)

  • mu – Regularization parameter for the LQR method

Returns:

The control law \(L_k, \mathbf{l}_k\) as two lists.

irlc.ex07.ilqr.cost_of_trajectory(model, xs, us)[source]#

Helper function which computes the cost of the trajectory.

The cost is defined as:

\[c_N( \bar {\mathbf x}_N, \bar {\mathbf u}_) + \sum_{k=0}^{N-1} c_k(\bar {\mathbf x}_k, \bar {\mathbf u}_k)\]

and to compute it, you should use the two helper methods model.cost.c and model.cost.cN (see c() and cN()).

Parameters:
  • model (DiscreteControlModel) – The control model used to compute the cost.

  • xs (list) – A list of length \(N+1\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_N \end{bmatrix}\)

  • us (list) – A list of length \(N\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_{N-1} \end{bmatrix}\)

Return type:

float

Returns:

The cost as a number.

irlc.ex07.ilqr.get_derivatives(model, x_bar, u_bar)[source]#

Compute all the derivatives used in the model.

The return type should match the meaning in (Her24, Subequation 17.8) and in the online documentation.

  • c should be a list of length \(N+1\)

  • c_x should be a list of length \(N+1\)

  • c_xx should be a list of length \(N+1\)

  • c_u should be a list of length \(N\)

  • c_uu should be a list of length \(N\)

  • c_ux should be a list of length \(N\)

  • A should be a list of length \(N\)

  • B should be a list of length \(N\)

Use the model to compute these terms. For instance, this will compute the first terms A[0] and B[0]:

A0, B0 = model.f_jacobian(x_bar[0], u_bar[0], 0)

Meanwhile, to compute the first terms of the cost-functions you should use:

c[0], c_x[0], c_u[0], c_xx[0], c_ux[0], c_uu[0] = model.cost.c(x_bar[0], u_bar[0], k=0, compute_gradients=True)
Parameters:
  • model (DiscreteControlModel) – The model to use when computing the derivatives of the cost

  • x_bar (list) – The nominal state-trajectory

  • u_bar (list) – The nominal action-trajectory

Returns:

Lists of all derivatives computed around the nominal trajectory (see the lecture notes).

irlc.ex07.ilqr.forward_pass(model, x_bar, u_bar, L, l, alpha=1.0)[source]#

Simulates the effect of the controller on the model

We assume the system starts in \(\mathbf{x}_0 = \bar {\mathbf x}_0\), and then simulate the effect of generating actions using the closed-loop policy

\[\mathbf{u}_k = \bar {\mathbf u}_k + \alpha \mathbf{l}_k + L_k (\mathbf{x}_k - \bar { \mathbf x}_k)\]

(see (Her24, eq. (17.16))).

Parameters:
  • model (DiscreteControlModel) – The model used to compute the dynamics.

  • x_bar (list) – A nominal list of states

  • u_bar (list) – A nominal list of actions (not used by the method)

  • L (list) – A list of control matrices \(L_k\)

  • l (list) – A list of control vectors \(\mathbf{l}_k\)

  • alpha – The linesearch parameter.

Returns:

A list of length \(N+1\) of simulated states and a list of length \(N\) of simulated actions.

Solutions to selected exercises#