Exercise 7: Linearization and iterative LQR

Exercise 7: Linearization and iterative LQR#

Note

This page contains background information which may be useful in future exercises or projects. You can download this weeks exercise instructions from here:
- 02465ex7_Python.pdf
Slides: [1x] ([6x]). Reading: Chapter 17, [Her25].
You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.
To get the newest version of the course material, please see Making sure your files are up to date

This week is concerned with linarization, that is, how to (approximately!) turn a non-linear discete dynamical system of the form:

\[\mathbf x_{k+1} = \mathbf f_k(\mathbf x_k, \mathbf u_k)\]

into a linear dynamical system suitable for LQR:

\[\mathbf x_{k+1} = A \mathbf x_k + B \mathbf u_k + \mathbf d_k.\]

The procedure described in the notes assume we are interested in states and actions that are close to an expansion point \(\bar{ \mathbf{x} }_k\) and \(\bar{ \mathbf{u} }_k\) and then approximate the dynamics using a Taylor expansion, see (Herlau [Her25], Section 17.1).

\[\mathbf f_k(\mathbf x_k, \mathbf u_k) \approx \mathbf f_k(\bar {\mathbf{x} }, \bar {\mathbf{u} }) + \underbrace{\frac{\partial {\mathbf f}_k}{\partial \mathbf x} (\bar {\mathbf x}, \bar {\mathbf u})}_{=A} ( {\mathbf x}_k - \bar {\mathbf x} ) + \underbrace{\frac{\partial \mathbf f_k}{\partial \mathbf u} (\bar {\mathbf x}, \bar {\mathbf u} )}_{=B} (\mathbf u_k - \bar {\mathbf u})\]

Thus we can re-arrange this expression to yield:

\[\begin{split}\mathbf x_{k+1} & = A \mathbf x_k + B \mathbf u_k + \mathbf d_k \\ \mathbf d_k & = \mathbf f_k(\bar {\mathbf x}, \bar {\mathbf u}) - A \bar {\mathbf x} - B \bar {\mathbf u}\end{split}\]

To implement this, we need to be able to compute the terms above. As an example, we will use the Pendulum-model we have seen several times during the course. In our example we will assume that \(\bar{\mathbf{x} } = \begin{bmatrix} 0 \\ 1 \\ 0.5\end{bmatrix}\) and \(\bar{\mathbf{u}} = \begin{bmatrix} 0 \end{bmatrix}\)

We can compute \(f_k(\bar{\mathbf x}_{k}, \bar{\mathbf u}_{k})\) as follows:

/builds/02465material/02465public/py312/lib/python3.12/site-packages/pygame/pkgdata.py:25: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_stream, resource_exists

Meanwhile the two Jacobians can be computed as follows:

/builds/02465material/02465public/py312/lib/python3.12/site-packages/pygame/pkgdata.py:25: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_stream, resource_exists

Iterative LQR#

The iterative LQR exercise closely follow the pseudo code in … In the following is an overview of the various functions that are provided as part of the exercise:

The cost and dynamic approximation#

In iterative LQR the cost-function will be approximated using a second-order Taylor expansion. In other words, we assume a cost-function of the standard form:

\[c_N(\mathbf x_N) + \sum_{k=0}^{N-1} c_k(\mathbf x_k, \mathbf u_k)\]

Note

See (Herlau [Her25], Subequation 17.8) for a definition of these expressions.

This cost-function can then be taylor expanded around expansion points \(\bar {\mathbf x}_k, \bar {\mathbf u}_k\) to get a quadratic approximation of the form:

\[\begin{split}c_k(\delta \mathbf x_k, \delta \mathbf u_k) & = \frac{1}{2} \delta \mathbf x_k^\top c_{\mathbf x\mathbf x,k} \delta \mathbf x_k + c_{\mathbf x,k}^\top \delta \mathbf x_k + \frac{1}{2} \delta \mathbf u_k^\top c_{\mathbf u\mathbf u,k} \delta \mathbf u_k + c_{\mathbf u,k}^\top \delta \mathbf u_k + \delta \mathbf u_k^\top c_{\mathbf u\mathbf x,k} \delta \mathbf x_k + c_k \\ c_N(\delta \mathbf x_N) & = \frac{1}{2} \delta \mathbf x_N^\top c_{\mathbf x\mathbf x,N} \delta \mathbf x_N + c_{\mathbf x,N}^\top \delta \mathbf x_N + c_k\end{split}\]

This should be compared to the form of the cost-function the LQR() algorithm assumes which is:

\[\frac{1}{2}\mathbf x_N^\top Q_N \mathbf{x}_N + \frac{1}{2}\mathbf q_N^\top \mathbf{x}_N + q_{N} + \sum_{k=0}^{N-1} \left[ \frac{1}{2} \mathbf{x}_k^\top Q_k \mathbf{x}_k + \frac{1}{2} \mathbf{u}_k^\top R_k \mathbf{u}_k + \cdots \right]\]

So we can identify these forms by assuming that e.g. \(Q_N \equiv c_{\mathbf x \mathbf x, N}\) and so on. Within the code all of these cost terms will be represented as follows:

c = \(\begin{bmatrix} c_0, c_1, \dots, c_{N-1} c_N\end{bmatrix} = \begin{bmatrix} q_0 & q_1 & \dots & q_N \end{bmatrix}\)
c_x = \(\begin{bmatrix} c_{\mathbf x, 0} & c_{\mathbf x, 1} & \dots & c_{\mathbb x, N-1} & c_{\mathbf x, N} \end{bmatrix}\)
c_xx = \(\begin{bmatrix} c_{ {\mathbf xx}, 0} & c_{ {\mathbf xx}, 1} & \dots & c_{ {\mathbf xx}, N-1} & c_{ {\mathbf xx}, N} \end{bmatrix}\)

Meanwhile, the dynamics is of the form \(\mathbf x_{k+1} = A_k \mathbf x_k + B_k \mathbf u_k\) where:

A = \(\begin{bmatrix} A_0 & A_1 & \dots & A_{N-1} \end{bmatrix}\)
B = \(\begin{bmatrix} B_0 & B_1 & \dots & B_{N-1} \end{bmatrix}\)

Classes and functions#

irlc.ex07.ilqr.backward_pass(A, B, c_x, c_u, c_xx, c_ux, c_uu, mu=1)[source]#

Given all derivatives, apply the LQR algorithm to get the control law.

The input arguments are described in the online documentation and the lecture notes. You should use them to call your implementation of the LQR() method. Note that you should give a value of all inputs except for the d-term.

Parameters:

A (list) – linearization of the dynamics matrices \(A_k\).
B (list) – linearization of the dynamics matrices \(B_k\).
c_x (list) – Cost terms corresponding to \(\mathbf{q}_k\)
c_u (list) – Cost terms corresponding to \(\mathbf{r}_k\)
c_xx (list) – Cost terms corresponding to \(Q_k\)
c_ux (list) – Cost terms corresponding to \(H_k\)
c_uu (list) – Cost terms corresponding to \(R_k\)
mu – Regularization parameter for the LQR method

Returns:

The control law \(L_k, \mathbf{l}_k\) as two lists.

irlc.ex07.ilqr.cost_of_trajectory(model, xs, us)[source]#

Helper function which computes the cost of the trajectory.

The cost is defined as:

\[c_N( \bar {\mathbf x}_N, \bar {\mathbf u}_) + \sum_{k=0}^{N-1} c_k(\bar {\mathbf x}_k, \bar {\mathbf u}_k)\]

and to compute it, you should use the two helper methods model.cost.c and model.cost.cN (see c() and cN()).

Parameters:

model (DiscreteControlModel) – The control model used to compute the cost.
xs (list) – A list of length \(N+1\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_N \end{bmatrix}\)
us (list) – A list of length \(N\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_{N-1} \end{bmatrix}\)

Return type:

float

Returns:

The cost as a number.

irlc.ex07.ilqr.get_derivatives(model, x_bar, u_bar)[source]#

Compute all the derivatives used in the model.

The return type should match the meaning in (Her25, Subequation 17.8) and in the online documentation.

c should be a list of length \(N+1\)
c_x should be a list of length \(N+1\)
c_xx should be a list of length \(N+1\)
c_u should be a list of length \(N\)
c_uu should be a list of length \(N\)
c_ux should be a list of length \(N\)
A should be a list of length \(N\)
B should be a list of length \(N\)

Use the model to compute these terms. For instance, this will compute the first terms A[0] and B[0]:

A0, B0 = model.f_jacobian(x_bar[0], u_bar[0], 0)

Meanwhile, to compute the first terms of the cost-functions you should use:

c[0], c_x[0], c_u[0], c_xx[0], c_ux[0], c_uu[0] = model.cost.c(x_bar[0], u_bar[0], k=0, compute_gradients=True)

Parameters:

model (DiscreteControlModel) – The model to use when computing the derivatives of the cost
x_bar (list) – The nominal state-trajectory
u_bar (list) – The nominal action-trajectory

Returns:

Lists of all derivatives computed around the nominal trajectory (see the lecture notes).

irlc.ex07.ilqr.forward_pass(model, x_bar, u_bar, L, l, alpha=1.0)[source]#

Simulates the effect of the controller on the model

We assume the system starts in \(\mathbf{x}_0 = \bar {\mathbf x}_0\), and then simulate the effect of generating actions using the closed-loop policy

\[\mathbf{u}_k = \bar {\mathbf u}_k + \alpha \mathbf{l}_k + L_k (\mathbf{x}_k - \bar { \mathbf x}_k)\]

(see (Her25, eq. (17.16))).

Parameters:

model (DiscreteControlModel) – The model used to compute the dynamics.
x_bar (list) – A nominal list of states
u_bar (list) – A nominal list of actions (not used by the method)
L (list) – A list of control matrices \(L_k\)
l (list) – A list of control vectors \(\mathbf{l}_k\)
alpha – The linesearch parameter.

Returns:

A list of length \(N+1\) of simulated states and a list of length \(N\) of simulated actions.

Solutions to selected exercises#

Solution to problem 1

Part a: The dynamics is:

\[x_{k+1} = x_k + \Delta f(x_k, u_k) = x_k(1 + \Delta 4u_k) = f_k(x_k, u_k)\]

Part b: The derivative give us

\[\begin{split}A & = \frac{\partial f}{\partial x} = 1 + 4\Delta = 3\\ B & = \frac{\partial f}{\partial u} = 4\Delta \bar x = 2\bar x\end{split}\]

Finally we compute \(d\) to be:

\[d = f_k(\bar x, \bar u) - A \bar x - B \bar u = \bar x(1+4\Delta) - 3 \bar x - 2\bar x = -2 \bar x\]

So therefore:

\[x_{k+1} \approx 3x_k + 2\bar x u_k - 2\bar x\]

Note that we just applied the procedure outlined at the top of this page.

Problem 7.1

Exercise 7: Linearization and iterative LQR

Contents

Exercise 7: Linearization and iterative LQR#

Iterative LQR#

The cost and dynamic approximation#

Classes and functions#

Solutions to selected exercises#