Exercise 6: Linear-quadratic problems in control

Exercise 6: Linear-quadratic problems in control#

Note

  • This page contains background information which may be useful in future exercises or projects. You can download this weeks exercise instructions from here:

  • Slides: [1x] ([6x]). Reading: Chapter 16, [Her25].

  • You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.

  • To get the newest version of the course material, please see Making sure your files are up to date

This weeks exercise will be about the LQR (Linear-quadratic regulator) algorithm. The starting point of the LQR algorithm is a problem of the form (\(k=0,\dots,N-1\)):

\[\begin{split}\mathbf x_{k+1} & = A_k x_k + B_k u_k + \mathbf d_k. \\ \text{cost} & = \frac{1}{2} \mathbf x_N^\top Q_N \mathbf x_N + \frac{1}{2} \mathbf q_N^\top x_N + q_N + \sum_{k=0}^{N-1} c(\mathbf x_k, \mathbf u_k). \\ c(\mathbf x_k, \mathbf u_k) & = \frac{1}{2} \mathbf x_k^T Q_k \mathbf x_k + \frac{1}{2} \mathbf u_k^T R_k \mathbf u_k + \mathbf u_k^T H_k \mathbf x_k +\mathbf{q}_k^T \mathbf x_k + \mathbf{r}^T_k \mathbf u_k + q_k.\end{split}\]

This means there are \(N\) matrices \(A_k\) and so on. To practically implement this, we will assume the matrices are stored as lists: \(A_0, A_1, \dots A_{N-1}\) = A = [A0, A1, ...] of length \(N\) so that A[k] corresponds to \(A_k\). The same goes for all the other terms.

Hint

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm takes a problem of this form and then ultimately compute a control law:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(\mathbf x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + \mathbf v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function we implement will therefore end with: return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\) and vc[k] corresponds to \(v_k\).

Example: Calling the LQR function#

Consider a discrete system with dynamics:

\[\begin{split}\mathbf x_{k+1} = \begin{bmatrix}1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \mathbf u_k\end{split}\]

and cost:

\[\begin{split}\frac{1}{2} \mathbf x_N^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_N + \sum_{k=0}^{N-1} \frac{1}{2} \mathbf x_k^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_k + \frac{1}{2} u_k^2\end{split}\]

The following code will run LQR on this problem using a horizon of \(N=20\) and then compute the first action:

\[u_0 = L_0 \mathbf x_0 + \mathbf l_0\]

Using \(\mathbf x_0 = \begin{bmatrix} 1 \\ 0\end{bmatrix}\):

/builds/02465material/02465public/py312/lib/python3.12/site-packages/pygame/pkgdata.py:25: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_stream, resource_exists

Note that the code uses the syntax [A]*N to create a list containing A repeated \(N\) times. An example:

>>> [42]*3
[42, 42, 42]

Classes and functions#

irlc.ex06.dlqr.LQR(A, B, d=None, Q=None, R=None, H=None, q=None, r=None, qc=None, QN=None, qN=None, qcN=None, mu=0)[source]#

Implement the LQR as defined in (Her25, Algorithm 22). I recommend viewing this documentation online (documentation for week 6).

When you solve this exercise, look at the algorithm in the book. Since the LQR problem is on the form:

\[x_{k+1} = A_k x_k + B_k u_k + d_k\]

For \(k=0,\dots,N-1\) this means there are \(N\) matrices \(A_k\). This is implemented by assuming that A (i.e., the input argument) is a list of length \(N\) so that A[k] corresponds to \(A_k\).

Similar conventions are used for the cost term (please see the lecture notes or the online documentation for their meaning). Recall it has the form:

\[c(x_k, u_k) = \frac{1}{2} \mathbf x_k^\top Q_k \mathbf x_k + \frac{1}{2} \mathbf q_k^\top \mathbf x_k + q_k + \cdots\]

When the function is called, the vector \(\textbf{q}_k\) corresponds to q and the constant \(q_k\) correspond to qc (q-constant).

Note

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm will ultimately compute a control law of the form:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function then return return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\).

Parameters:
  • A (list) – A list of np.ndarray containing all terms \(A_k\)

  • B (list) – A list of np.ndarray containing all terms \(B_k\)

  • d (list) – A list of np.ndarray containing all terms \(\mathbf d_k\) (optional)

  • Q (list) – A list of np.ndarray containing all terms \(Q_k\) (optional)

  • R (list) – A list of np.ndarray containing all terms \(R_k\) (optional)

  • H (list) – A list of np.ndarray containing all terms \(H_k\) (optional)

  • q (list) – A list of np.ndarray containing all terms \(\mathbf q_k\) (optional)

  • r (list) – A list of np.ndarray containing all terms \(\mathbf r_k\) (optional)

  • qc (list) – A list of float containing all terms \(q_k\) (i.e., constant terms) (optional)

  • QN (ndarray) – A np.ndarray containing the terminal cost term \(Q_N\) (optional)

  • qN (ndarray) – A np.ndarray containing the terminal cost term \(\mathbf q_N\) (optional)

  • qcN (ndarray) – A np.ndarray containing the terminal cost term \(q_N\)

  • mu (float) – A regularization term which is useful for iterative-LQR (next week). Default to 0.

Returns:

A tuple of the form (L, l), (V, v, vc) corresponding to the control and cost-matrices.

Solutions to selected exercises#