Exercise 6: Linear-quadratic problems in control

Exercise 6: Linear-quadratic problems in control#

Note

  • The exercises material is divided into general information (found on this page) and the actual exercise instructions. You can download this weeks exercise instructions from here:

  • You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.

  • To get the newest version of the course material, please see Making sure your files are up to date

This weeks exercise will be about the LQR (Linear-quadratic regulator) algorithm. The starting point of the LQR algorithm is a problem of the form (\(k=0,\dots,N-1\)):

\[\begin{split}\mathbf x_{k+1} & = A_k x_k + B_k u_k + \mathbf d_k. \\ \text{cost} & = \frac{1}{2} \mathbf x_N^\top Q_N \mathbf x_N + \frac{1}{2} \mathbf q_N^\top x_N + q_N + \sum_{k=0}^{N-1} c(\mathbf x_k, \mathbf u_k). \\ c(\mathbf x_k, \mathbf u_k) & = \frac{1}{2} \mathbf x_k^T Q_k \mathbf x_k + \frac{1}{2} \mathbf u_k^T R_k \mathbf u_k + \mathbf u_k^T H_k \mathbf x_k +\mathbf{q}_k^T \mathbf x_k + \mathbf{r}^T_k \mathbf u_k + q_k.\end{split}\]

This means there are \(N\) matrices \(A_k\) and so on. To practically implement this, we will assume the matrices are stored as lists: \(A_0, A_1, \dots A_{N-1}\) = A = [A0, A1, ...] of length \(N\) so that A[k] corresponds to \(A_k\). The same goes for all the other terms.

Hint

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm takes a problem of this form and then ultimately compute a control law:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(\mathbf x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + \mathbf v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function we implement will therefore end with: return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\) and vc[k] corresponds to \(v_k\).

Example: Calling the LQR function#

Consider a discrete system with dynamics:

\[\begin{split}\mathbf x_{k+1} = \begin{bmatrix}1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \mathbf u_k\end{split}\]

and cost:

\[\begin{split}\frac{1}{2} \mathbf x_N^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_N + \sum_{k=0}^{N-1} \frac{1}{2} \mathbf x_k^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_k + \frac{1}{2} u_k^2\end{split}\]

The following code will run LQR on this problem using a horizon of \(N=20\) and then compute the first action:

\[u_0 = L_0 \mathbf x_0 + \mathbf l_0\]

Using \(\mathbf x_0 = \begin{bmatrix} 1 \\ 0\end{bmatrix}\):

>>> import numpy as np
>>> from irlc.ex06.dlqr import LQR
>>> N = 20 # Planning horizon.
>>> A = np.asarray([[1, 1], [0, 1]])
>>> B = np.asarray([[0], [1]])
>>> Q = np.asarray([[1, 0], [0, 0]])
>>> R = np.asarray([[1]])
>>> (L,l), (V,v,vc) = LQR(A=[A]*N, B=[B]*N, Q=[Q]*N, R=[R]*N, QN=Q)
>>> x = np.asarray([1, 0])
>>> u = L[0] @ x + l[0]
>>> print(f"Action taken at time k=0 in state {x=} is {u=}")
Action taken at time k=0 in state x=array([1, 0]) is u=array([-0.48053382])

Note that the code uses the syntax [A]*N to create a list containing A repeated \(N\) times. An example:

>>> [42]*3
[42, 42, 42]

Classes and functions#

irlc.ex06.dlqr.LQR(A, B, d=None, Q=None, R=None, H=None, q=None, r=None, qc=None, QN=None, qN=None, qcN=None, mu=0)[source]#

Implement the LQR as defined in (Her24, Algorithm 22). I recommend viewing this documentation online (documentation for week 6).

When you solve this exercise, look at the algorithm in the book. Since the LQR problem is on the form:

\[x_{k+1} = A_k x_k + B_k u_k + d_k\]

For \(k=0,\dots,N-1\) this means there are \(N\) matrices \(A_k\). This is implemented by assuming that A (i.e., the input argument) is a list of length \(N\) so that A[k] corresponds to \(A_k\).

Similar conventions are used for the cost term (please see the lecture notes or the online documentation for their meaning). Recall it has the form:

\[c(x_k, u_k) = \frac{1}{2} \mathbf x_k^\top Q_k \mathbf x_k + \frac{1}{2} \mathbf q_k^\top \mathbf x_k + q_k + \cdots\]

When the function is called, the vector \(\textbf{q}_k\) corresponds to q and the constant \(q_k\) correspond to qc (q-constant).

Note

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm will ultimately compute a control law of the form:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function then return return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\).

Parameters:
  • A (list) – A list of np.ndarray containing all terms \(A_k\)

  • B (list) – A list of np.ndarray containing all terms \(B_k\)

  • d (list) – A list of np.ndarray containing all terms \(\mathbf d_k\) (optional)

  • Q (list) – A list of np.ndarray containing all terms \(Q_k\) (optional)

  • R (list) – A list of np.ndarray containing all terms \(R_k\) (optional)

  • H (list) – A list of np.ndarray containing all terms \(H_k\) (optional)

  • q (list) – A list of np.ndarray containing all terms \(\mathbf q_k\) (optional)

  • r (list) – A list of np.ndarray containing all terms \(\mathbf r_k\) (optional)

  • qc (list) – A list of float containing all terms \(q_k\) (i.e., constant terms) (optional)

  • QN (ndarray) – A np.ndarray containing the terminal cost term \(Q_N\) (optional)

  • qN (ndarray) – A np.ndarray containing the terminal cost term \(\mathbf q_N\) (optional)

  • qcN (ndarray) – A np.ndarray containing the terminal cost term \(q_N\)

  • mu (float) – A regularization term which is useful for iterative-LQR (next week). Default to 0.

Returns:

A tuple of the form (L, l), (V, v, vc) corresponding to the control and cost-matrices.

Solutions to selected exercises#