Exercise 6: Linear-quadratic problems in control

Exercise 6: Linear-quadratic problems in control#

Note

This page contains background information which may be useful in future exercises or projects. You can download this weeks exercise instructions from here:
- 02465ex6_Python.pdf
Slides: [1x] ([6x]). Reading: Chapter 16, [Her25].
You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.
To get the newest version of the course material, please see Making sure your files are up to date

This weeks exercise will be about the LQR (Linear-quadratic regulator) algorithm. The starting point of the LQR algorithm is a problem of the form (\(k=0,\dots,N-1\)):

\[\begin{split}\mathbf x_{k+1} & = A_k x_k + B_k u_k + \mathbf d_k. \\ \text{cost} & = \frac{1}{2} \mathbf x_N^\top Q_N \mathbf x_N + \frac{1}{2} \mathbf q_N^\top x_N + q_N + \sum_{k=0}^{N-1} c(\mathbf x_k, \mathbf u_k). \\ c(\mathbf x_k, \mathbf u_k) & = \frac{1}{2} \mathbf x_k^T Q_k \mathbf x_k + \frac{1}{2} \mathbf u_k^T R_k \mathbf u_k + \mathbf u_k^T H_k \mathbf x_k +\mathbf{q}_k^T \mathbf x_k + \mathbf{r}^T_k \mathbf u_k + q_k.\end{split}\]

This means there are \(N\) matrices \(A_k\) and so on. To practically implement this, we will assume the matrices are stored as lists: \(A_0, A_1, \dots A_{N-1}\) = A = [A0, A1, ...] of length \(N\) so that A[k] corresponds to \(A_k\). The same goes for all the other terms.

Hint

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm takes a problem of this form and then ultimately compute a control law:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(\mathbf x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + \mathbf v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function we implement will therefore end with: return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\) and vc[k] corresponds to \(v_k\).

Example: Calling the LQR function#

Consider a discrete system with dynamics:

\[\begin{split}\mathbf x_{k+1} = \begin{bmatrix}1 & 1 \\ 0 & 1 \end{bmatrix} \mathbf x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \mathbf u_k\end{split}\]

and cost:

\[\begin{split}\frac{1}{2} \mathbf x_N^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_N + \sum_{k=0}^{N-1} \frac{1}{2} \mathbf x_k^\top \begin{bmatrix}1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf x_k + \frac{1}{2} u_k^2\end{split}\]

The following code will run LQR on this problem using a horizon of \(N=20\) and then compute the first action:

\[u_0 = L_0 \mathbf x_0 + \mathbf l_0\]

Using \(\mathbf x_0 = \begin{bmatrix} 1 \\ 0\end{bmatrix}\):

/builds/02465material/02465public/py312/lib/python3.12/site-packages/pygame/pkgdata.py:25: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_stream, resource_exists

Note that the code uses the syntax [A]*N to create a list containing A repeated \(N\) times. An example:

>>> [42]*3
[42, 42, 42]

Classes and functions#

irlc.ex06.dlqr.LQR(A, B, d=None, Q=None, R=None, H=None, q=None, r=None, qc=None, QN=None, qN=None, qcN=None, mu=0)[source]#

Implement the LQR as defined in (Her25, Algorithm 22). I recommend viewing this documentation online (documentation for week 6).

When you solve this exercise, look at the algorithm in the book. Since the LQR problem is on the form:

\[x_{k+1} = A_k x_k + B_k u_k + d_k\]

For \(k=0,\dots,N-1\) this means there are \(N\) matrices \(A_k\). This is implemented by assuming that A (i.e., the input argument) is a list of length \(N\) so that A[k] corresponds to \(A_k\).

Similar conventions are used for the cost term (please see the lecture notes or the online documentation for their meaning). Recall it has the form:

\[c(x_k, u_k) = \frac{1}{2} \mathbf x_k^\top Q_k \mathbf x_k + \frac{1}{2} \mathbf q_k^\top \mathbf x_k + q_k + \cdots\]

When the function is called, the vector \(\textbf{q}_k\) corresponds to q and the constant \(q_k\) correspond to qc (q-constant).

Note

Only the terms A and B are required. The rest of the terms will default to 0-matrices.

The LQR algorithm will ultimately compute a control law of the form:

\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]

And a cost-to-go function as:

\[J_k(x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + v_k^\top \mathbf x_k + v_k\]

Again there are \(N-1\) terms. The function then return return (L, l), (V, v, vc) so that L[k] corresponds to \(L_k\).

Parameters:

A (list) – A list of np.ndarray containing all terms \(A_k\)
B (list) – A list of np.ndarray containing all terms \(B_k\)
d (list) – A list of np.ndarray containing all terms \(\mathbf d_k\) (optional)
Q (list) – A list of np.ndarray containing all terms \(Q_k\) (optional)
R (list) – A list of np.ndarray containing all terms \(R_k\) (optional)
H (list) – A list of np.ndarray containing all terms \(H_k\) (optional)
q (list) – A list of np.ndarray containing all terms \(\mathbf q_k\) (optional)
r (list) – A list of np.ndarray containing all terms \(\mathbf r_k\) (optional)
qc (list) – A list of float containing all terms \(q_k\) (i.e., constant terms) (optional)
QN (ndarray) – A np.ndarray containing the terminal cost term \(Q_N\) (optional)
qN (ndarray) – A np.ndarray containing the terminal cost term \(\mathbf q_N\) (optional)
qcN (ndarray) – A np.ndarray containing the terminal cost term \(q_N\)
mu (float) – A regularization term which is useful for iterative-LQR (next week). Default to 0.

Returns:

A tuple of the form (L, l), (V, v, vc) corresponding to the control and cost-matrices.

Solutions to selected exercises#

Solution to problem 1

Part a: Let’s do the calculation for all values of \(x_{N-1}\) and \(u_{N-1}\). Simply plug in the expression of \(g_N\) and \(f_k\) and expand the square:

\[\begin{split}\mathbb{E}[g_N(x_N) | x_{N-1} = x, u_{N-1} = u] & = \mathbb{E}_w[ (a x + u - \lambda w )^2 ] \\ & = \mathbb{E}_w[ (ax + u)^2 - 2(ax+u)\lambda w + \lambda^2 w^2 ]\end{split}\]

Then use that \(w\) has mean 0 and variance 1 to get:

\[\mathbb{E}[g_N(x_N) | x_{N-1} = x, u_{N-1} = u] = (ax + u)^2 + \lambda^2\]

Pluggin in \(\lambda=3, u=1, x=0\) gives the result \(10\).

Part b: According to DP, the optimal policy can be found as \(\mu_{N-1}(x_{N-1}) = \arg min_u Q(x_{N-1}, u)\) where

\[Q(x_{N-1}, u) = \mathbb{EE}[ g_{N-1}(x_{N-1}) + g_N( x_N) | x_{N-1}, u_{N-1}=u]\]

The term involving \(g_{N-1}\) does not depend on the random noise \(w\) and so:

\[Q(x_{N-1}, u) = u^2 + \mathbb{EE}[g_N( x_N) | x_{N-1}, u_{N-1}=u] = u^2 + (a x_{N-1} + u)^2 + \lambda^2\]

Minimizing the right-hand side with respect to \(u\) can be done by setting the derivative equal to zero:

\[0 = 2u^* + 2(a x_{N-1} + u^*)\]

And therefore we get \(\mu_{N-1}(x_{N-1}) = \frac{-a x_{N-1}}{2}\). In other words, the optimal policy is independent of \(\lambda\).

Comments: This derivation is a 1-d equivalent of one step of the LQR derivation. The result we found above is similar to the result in the lecture note of how the LQR algorithm is not affected by noise.

By using the constraint equations we could potentially recover \(u_1\) and \(x_1\), but that is left as an exercise.

Problem 6.1-6.3

Problem 6.4