Linear Quadratic Regulator in 3 Ways
Slide: Explore the Linear Quadratic Regulator (LQR) through the Indirect Shooting Method (Pontryagin's Principle), the Optimization Approach (Quadratic Programming), and the Recursive Solution (Riccati Equation). Inspired by Zachary Manchester's Spring 2024-25 lecture "Optimal Control and Reinforcement Learning".
Introduction
Problem Formulation
Consider a discrete-time linear system:
Quadratic cost function:
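Written out (a standard statement, consistent with the symbols used in the rest of the slides):

```latex
\begin{aligned}
\text{dynamics:}\quad & x_{n+1} = A_n x_n + B_n u_n, \qquad n = 1, \ldots, N-1 \\
\text{cost:}\quad & J = \tfrac{1}{2} x_N^\top Q_N x_N + \sum_{n=1}^{N-1} \tfrac{1}{2} \left( x_n^\top Q_n x_n + u_n^\top R_n u_n \right)
\end{aligned}
```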
Assumptions:
- \((A_n,B_n)\) is controllable and \((A_n,C_n)\) is observable, where \(C_n\) is any factor with \(Q_n = C_n^\top C_n\)
- \(Q_n \succeq 0\), \(R_n \succ 0\), \(Q_N \succeq 0\) (strict definiteness of \(R_n\) ensures \(R_n^{-1}\) exists)
Indirect Shooting: PMP Perspective
Problem Formulation and Optimality Conditions
Consider the deterministic discrete-time optimal control problem:
The first-order necessary conditions for optimality can be derived using:
- The Lagrangian framework (special case of KKT conditions)
- Pontryagin's Minimum Principle (PMP)
Lagrangian Formulation
Form the Lagrangian:
Define the Hamiltonian:
Rewrite the Lagrangian using the Hamiltonian:
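Sketched explicitly, the three steps above are:

```latex
\begin{aligned}
L &= \ell_N(x_N) + \sum_{n=1}^{N-1} \Big[ \ell_n(x_n, u_n) + \lambda_{n+1}^\top \big( f(x_n, u_n) - x_{n+1} \big) \Big] \\
H_n(x, u, \lambda) &= \ell_n(x, u) + \lambda^\top f(x, u) \\
L &= H_1(x_1, u_1, \lambda_2) + \sum_{n=2}^{N-1} \Big[ H_n(x_n, u_n, \lambda_{n+1}) - \lambda_n^\top x_n \Big] + \ell_N(x_N) - \lambda_N^\top x_N
\end{aligned}
```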
Optimality Conditions
Take derivatives with respect to \(x\) and \(\lambda\):
For \(u\), we write the minimization explicitly to handle constraints:
Summary of Necessary Conditions
The first-order necessary conditions can be summarized as:
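In summary (with \(\mathcal{U}\) the admissible control set):

```latex
\begin{aligned}
\nabla_{\lambda_{n+1}} L = 0 &\;\Rightarrow\; x_{n+1} = f(x_n, u_n) & \text{(dynamics)} \\
\nabla_{x_n} L = 0 &\;\Rightarrow\; \lambda_n = \nabla_x H_n(x_n, u_n, \lambda_{n+1}), \quad \lambda_N = \nabla_x \ell_N(x_N) & \text{(costate)} \\
&\;\phantom{\Rightarrow}\; u_n = \arg\min_{\tilde{u} \in \mathcal{U}} H_n(x_n, \tilde{u}, \lambda_{n+1}) & \text{(minimum principle)}
\end{aligned}
```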
In continuous time, these become:
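With \(H(x, u, \lambda) = \ell(x, u) + \lambda^\top f(x, u)\) and terminal cost \(\ell_f\):

```latex
\dot{x} = \nabla_\lambda H(x, u, \lambda), \qquad
-\dot{\lambda} = \nabla_x H(x, u, \lambda), \qquad
u = \arg\min_{\tilde{u} \in \mathcal{U}} H(x, \tilde{u}, \lambda), \qquad
\lambda(t_f) = \nabla_x \ell_f(x(t_f))
```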
Application to LQR Problems
For LQR problems with quadratic cost and linear dynamics:
The necessary conditions simplify to:
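For the quadratic cost and linear dynamics above, the conditions read:

```latex
\begin{aligned}
x_{n+1} &= A_n x_n + B_n u_n \\
\lambda_n &= Q_n x_n + A_n^\top \lambda_{n+1}, \qquad \lambda_N = Q_N x_N \\
u_n &= -R_n^{-1} B_n^\top \lambda_{n+1}
\end{aligned}
```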
This forms a linear two-point boundary value problem.
Indirect Shooting Algorithm for LQR
Procedure:
- Make initial guess for control sequence \(u_{1:N-1}\)
- Forward pass: Simulate dynamics to get state trajectory \(x_{1:N}\)
- Backward pass:
- Set terminal costate: \(\lambda_N = Q_N x_N\)
- Compute costate trajectory: \(\lambda_n = Q_n x_n + A_n^\top \lambda_{n+1}\)
- Compute control adjustment: \(\Delta u_n = -R_n^{-1} B_n^\top \lambda_{n+1} - u_n\)
- Line search: Update controls \(u_n \leftarrow u_n + \alpha \Delta u_n\)
- Iterate until convergence
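The procedure above can be sketched in code. This is a minimal illustration on a hypothetical discrete double-integrator system; the system matrices, weights, horizon, and tolerances are all assumptions for the example, not values from the slides:

```python
import numpy as np

# Hypothetical example system: discrete double integrator (an assumption)
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q = np.eye(2)            # stage state cost Q_n
R = 10.0 * np.eye(1)     # stage control cost R_n (positive definite)
QN = np.eye(2)           # terminal cost Q_N
N = 20
x1 = np.array([1.0, 0.0])

def rollout(u):
    """Forward pass: simulate the dynamics under control sequence u."""
    x = np.zeros((N, 2))
    x[0] = x1
    for n in range(N - 1):
        x[n + 1] = A @ x[n] + B @ u[n]
    return x

def cost(u):
    """Total quadratic cost of a control sequence."""
    x = rollout(u)
    J = 0.5 * x[-1] @ QN @ x[-1]
    for n in range(N - 1):
        J += 0.5 * (x[n] @ Q @ x[n] + u[n] @ R @ u[n])
    return J

u = np.zeros((N - 1, 1))          # initial guess for the control sequence
for _ in range(300):
    x = rollout(u)
    # Backward pass: costate recursion and control adjustment
    lam = np.zeros((N, 2))
    lam[-1] = QN @ x[-1]                       # terminal costate
    du = np.zeros_like(u)
    for n in range(N - 2, -1, -1):
        du[n] = -np.linalg.solve(R, B.T @ lam[n + 1]) - u[n]
        lam[n] = Q @ x[n] + A.T @ lam[n + 1]   # costate trajectory
    if np.max(np.abs(du)) < 1e-8:              # converged
        break
    # Backtracking line search on the total cost
    alpha, J0 = 1.0, cost(u)
    while cost(u + alpha * du) > J0 and alpha > 1e-10:
        alpha *= 0.5
    u = u + alpha * du
```

Note that each iteration only improves the control sequence; unlike the Riccati approach later, the result is an open-loop trajectory for this particular \(x_1\).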
Direct Approach: QP Perspective
LQR as Quadratic Programming Problem
Assume \(x_1\) is given, and define the decision variable vector and the block-diagonal cost matrix:
The dynamics constraints can be expressed as
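One standard choice of ordering (interleaving controls and states) gives:

```latex
z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ \vdots \\ u_{N-1} \\ x_N \end{bmatrix}, \qquad
H = \begin{bmatrix} R_1 & & & & \\ & Q_2 & & & \\ & & R_2 & & \\ & & & \ddots & \\ & & & & Q_N \end{bmatrix}, \qquad
\underbrace{\begin{bmatrix}
-B_1 & I & & & \\
 & -A_2 & -B_2 & I & \\
 & & & \ddots & \ddots
\end{bmatrix}}_{C}\, z =
\underbrace{\begin{bmatrix} A_1 x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{d}
```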
QP Formulation and KKT Conditions
The LQR problem becomes the QP:
The Lagrangian of this QP is:
The KKT conditions are:
This leads to the linear system:
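Written out:

```latex
\begin{aligned}
&\min_z \; \tfrac{1}{2} z^\top H z \quad \text{s.t.} \quad C z = d \\
&L = \tfrac{1}{2} z^\top H z + \lambda^\top (C z - d) \\
&\nabla_z L = H z + C^\top \lambda = 0, \qquad \nabla_\lambda L = C z - d = 0 \\
&\begin{bmatrix} H & C^\top \\ C & 0 \end{bmatrix}
\begin{bmatrix} z \\ \lambda \end{bmatrix} =
\begin{bmatrix} 0 \\ d \end{bmatrix}
\end{aligned}
```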
We get the exact solution by solving one linear system!
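As a sketch, the whole problem can be assembled and solved in one shot. The double-integrator system and the interleaved variable ordering \(z = [u_1, x_2, u_2, \ldots, x_N]\) are assumptions for this example, not values from the slides:

```python
import numpy as np

# Hypothetical example system (an assumption, not from the slides)
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
n, m = 2, 1
Q, R, QN = np.eye(n), np.eye(m), np.eye(n)
N = 20
x1 = np.array([1.0, 0.0])

nz = (N - 1) * (m + n)                 # z = [u_1, x_2, u_2, x_3, ..., x_N]
H = np.zeros((nz, nz))
C = np.zeros(((N - 1) * n, nz))
d = np.zeros((N - 1) * n)

for k in range(N - 1):
    iu = k * (m + n)                   # index of u_{k+1} in z
    ix = iu + m                        # index of x_{k+2} in z
    H[iu:iu + m, iu:iu + m] = R
    H[ix:ix + n, ix:ix + n] = QN if k == N - 2 else Q
    r = k * n                          # row block for dynamics constraint k
    C[r:r + n, iu:iu + m] = -B
    C[r:r + n, ix:ix + n] = np.eye(n)
    if k > 0:
        C[r:r + n, ix - (m + n):ix - m] = -A   # couples to the previous state
d[:n] = A @ x1                         # first constraint: x_2 = A x_1 + B u_1

# Assemble the KKT matrix and solve one linear system
KKT = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
rhs = np.concatenate([np.zeros(nz), d])
sol = np.linalg.solve(KKT, rhs)
z, lam = sol[:nz], sol[nz:]
u1 = z[:m]                             # optimal first control
```

The resulting `z` contains the entire optimal state and control trajectory at once, at the price of solving a dense system whose size grows with the horizon.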
Riccati Equation Solution
KKT System Structure for LQR
The KKT system for LQR has a highly structured, sparse form; consider the \(N=4\) case:
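In this sketch, the stationarity and dynamics equations for \(N=4\) (with \(x_1\) given) are, with the color labels used in the derivation that follows:

```latex
\begin{aligned}
R_n u_n + B_n^\top \lambda_{n+1} &= 0, \quad n = 1, 2, 3 & (n = 3:\ \text{the ``red'' equation}) \\
Q_n x_n + A_n^\top \lambda_{n+1} - \lambda_n &= 0, \quad n = 2, 3 & (n = 3:\ \text{the ``green'' equation}) \\
Q_4 x_4 - \lambda_4 &= 0 & (\text{the ``blue'' terminal equation}) \\
x_{n+1} - A_n x_n - B_n u_n &= 0, \quad n = 1, 2, 3 & (\text{dynamics})
\end{aligned}
```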
Deriving the Riccati Recursion
Start from the terminal condition (blue equation):
Move to the previous equation (red equation):
Substitute \(x_4 = A_3 x_3 + B_3 u_3\):
Solve for \(u_3\):
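Step by step:

```latex
\begin{aligned}
\lambda_4 &= Q_4 x_4 \\
R_3 u_3 + B_3^\top \lambda_4 &= 0 \\
R_3 u_3 + B_3^\top Q_4 (A_3 x_3 + B_3 u_3) &= 0 \\
u_3 &= -(R_3 + B_3^\top Q_4 B_3)^{-1} B_3^\top Q_4 A_3 x_3 \;=\; -K_3 x_3
\end{aligned}
```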
Deriving the Riccati Recursion (Cont'd)
Now consider the green equation:
Substitute \(\lambda_4 = Q_4 x_4\) and \(x_4 = A_3 x_3 + B_3 u_3\):
Substitute \(u_3 = -K_3 x_3\):
Solve for \(\lambda_3\):
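Step by step:

```latex
\begin{aligned}
Q_3 x_3 + A_3^\top \lambda_4 - \lambda_3 &= 0 \\
\lambda_3 &= Q_3 x_3 + A_3^\top Q_4 (A_3 x_3 + B_3 u_3) \\
&= \big[ Q_3 + A_3^\top Q_4 (A_3 - B_3 K_3) \big] x_3 \;=\; P_3 x_3
\end{aligned}
```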
Riccati Recursion Formula
We now have a recursive relationship. Generalizing:
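For general \(k\):

```latex
\begin{aligned}
P_N &= Q_N \\
K_k &= (R_k + B_k^\top P_{k+1} B_k)^{-1} B_k^\top P_{k+1} A_k \\
P_k &= Q_k + A_k^\top P_{k+1} (A_k - B_k K_k)
\end{aligned}
```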
This is the celebrated Riccati equation.
The solution process involves:
- A backward Riccati pass to compute \(P_k\) and \(K_k\) for \(k = N-1, \ldots, 1\)
- A forward rollout to compute \(x_{1:N}\) and \(u_{1:N-1}\) using \(u_k = -K_k x_k\)
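The backward pass and forward rollout together are only a few lines. As before, the double-integrator system and weights are assumptions for this example:

```python
import numpy as np

# Hypothetical example system (an assumption, not from the slides)
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), np.eye(1), np.eye(2)
N = 20
x1 = np.array([1.0, 0.0])

# Backward Riccati pass: P_N = Q_N, then recurse K_k, P_k down to k = 1
P = QN.copy()
K = [None] * (N - 1)
for k in range(N - 2, -1, -1):        # 0-based index: K[k] is the gain K_{k+1}
    K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K[k])

# Forward rollout with the time-varying feedback policy u_k = -K_k x_k
x = np.zeros((N, 2))
x[0] = x1
u = np.zeros((N - 1, 1))
for k in range(N - 1):
    u[k] = -K[k] @ x[k]
    x[k + 1] = A @ x[k] + B @ u[k]
```

A useful sanity check is that the achieved cost equals \(\tfrac{1}{2} x_1^\top P_1 x_1\), where \(P_1\) is the last matrix produced by the backward pass.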
Computational Complexity
Naive QP Solution: Treats the problem as one big least-squares problem.
- Computational cost: \(\mathbf{O[N^3(n+m)^3]}\)
- Must be re-solved from scratch for any change.
Riccati Recursion: Exploits the temporal structure.
- Computational cost: \(\mathbf{O[N(n+m)^3]}\)
- Dramatically faster for long horizons (large \(N\)): the cost scales linearly rather than cubically in \(N\).
The Riccati Solution is More Than Just Fast:
- It provides a ready-to-use feedback policy: \(\mathbf{u_k = -K_k x_k}\)
- This policy is adaptive: optimal for any initial state \(x_1\), not just a single one.
- It enables real-time control by naturally rejecting disturbances.
- And it delivers the exact same optimal solution as the QP.
Conclusion
Summary
Finite-Horizon Problems
- Use Riccati recursion backward in time
- Store gain matrices \(K_n\)
- Apply time-varying feedback
Infinite-Horizon Problems
- Solve algebraic Riccati equation offline
- Use constant gain matrix \(K_\infty\)
- Implement simple state feedback
- Algebraic Riccati Equation (ARE): $$ P_\infty = Q + A^\top P_\infty A - A^\top P_\infty B (R + B^\top P_\infty B)^{-1} B^\top P_\infty A $$
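One simple way to solve the ARE offline is to iterate the finite-horizon Riccati recursion until it reaches its fixed point. A minimal sketch, again on an assumed double-integrator system:

```python
import numpy as np

# Hypothetical example system (an assumption, not from the slides)
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R = np.eye(2), np.eye(1)

# Iterate the Riccati recursion until it converges to P_inf
P = Q.copy()
for _ in range(5000):
    P_next = Q + A.T @ P @ A \
        - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Constant infinite-horizon feedback gain: u = -K_inf x
K_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

In practice one would typically call a library routine (e.g. SciPy's `solve_discrete_are`) instead, but the fixed-point view makes the connection to the finite-horizon recursion explicit.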