Kalman Filter in 3 Ways
Slides: Discover the Kalman Filter through the Geometric perspective of orthogonal projection, the Probabilistic perspective of Bayesian filtering, and the Optimization perspective of weighted least squares. Inspired by Ling Shi's 2024-25 Spring lecture "Networked Sensing, Estimation and Control".
Introduction
System Model and Assumptions
Consider a discrete-time linear Gaussian system with initial state \(x_0\sim\mathcal N(\hat{x}_{0|0},P_0)\):
\[
\begin{aligned}
x_{k+1} &= A_kx_k+B_ku_k+\omega_k,&\omega_k\sim\mathcal N(0,Q_k) \\
y_k &= C_kx_k + \nu_k, &\nu_k\sim\mathcal N(0,R_k)
\end{aligned}
\]
Assumptions:
\((A_k,B_k)\) is controllable and \((A_k,C_k)\) is observable
\(Q_k\succeq0,\ R_k\succ0,\ P_0\succeq0\)
\(\omega_k\) , \(\nu_k\) and \(x_0\) are mutually uncorrelated
The future state of the system is conditionally independent of the past states given the current state
Goal: Find \(\hat x_{k|k}=\mathbb{E}[x_k|y_{1:k}]\) (MMSE estimator)
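To make the setup concrete, here is a minimal simulation sketch of such a system, assuming a hypothetical 1D constant-velocity model; the matrices, noise levels, and function names are illustrative, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D constant-velocity model: state = [position, velocity].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # state transition A_k
B = np.array([[0.5 * dt**2], [dt]])     # control input B_k (acceleration command)
C = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process-noise covariance Q_k
R = np.array([[0.25]])                  # measurement-noise covariance R_k

def simulate(T=200, u=1.0):
    """Simulate x_{k+1} = A x_k + B u_k + w_k,  y_k = C x_k + v_k."""
    x = np.zeros(2)
    xs, ys = [], []
    for _ in range(T):
        x = A @ x + B[:, 0] * u + rng.multivariate_normal(np.zeros(2), Q)
        y = C @ x + rng.multivariate_normal(np.zeros(1), R)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

xs, ys = simulate()
```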
Geometric Perspective: Orthogonal Projection
Hilbert Space of Random Variables
Key Idea:
View random variables as vectors in Hilbert space
Inner product: \(\langle\xi,\eta\rangle=\mathbb{E}[\xi\eta]\)
Orthogonality: \(\xi\perp\eta\Leftrightarrow\mathbb{E}[\xi\eta]=0\)
Optimal estimate is orthogonal projection onto observation space
Geometric Interpretation: the optimal estimate \(\hat{x}_{k|k}\) is the foot of the perpendicular from \(x_k\) onto the subspace \(\mathcal{Y}_k\) spanned by the observations \(y_{1:k}\), so the estimation error is orthogonal to every observation (a small numerical sketch follows).
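A minimal Monte Carlo sketch of this idea, with all numbers hypothetical: approximate the inner product \(\mathbb{E}[\xi\eta]\) by sample averages, project a scalar state onto the span of two noisy observations, and verify that the resulting error is orthogonal to both.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000                      # samples used to approximate expectations

# Zero-mean scalar state x observed through two noisy measurements y1, y2.
x = rng.normal(0.0, 1.0, N)
y1 = x + 0.5 * rng.normal(size=N)
y2 = 2.0 * x + rng.normal(size=N)
Y = np.stack([y1, y2])           # observations spanning the subspace

# Projection of x onto span{y1, y2}: solve Gram(a) = <x, y_i>,
# with inner product <xi, eta> = E[xi * eta] estimated by sample means.
G = Y @ Y.T / N                  # Gram matrix E[y_i y_j]
b = Y @ x / N                    # E[x y_i]
a = np.linalg.solve(G, b)
x_hat = a @ Y                    # orthogonal projection = LMMSE estimate

err = x - x_hat
print("E[err * y1] ~", np.mean(err * y1))   # ~0: error orthogonal to y1
print("E[err * y2] ~", np.mean(err * y2))   # ~0: error orthogonal to y2
```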
Time Update
State Prediction:
\[
\begin{align*}
\hat{x}_{k|k-1} &= \mathbb{E}[x_k \mid y_{1:k-1}] \\
&= \mathbb{E}[A_{k-1} x_{k-1} + B_{k-1}u_{k-1} + w_{k-1} \mid y_{1:k-1}] \\
&= A_{k-1} \hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \qquad \text{(since $w_{k-1} \perp y_{1:k-1}$)}
\end{align*}
\]
Covariance Prediction:
\[
\begin{align*}
P_{k|k-1} &= \text{cov}(x_k - \hat{x}_{k|k-1}) \\
&= \text{cov}[A_{k-1}(x_{k-1} - \hat{x}_{k-1|k-1}) + w_{k-1}] \\
&= A_{k-1}\,\text{cov}(x_{k-1} - \hat{x}_{k-1|k-1})\,A_{k-1}^\top + \text{cov}(w_{k-1}) \qquad \text{(cross terms vanish since $w_{k-1}\perp x_{k-1}-\hat{x}_{k-1|k-1}$)} \\
&= A_{k-1} P_{k-1|k-1} A_{k-1}^\top + Q_{k-1}
\end{align*}
\]
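The two prediction equations translate directly into code. A minimal sketch (function and variable names are my own):

```python
import numpy as np

def time_update(x_hat, P, A, B, u, Q):
    """One prediction step:
    x_pred = A x_hat + B u
    P_pred = A P A^T + Q
    """
    x_pred = A @ x_hat + B @ u
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred
```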
Innovation Process
Definition:
\[
\begin{align*}
e_k &= y_k - \hat{y}_{k\vert k-1} \\
&= y_k - \text{proj}_{\mathcal{Y}_{k-1}}(y_k) \\
&= y_k - \text{proj}_{\mathcal{Y}_{k-1}}(C_kx_k+\nu_k) \\
&= y_k - C_k\cdot\text{proj}_{\mathcal{Y}_{k-1}}(x_k)-\text{proj}_{\mathcal{Y}_{k-1}}(\nu_k) \\
&= y_k - C_k\hat{x}_{k\vert k-1} \qquad \text{(since $\nu_k\perp\mathcal{Y}_{k-1}$, so $\text{proj}_{\mathcal{Y}_{k-1}}(\nu_k)=0$)}
\end{align*}
\]
Properties:
Zero Mean: \(\mathbb{E}[e_k]=0\)
White Sequence: \(\mathbb{E}[e_ke_j^\top]=0\) for \(k\ne j\)
Orthogonality Principle: \(\mathbb{E}[e_ky_j^\top]=0\) for \(j<k\)
Measurement Update
State Update:
\[
\begin{align*}
\hat{x}_{k\vert k} &= \text{proj}_{\mathcal{Y}_{k}}(x_k) \\
&= \hat{x}_{k\vert k-1}+K_ke_k \\
&= \hat{x}_{k\vert k-1}+K_k(y_k - C_k\hat{x}_{k\vert k-1})
\end{align*}
\]
Covariance Update:
\[
\begin{align*}
P_{k|k} &= \text{cov}(x_k - \hat{x}_{k|k}) \\
&= \text{cov}(x_k - \hat{x}_{k|k-1} - K_k e_k) \\
&= \text{cov}(x_k - \hat{x}_{k|k-1}) - K_k\,\text{cov}(e_k,\,x_k - \hat{x}_{k|k-1}) - \text{cov}(x_k - \hat{x}_{k|k-1},\,e_k)\,K_k^\top + K_k\,\text{cov}(e_k)\,K_k^\top \\
&= P_{k|k-1} - K_kC_kP_{k|k-1} - P_{k|k-1}C_k^\top K_k^\top + K_k\left(C_k P_{k|k-1} C_k^\top + R_{k}\right)K_k^\top \qquad \text{(using $e_k = C_k(x_k-\hat{x}_{k|k-1})+\nu_k$ and $\nu_k\perp x_k-\hat{x}_{k|k-1}$)} \\
\end{align*}
\]
Kalman Gain Derivation
Optimal Kalman Gain:
\[
\frac{\partial\,\text{tr}(P_{k\vert k})}{\partial K_k} = -2P_{k\vert k-1}C_k^\top + 2K_k(C_kP_{k\vert k-1}C_k^\top+R_{k}) = 0
\]
\[
K_k = P_{k\vert k-1}C_k^\top(C_kP_{k\vert k-1}C_k^\top+R_{k})^{-1}
\]
Covariance Derivation:
\[
P_{k|k} = P_{k\vert k-1}-K_kC_kP_{k\vert k-1} = \left(P_{k\vert k-1}^{-1}+C_k^\top R_{k}^{-1}C_k\right)^{-1}
\]
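Putting the gain and the covariance update together, here is a minimal sketch of the measurement update (names and numbers are illustrative); the final check confirms numerically that \((I-K_kC_k)P_{k|k-1}\) matches the information form \((P_{k|k-1}^{-1}+C_k^\top R_k^{-1}C_k)^{-1}\).

```python
import numpy as np

def measurement_update(x_pred, P_pred, y, C, R):
    """Kalman measurement update via the innovation e = y - C x_pred."""
    S = C @ P_pred @ C.T + R                      # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)           # Kalman gain
    x_hat = x_pred + K @ (y - C @ x_pred)
    P = P_pred - K @ C @ P_pred                   # (I - K C) P_pred
    return x_hat, P, K

# Numerical check of the information form (hypothetical numbers).
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
C = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
_, P_post, _ = measurement_update(np.zeros(2), P_pred, np.zeros(1), C, R)
P_info = np.linalg.inv(np.linalg.inv(P_pred) + C.T @ np.linalg.inv(R) @ C)
print(np.allclose(P_post, P_info))                # True
```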
Probabilistic Perspective: Bayesian Filtering
Bayesian Filtering Framework
\[
\begin{aligned}
&p(x_{k}|y_{1:k}, u_{1:k})\\
=\ &p(x_{k}|y_k,y_{1:k-1}, u_{1:k})\\
=\ &\frac{p(y_{k}|x_k,y_{1:k-1}, u_{1:k})\cdot p(x_{k}|y_{1:k-1}, u_{1:k})}{p(y_{k}|y_{1:k-1}, u_{1:k})}\\
=\ &\eta\cdot p(y_k|x_k)\cdot p(x_{k}|y_{1:k-1}, u_{1:k}) \\
=\ &\eta\cdot p(y_k|x_k)\cdot\int p(x_{k},x_{k-1}|y_{1:k-1}, u_{1:k})\ {\rm d}x_{k-1} \\
=\ &\eta\cdot p(y_k|x_k)\cdot\int p(x_{k}|x_{k-1},y_{1:k-1},u_{1:k})\cdot p(x_{k-1}|y_{1:k-1},u_{1:k})\ {\rm d}x_{k-1} \\
=\ &\eta\cdot\underbrace{p(y_k|x_k)}_\text{observation model}\cdot\int\underbrace{p(x_{k}|x_{k-1},u_{k})}_\text{motion model}\cdot\underbrace{p(x_{k-1}|y_{1:k-1},u_{1:k-1})}_\text{previous belief}\ {\rm d}x_{k-1}
\end{aligned}
\]
Prediction Step: Gaussian Propagation
\[
p(x_{k}|y_{1:k}, u_{1:k}) = \eta\cdot\mathcal N(y_k;C_kx_k,R_k)\cdot\int\mathcal{N}(x_k;A_{k-1}x_{k-1}+B_{k-1}u_{k-1},Q_{k-1})\cdot\mathcal{N}(x_{k-1};\hat{x}_{k-1},P_{k-1})\ {\rm d}x_{k-1}
\]
Predicted Mean:
\[
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + w_{k-1}] \\
&= A_{k-1}\mathbb{E}[x_{k-1}] + B_{k-1}u_{k-1} + \mathbb{E}[w_{k-1}] \\
&= A_{k-1}\hat{x}_{k-1} + B_{k-1}u_{k-1}
\end{aligned}
\]
Predicted Covariance:
\[
\begin{aligned}
P_{k|k-1} &= \text{cov}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + w_{k-1}] \\
&= \text{cov}[A_{k-1}x_{k-1}] + \text{cov}[w_{k-1}] \\
&= A_{k-1}\text{cov}[x_{k-1}]A_{k-1}^\top + Q_{k-1} \\
&= A_{k-1}P_{k-1}A_{k-1}^\top + Q_{k-1}
\end{aligned}
\]
Update Step: Gaussian Product
\[
p(x_{k}|y_{1:k}, u_{1:k}) = \eta\cdot\mathcal N(y_k;C_kx_k,R_k)\cdot\mathcal{N}(x_k;\hat x_{k|k-1},P_{k|k-1})
\]
Gaussian Product:
\[
\mathcal{N}(x;\mu,\Sigma)\propto\mathcal{N}(x;\mu_1,\Sigma_1)\cdot\mathcal{N}(x;\mu_2,\Sigma_2)
\]
\[
\begin{align*}
\Sigma^{-1} &= \Sigma_1^{-1} + \Sigma_2^{-1} \\
\mu &= \Sigma(\Sigma_1^{-1}\mu_1+\Sigma_2^{-1}\mu_2)
\end{align*}
\]
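A quick 1D sanity check of these product formulas, multiplying the two densities on a grid and comparing the resulting mean and variance with the closed form (all numbers arbitrary):

```python
import numpy as np

# 1D check of the Gaussian product formulas (illustrative numbers).
mu1, s1 = 1.0, 2.0          # mean / variance of the first Gaussian
mu2, s2 = 3.0, 0.5          # mean / variance of the second Gaussian

# Closed form: precisions add, and the mean is precision-weighted.
s = 1.0 / (1.0 / s1 + 1.0 / s2)
mu = s * (mu1 / s1 + mu2 / s2)

# Brute force: multiply the two densities on a grid and renormalize.
x = np.linspace(-10.0, 10.0, 200_001)
p = np.exp(-0.5 * (x - mu1) ** 2 / s1) * np.exp(-0.5 * (x - mu2) ** 2 / s2)
p /= p.sum()
print(mu, (x * p).sum())                 # means agree
print(s, ((x - mu) ** 2 * p).sum())      # variances agree
```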
Posterior Result:
\[
\begin{align*}
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k (y_k - C_k \hat{x}_{k|k-1}) \\
K_k &= P_{k|k-1} C_k^\top (C_k P_{k|k-1} C_k^\top + R_k)^{-1} \\
P_{k|k} &= (I - K_k C_k) P_{k|k-1}
\end{align*}
\]
Optimization Perspective: MAP Estimation
Maximum A Posteriori Formulation
MAP Estimation:
\[
\begin{align*}
\hat{x}_{k|k} &= \arg\max_{x_k} p(x_k \mid y_{1:k}) \\
&= \arg\min_{x_k} \left[ -\log p(x_k \mid y_{1:k}) \right]
\end{align*}
\]
Weighted Least Squares:
\[
\mathcal E(x) = \|Mx-n\|^2_{\Sigma^{-1}}=x^\top M^\top\Sigma^{-1}Mx-2n^\top\Sigma^{-1}Mx+n^\top\Sigma^{-1}n
\]
\]
\[
\nabla\mathcal E = 2M^\top\Sigma^{-1}Mx - 2M^\top\Sigma^{-1}n
\]
\[
\hat x = (M^\top\Sigma^{-1}M)^{-1}M^\top\Sigma^{-1}n
\]
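A minimal sketch of this weighted least-squares solution via the normal equations (names and numbers are illustrative):

```python
import numpy as np

def weighted_least_squares(M, n, Sigma):
    """Minimize ||M x - n||^2_{Sigma^{-1}} via the normal equations."""
    W = np.linalg.inv(Sigma)                        # weight = inverse covariance
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ n)

# Example: estimate a scalar from two measurements with different noise levels.
M = np.array([[1.0], [1.0]])
n = np.array([2.0, 4.0])
Sigma = np.diag([1.0, 4.0])                         # the second measurement is noisier
print(weighted_least_squares(M, n, Sigma))          # [2.4]: pulled toward the better measurement
```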
MAP as Weighted Least Squares
Posterior Distribution:
\[
p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) p(x_k \mid y_{1:k-1})
\]
Assume Gaussian Distributions:
\[
\begin{align*}
p(x_k \mid y_{1:k-1}) &= \mathcal{N}(x_k; \hat{x}_{k|k-1}, P_{k|k-1}) \\
p(y_k \mid x_k) &= \mathcal{N}(y_k; C_k x_k, R_k)
\end{align*}
\]
Negative Log-Posterior:
\[
\begin{align*}
-\log p(x_k \mid y_{1:k}) &\propto \frac{1}{2} \|y_k - C_k x_k\|_{R_k^{-1}}^2 + \frac{1}{2} \|x_k - \hat{x}_{k|k-1}\|_{P_{k|k-1}^{-1}}^2 \\
&= \frac{1}{2} \left\| \begin{bmatrix} C_k \\ I \end{bmatrix} x_k - \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix} \right\|_{\Sigma^{-1}}^2
\end{align*}
\]
where \(\Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}\) .
MAP Solution
Weighted Least Squares Form:
\[
M = \begin{bmatrix} C_k \\ I \end{bmatrix}, \quad n = \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}
\]
MAP Estimate:
\[
\begin{align*}
\hat{x}_{k|k} &= \left( M^\top \Sigma^{-1} M \right)^{-1} M^\top \Sigma^{-1} n \\
&= \left( C_k^\top R_k^{-1} C_k + P_{k|k-1}^{-1} \right)^{-1} \left( C_k^\top R_k^{-1} y_k + P_{k|k-1}^{-1} \hat{x}_{k|k-1} \right)
\end{align*}
\]
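A numerical sketch showing that this stacked weighted-least-squares solution coincides with the familiar Kalman update (all numbers are hypothetical):

```python
import numpy as np

# Hypothetical prior and measurement (2D state, scalar measurement).
x_pred = np.array([1.0, 0.0])
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
C = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
y = np.array([1.8])

# Stacked weighted-least-squares form of the MAP problem.
M = np.vstack([C, np.eye(2)])
n = np.concatenate([y, x_pred])
Sigma = np.block([[R, np.zeros((1, 2))], [np.zeros((2, 1)), P_pred]])
W = np.linalg.inv(Sigma)
x_map = np.linalg.solve(M.T @ W @ M, M.T @ W @ n)

# Kalman-update form of the same estimate.
K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
x_kf = x_pred + K @ (y - C @ x_pred)
print(np.allclose(x_map, x_kf))                     # True
```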
Equivalence Proof
Using Matrix Inversion Lemma:
\[
\begin{align*}
\hat{x}_{k|k} &= \left( C_k^\top R_k^{-1} C_k + P_{k|k-1}^{-1} \right)^{-1} \left( C_k^\top R_k^{-1} y_k + P_{k|k-1}^{-1} \hat{x}_{k|k-1} \right) \\
&= \hat{x}_{k|k-1} + P_{k|k-1} C_k^\top (C_k P_{k|k-1} C_k^\top + R_k)^{-1} (y_k - C_k \hat{x}_{k|k-1})
\end{align*}
\]
Key identity (from the matrix inversion lemma):
\[
\begin{align*}
\left( C_k^\top R_k^{-1} C_k + P_{k|k-1}^{-1} \right)^{-1} C_k^\top R_k^{-1} = P_{k|k-1} C_k^\top (C_k P_{k|k-1} C_k^\top + R_k)^{-1}
\end{align*}
\]
This shows the equivalence between the MAP solution and the Kalman update.
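A quick numerical check of this identity with random positive-definite \(P\) and \(R\) (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random positive-definite P and R, arbitrary C.
Ap = rng.normal(size=(3, 3)); P = Ap @ Ap.T + np.eye(3)
Br = rng.normal(size=(2, 2)); R = Br @ Br.T + np.eye(2)
C = rng.normal(size=(2, 3))

lhs = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(P)) @ C.T @ np.linalg.inv(R)
rhs = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
print(np.allclose(lhs, rhs))                        # True
```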
Conclusion
Theoretical Insights and Extensions
Key Insights:
Geometric: Reveals orthogonality principle and innovation process
Probabilistic: Shows optimality under Gaussian assumptions
Optimization: Connects to weighted least squares and regularization
Unified Algorithm: All approaches yield the same recursive equations:
\[
\begin{align*}
\text{time update } & \begin{cases}
\hat{x}_{k|k-1} = A_{k-1} \hat{x}_{k-1|k-1} + B_{k-1} u_{k-1} \\
P_{k|k-1} = A_{k-1} P_{k-1|k-1} A_{k-1}^\top + Q_{k-1}
\end{cases} \\
\text{measurement update } & \begin{cases}
K_k = P_{k|k-1} C_k^\top (C_k P_{k|k-1} C_k^\top + R_k)^{-1} \\
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - C_k\hat{x}_{k|k-1}) \\
P_{k|k} = (I - K_k C_k) P_{k|k-1}
\end{cases}
\end{align*}
\]
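Collecting everything, a compact sketch of the full recursion (hypothetical matrices; the measurement sequence below is just a placeholder, not simulated from the model):

```python
import numpy as np

def kalman_filter(y_seq, u_seq, x0, P0, A, B, C, Q, R):
    """Run the recursion: time update followed by measurement update at each step."""
    x_hat, P = x0, P0
    estimates = []
    for y, u in zip(y_seq, u_seq):
        # time update
        x_hat = A @ x_hat + B @ u
        P = A @ P @ A.T + Q
        # measurement update
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x_hat = x_hat + K @ (y - C @ x_hat)
        P = (np.eye(len(x_hat)) - K @ C) @ P
        estimates.append(x_hat)
    return np.array(estimates)

# Toy run with hypothetical constant-velocity matrices and placeholder measurements.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]]); Q = 0.01 * np.eye(2); R = np.array([[0.25]])
rng = np.random.default_rng(3)
ys = rng.normal(size=(50, 1))            # placeholder measurement sequence
us = np.ones((50, 1))                    # constant control input
print(kalman_filter(ys, us, np.zeros(2), np.eye(2), A, B, C, Q, R)[-1])
```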
Extensions:
Nonlinear systems: EKF, UKF, particle filters
Non-Gaussian noise: robust Kalman filters