Kalman Filter in 3 Ways
These are lecture notes for "ELEC 5650: Networked Sensing, Estimation and Control", delivered by Prof. Ling Shi at HKUST in the 2024-25 Spring semester. In this session, we derive the Kalman Filter from three different perspectives: geometric, probabilistic, and optimization.
Takeaway Notes
Consider an LTI system with initial estimate \(\hat x_{0\vert0}\) and initial error covariance \(\hat P_{0\vert0}\):
\[
\begin{aligned}
x_{k} &= Ax_{k-1}+Bu_k+\omega_k,&\omega_k\sim\mathcal N(0,Q) \\
y_k &= Cx_k + \nu_k, &\nu_k\sim\mathcal N(0,R)
\end{aligned}
\]
Find the estimate of \(x_k\) given \(\{u_1,u_2,\cdots,u_k\}\) and \(\{y_1,y_2,\cdots,y_k\}\).
Assumptions
\((A,B)\) is controllable and \((A,C)\) is observable
\(Q\succeq0\), \(R\succ0\), \(\hat P_{0\vert0}\succeq0\)
\(\omega_k\), \(\nu_k\), and the initial state \(x_0\) are mutually uncorrelated
The future state of the system is conditionally independent of the past states given the current state
Time Update
\[
\begin{aligned}
\hat x_{k\vert k-1}&=A\hat x_{k-1\vert k-1}+Bu_k \\
\hat P_{k\vert k-1} &= A\hat P_{k-1\vert k-1}A^T+Q
\end{aligned}
\]
Measurement Update
\[
\begin{aligned}
K_k &= \hat P_{k\vert k-1}C^T(C\hat P_{k\vert k-1}C^T+R)^{-1} \\
\hat x_{k\vert k} &= \hat x_{k\vert k-1} + K_k(y_k-C\hat x_{k\vert k-1}) \\
\hat P_{k\vert k} &= \hat P_{k\vert k-1}-K_kC\hat P_{k\vert k-1} = (\hat P_{k\vert k-1}^{-1}+C^TR^{-1}C)^{-1}\\
\end{aligned}
\]
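As a quick reference implementation, here is a minimal NumPy sketch of one filter iteration following the takeaway equations above; the constant-velocity model, the matrix values, and the function name `kf_step` are illustrative choices, not from the lecture.

```python
import numpy as np

def kf_step(x_est, P_est, u, y, A, B, C, Q, R):
    """One Kalman Filter iteration: time update followed by measurement update."""
    # Time update (prediction)
    x_pred = A @ x_est + B @ u                    # x_{k|k-1}
    P_pred = A @ P_est @ A.T + Q                  # P_{k|k-1}

    # Measurement update (correction)
    S = C @ P_pred @ C.T + R                      # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)           # Kalman gain K_k
    x_new = x_pred + K @ (y - C @ x_pred)         # x_{k|k}
    P_new = P_pred - K @ C @ P_pred               # P_{k|k}
    return x_new, P_new

# Toy example: constant-velocity model with position measurements (illustrative values).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

x_est = np.zeros(2)        # \hat x_{0|0}
P_est = np.eye(2)          # \hat P_{0|0}
x_true = np.zeros(2)
rng = np.random.default_rng(0)
for k in range(1, 20):
    u = np.array([0.1])
    x_true = A @ x_true + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.multivariate_normal(np.zeros(1), R)
    x_est, P_est = kf_step(x_est, P_est, u, y, A, B, C, Q, R)
print("final estimate:", x_est)
print("final covariance:\n", P_est)
```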
Geometric Perspective (LMMSE Estimation)
The geometric perspective views the Kalman Filter as a Linear Minimum Mean Square Error (LMMSE) estimator, rooted in orthogonal projection in Hilbert space. The key insight is that the innovation \(e_k\) is orthogonal to the space of past observations \(\mathcal{Y}_{k-1}\): the new information being incorporated is uncorrelated with all previous measurements, which is exactly what preserves the estimator's optimality.
Time Update
\[
\begin{aligned}
\hat{x}_{k\vert k-1} &= \text{proj}_{\mathcal{Y}_{k-1}}(x_k) \\
&= \text{proj}_{\mathcal{Y}_{k-1}}(Ax_{k-1}+Bu_k+\omega_k) \\
&= A\cdot\text{proj}_{\mathcal{Y}_{k-1}}(x_{k-1}) + B\cdot\text{proj}_{\mathcal{Y}_{k-1}}(u_k)+\text{proj}_{\mathcal{Y}_{k-1}}(\omega_k) \\
&= A\hat x_{k-1\vert k-1} + Bu_k \quad (\text{since } u_k \text{ is known and } \omega_k\perp\mathcal Y_{k-1}) \\
\\
\tilde{x}_{k\vert k-1}&= x_k-\hat{x}_{k\vert k-1} \\
\\
\hat{P}_{k\vert k-1} &= \mathbb E[\tilde{x}_{k\vert k-1}\tilde{x}_{k\vert k-1}^T] \\
&= \mathbb{E}[(x_k-\hat{x}_{k\vert k-1})(x_k-\hat{x}_{k\vert k-1})^T] \\
&= \mathbb{E}[(A\tilde{x}_{k-1\vert k-1}+\omega_k)(A\tilde{x}_{k-1\vert k-1}+\omega_k)^T] \\
&= A\mathbb{E}[\tilde{x}_{k-1\vert k-1}\tilde{x}_{k-1\vert k-1}^T]A^T+A\mathbb{E}[\tilde{x}_{k-1\vert k-1}\omega_k^T]+\mathbb{E}[\omega_k\tilde{x}_{k-1\vert k-1}^T]A^T+\mathbb{E}[\omega_k\omega_k^T] \\
&= A\hat{P}_{k-1\vert k-1}A^T+Q
\end{aligned}
\]
Measurement Update
\[
\begin{aligned}
e_k &= y_k - \hat{y}_{k\vert k-1} \\
&= y_k - \text{proj}_{\mathcal{Y}_{k-1}}(y_k) \\
&= y_k - \text{proj}_{\mathcal{Y}_{k-1}}(Cx_k+\nu_k) \\
&= y_k - C\cdot\text{proj}_{\mathcal{Y}_{k-1}}(x_k)-\text{proj}_{\mathcal{Y}_{k-1}}(\nu_k) \\
&= y_k - C\hat{x}_{k\vert k-1} \quad (\nu_k\perp\mathcal Y_{k-1}) \\
\\
\hat{x}_{k\vert k} &= \text{proj}_{\mathcal{Y}_{k}}(x_k) \\
&= \hat{x}_{k\vert k-1}+K_ke_k \\
&= \hat{x}_{k\vert k-1}+K_k(y_k - C\hat{x}_{k\vert k-1}) \\
\\
\tilde{x}_{k\vert k}&= x_k-\hat{x}_{k\vert k} \\
\\
\hat{P}_{k\vert k} &= \mathbb{E}[\tilde{x}_{k\vert k}\tilde{x}_{k\vert k}^T] \\
&= \mathbb{E}[(x_k-\hat{x}_{k\vert k})(x_k-\hat{x}_{k\vert k})^T] \\
&= \mathbb{E}[(\tilde{x}_{k\vert k-1}-K_ke_k)(\tilde{x}_{k\vert k-1}-K_ke_k)^T] \\
&= \mathbb{E}[\tilde{x}_{k\vert k-1}\tilde{x}_{k\vert k-1}^T] - \mathbb{E}[\tilde{x}_{k\vert k-1}e_k^T]K_k^T - K_k\mathbb{E}[e_k\tilde{x}_{k\vert k-1}^T]+K_k\mathbb{E}[e_ke_k^T]K_k^T \\
&= \hat{P}_{k\vert k-1} - \hat{P}_{k\vert k-1}C^TK_k^T - K_kC\hat{P}_{k\vert k-1} + K_k(C\hat{P}_{k\vert k-1}C^T+R)K_k^T, \\
&\qquad\text{since } e_k = C\tilde{x}_{k\vert k-1}+\nu_k \implies \mathbb{E}[\tilde{x}_{k\vert k-1}e_k^T]=\hat{P}_{k\vert k-1}C^T,\ \mathbb{E}[e_ke_k^T]=C\hat{P}_{k\vert k-1}C^T+R \\
\\
\frac{\partial\text{tr}(\hat{P}_{k\vert k})}{\partial K_k} &= -2\hat{P}_{k\vert k-1}C^T + 2K_k(C\hat{P}_{k\vert k-1}C^T+R) = 0 \\
K_k &= \hat{P}_{k\vert k-1}C^T(C\hat{P}_{k\vert k-1}C^T+R)^{-1}
\end{aligned}
\]
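As a quick numerical sanity check (not in the original notes), the snippet below evaluates the posterior-covariance expression above as a function of the gain and confirms that the derived \(K_k\) minimizes its trace; the dimensions and matrix values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary prior covariance and measurement model (illustrative values).
n, p = 3, 2
M = rng.standard_normal((n, n))
P_pred = M @ M.T + np.eye(n)          # \hat P_{k|k-1}, positive definite
C = rng.standard_normal((p, n))
R = np.eye(p)

def posterior_cov(K):
    """P_{k|k} as a function of an arbitrary gain K, before optimizing K."""
    S = C @ P_pred @ C.T + R
    return P_pred - P_pred @ C.T @ K.T - K @ C @ P_pred + K @ S @ K.T

# Derived LMMSE gain
S = C @ P_pred @ C.T + R
K_star = P_pred @ C.T @ np.linalg.inv(S)
t_star = np.trace(posterior_cov(K_star))

# Any perturbed gain should give a larger (or equal) trace.
for _ in range(1000):
    K_pert = K_star + 0.1 * rng.standard_normal((n, p))
    assert np.trace(posterior_cov(K_pert)) >= t_star - 1e-9
print("trace at derived gain:", t_star)
```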
Probabilistic Perspective (Bayesian Estimation)
The filter maintains a Gaussian belief over the state that is refined by sequential application of Bayes' rule: the prediction step corresponds to Chapman-Kolmogorov propagation, and the update step implements Bayesian conditioning.
\[
\begin{aligned}
&p(x_{k}|y_{1:k}, u_{1:k})\\
=&\ p(x_{k}|y_k,y_{1:k-1}, u_{1:k})\\
=&\ \frac{p(y_{k}|x_k,y_{1:k-1}, u_{1:k})\cdot p(x_{k}|y_{1:k-1}, u_{1:k})}{p(y_{k}|y_{1:k-1}, u_{1:k})}\\
=&\ \eta\cdot\underbrace{p(y_k|x_k)}_\text{observation model}\cdot\underbrace{p(x_{k}|y_{1:k-1}, u_{1:k})}_\text{prior belief}, \quad \eta:=\frac1{p(y_{k}|y_{1:k-1}, u_{1:k})} \\
=&\ \eta\cdot p(y_k|x_k)\cdot\int p(x_{k},x_{k-1}|y_{1:k-1}, u_{1:k})\ {\rm d}x_{k-1} \\
=&\ \eta\cdot p(y_k|x_k)\cdot\int p(x_{k}|x_{k-1},y_{1:k-1},u_{1:k})\cdot p(x_{k-1}|y_{1:k-1},u_{1:k})\ {\rm d}x_{k-1} \\
=&\ \eta\cdot p(y_k|x_k)\cdot\int\underbrace{p(x_{k}|x_{k-1},u_{k})}_\text{motion model}\cdot\underbrace{p(x_{k-1}|y_{1:k-1},u_{1:k-1})}_\text{previous belief}\ {\rm d}x_{k-1} \\
=&\ \eta\cdot\mathcal N(y_k;Cx_k,R)\cdot\int\mathcal{N}(x_k;Ax_{k-1}+Bu_k,Q)\cdot\mathcal{N}(x_{k-1};\hat{x}_{k-1\vert k-1},\hat{P}_{k-1\vert k-1})\ {\rm d}x_{k-1} \\
=&\ \eta\cdot\mathcal N(y_k;Cx_k,R)\cdot\mathcal{N}(x_k;\hat{x}_{k\vert k-1},\hat{P}_{k\vert k-1}) \\
\propto&\ \mathcal{N}(x_k;\hat{x}_{k\vert k},\hat{P}_{k\vert k})
\end{aligned}
\]
Applying Bayes' rule and the Markov assumption to \(p(x_k|y_{1:k},u_{1:k})\) makes both updates explicit: the integral above is the Chapman-Kolmogorov (time) update, and the product with the likelihood is the measurement update. The worked Gaussian identities below make this concrete.
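For completeness, here is a sketch of the two standard Gaussian identities (marginalization and conditioning; the completing-the-square algebra is omitted) that turn the abstract recursion into the familiar update equations, with \(K_k\) the gain from the takeaway notes:
\[
\begin{aligned}
\int\mathcal{N}(x_k;Ax_{k-1}+Bu_k,Q)\,\mathcal{N}(x_{k-1};\hat{x}_{k-1\vert k-1},\hat{P}_{k-1\vert k-1})\ {\rm d}x_{k-1}
&= \mathcal{N}\!\left(x_k;\ A\hat{x}_{k-1\vert k-1}+Bu_k,\ A\hat{P}_{k-1\vert k-1}A^T+Q\right) \\
\mathcal N(y_k;Cx_k,R)\cdot\mathcal{N}(x_k;\hat{x}_{k\vert k-1},\hat{P}_{k\vert k-1})
&\propto \mathcal{N}\!\left(x_k;\ \hat{x}_{k\vert k-1}+K_k(y_k-C\hat{x}_{k\vert k-1}),\ (I-K_kC)\hat{P}_{k\vert k-1}\right)
\end{aligned}
\]
The first identity is exactly the time update; the second is exactly the measurement update.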
Optimization Perspective (MAP Estimation)
The Kalman Filter solves a weighted least-squares problem: the optimal state estimate minimizes a cost function that balances measurement consistency against prediction fidelity, with the inverse covariance matrices acting as natural weights.
By Bayes' rule, we know that
\[
p(x_{k}|y_{1:k}, u_{1:k})\propto\underbrace{p(y_k|x_k)}_\text{measurement}\cdot\underbrace{p(x_{k}|y_{1:k-1}, u_{1:k})}_\text{prior}
\]
Maximizing the posterior probability is equivalent to minimizing the negative log-posterior:
\[
-\ln{p(x_{k}|y_{1:k}, u_{1:k})}=-\ln{p(y_k|x_k)}-\ln{p(x_{k}|y_{1:k-1}, u_{1:k})}+\text{const}
\]
Substituting the Gaussian densities gives the cost function
\[
J_{x_k}=\frac12\|y_k-Cx_k\|_{R^{-1}}^2+\frac12\|{x_k-\hat{x}_{k\vert k-1}}\|_{\hat{P}_{k\vert k-1}^{-1}}^2+\text{const}
\]
\[
\hat{x}_{k\vert k}=\arg\min_{x_k}\left(\|{x_k-\hat{x}_{k\vert k-1}}\|_{\hat{P}_{k\vert k-1}^{-1}}^2+\|y_k-Cx_k\|_{R^{-1}}^2\right)
\]
where the prior is given by the time update:
\[
\begin{aligned}
\hat{x}_{k\vert k-1} &= \mathbb E[x_k|y_{1:k-1}, u_{1:k}] \\
\hat{P}_{k\vert k-1} &= \text{Cov}(x_k|y_{1:k-1}, u_{1:k})
\end{aligned}
\]
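As a short sketch of the final step (not spelled out in the notes), setting the gradient of \(J_{x_k}\) to zero recovers the information-form measurement update from the takeaway notes:
\[
\begin{aligned}
\nabla_{x_k}J_{x_k} &= -C^TR^{-1}(y_k-Cx_k)+\hat{P}_{k\vert k-1}^{-1}(x_k-\hat{x}_{k\vert k-1}) = 0 \\
\Rightarrow\ \hat{x}_{k\vert k} &= \left(\hat{P}_{k\vert k-1}^{-1}+C^TR^{-1}C\right)^{-1}\left(\hat{P}_{k\vert k-1}^{-1}\hat{x}_{k\vert k-1}+C^TR^{-1}y_k\right) \\
&= \hat{x}_{k\vert k-1}+K_k\left(y_k-C\hat{x}_{k\vert k-1}\right)
\end{aligned}
\]
where the last line follows from the matrix inversion lemma, and the Hessian \(\hat{P}_{k\vert k-1}^{-1}+C^TR^{-1}C\) is exactly \(\hat{P}_{k\vert k}^{-1}\), matching the covariance in the measurement update.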