PHR Conic Augmented Lagrangian Method
Slide: Starting from the penalty method, we extend to the augmented Lagrangian method for improved stability. By introducing slack variables \(s\), symmetric cone constraints are integrated, forming a unified framework for iteratively solving constrained optimization problems. Inspired by Dr. Zhepei Wang's lecture "Numerical Optimization for Robotics".
Introduction
Penalty Method
Consider the constrained optimization problem:
\[
\min_xf(x)\quad\text{s.t.}\quad h(x)=0
\]
The penalty method solves a sequence of unconstrained problems:
\[
Q_\rho(x) = f(x) + \frac\rho2||h(x)||^2
\]
Challenges:
An exact solution requires \(\rho\to\infty\), which causes an ill-conditioned Hessian.
Any finite \(\rho\) leaves a constraint violation \(h(x)\neq 0\).
Lagrangian Relaxation
The Lagrangian is defined as:
\[
\mathcal{L}(x,\lambda) = f(x) + \lambda^\top h(x)
\]
At the optimal solution \(x^*\) , there exists \(\lambda^*\) such that \(\nabla_x \mathcal{L}(x^*,\lambda^*) = 0\) .
Uzawa's method iteratively updates \(x\) and \(\lambda\) :
\[
\begin{cases}
x^{k+1} = \arg\min_x \mathcal{L}(x,\lambda^k) \\
\lambda^{k+1} = \lambda^k + \alpha_k h(x^{k+1})
\end{cases}
\]
where \(\alpha_k > 0\) is a step size.
However, when \(\lambda\) is fixed and one attempts to minimize \(\mathcal{L}(x,\lambda)\):
The dual function \(d(\lambda)=\min_x\mathcal{L}(x,\lambda)\) can be non-smooth in \(\lambda\) even for smooth \(f\) and \(h\).
\(\min_x \mathcal{L}(x,\lambda)\) may be unbounded below, so no finite minimizer exists.
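On a strongly convex toy problem neither pathology arises, and the plain Uzawa iteration already works. Below is a minimal numerical sketch (the QP data are made up for illustration):

```python
import numpy as np

# Toy problem: min 0.5 x^T Q x + c^T x  s.t.  h(x) = A x - b = 0
Q = np.diag([2.0, 4.0])
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

lam = np.zeros(1)
alpha = 0.5  # dual step size
for _ in range(200):
    # Primal step: closed-form minimizer of L(x, lam), since Q is positive definite
    x = np.linalg.solve(Q, -(c + A.T @ lam))
    # Dual step: gradient ascent on the dual function, whose gradient is h(x)
    lam += alpha * (A @ x - b)

print(x, A @ x - b)  # x -> (7/6, 5/6), residual -> 0
```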
Equality Constraint
PHR Augmented Lagrangian Method
Consider the optimization problem with a penalty on the deviation from a prior \(\bar{\lambda}\) :
\[
\min_x\max_\lambda f(x) + \lambda^\top h(x) - \frac1{2\rho}||\lambda-\bar{\lambda}||^2
\]
The inner problem is concave in \(\lambda\); setting its gradient to zero:
\[
\nabla_\lambda(\cdot) = h(x) - \frac1\rho(\lambda-\bar\lambda) = 0\quad\Rightarrow\quad\lambda^*(\bar\lambda)=\bar\lambda+\rho h(x)
\]
The outer problem:
\[
\begin{aligned}
& \min_x\max_\lambda f(x)+\lambda^\top h(x) - \frac1{2\rho}||\lambda-\bar\lambda||^2 \\
=& \min_x f(x)+[\lambda^*(\bar\lambda)]^\top h(x) - \frac1{2\rho}||\lambda^*(\bar\lambda)-\bar\lambda||^2 \\
=& \min_x f(x)+[\bar\lambda+\rho h(x)]^\top h(x) - \frac\rho2||h(x)||^2 \\
=& \min_x f(x)+\bar\lambda^\top h(x) + \frac\rho2||h(x)||^2
\end{aligned}
\]
PHR Augmented Lagrangian Method (cont.)
To increase precision:
Reduce the penalty weight \(1/\rho\)
Update the prior multiplier \(\bar\lambda\leftarrow\lambda^*(\bar\lambda)\)
Uzawa's method for the augmented Lagrangian function is:
\(x\leftarrow\arg\min_x f(x)+\bar\lambda^\top h(x)+\frac\rho2||h(x)||^2\)
\(\bar\lambda\leftarrow\bar\lambda+\rho h(x)\)
Penalty Method Perspective
The corresponding primal problem of the augmented Lagrangian function is:
\[
\begin{aligned}
\min_x\ & f(x)+\frac\rho2||h(x)||^2 \\
\text{s.t. } & h(x) = 0
\end{aligned}
\]
Advantages:
Even without \(\rho \to \infty\) , the constraints can be exactly satisfied in the limit through multiplier updates.
For large \(\rho\) , the penalty term \(\frac{\rho}{2}||h(x)||^2\) dominates, ensuring \(\min_x \mathcal{L}_\rho(x, \lambda)\) has a local solution.
The augmented dual function \(q_\rho(\lambda)\) is smooth under suitable conditions, with \(\nabla q_\rho(\lambda) \approx h(x(\lambda))\).
Practical PHR-ALM
In practice, we use the equivalent form:
\[
\mathcal{L}_\rho(x,\lambda) = f(x) + \frac\rho2\left\lVert h(x)+\frac\lambda\rho\right\rVert^2 - \underbrace{\frac1{2\rho}||\lambda||^2}_{x\text{-independent}}
\]
A KKT point can be computed via the iteration:
\[
\begin{cases}
x^{k+1} = \arg\min_x\mathcal{L}_{\rho^k}(x,\lambda^k) \\
\lambda^{k+1} = \lambda^k + \rho^k\;h(x^{k+1}) \\
\rho^{k+1} = \min[(1+\gamma)\rho^k, \rho_\max]
\end{cases}
\]
where \(\gamma>0\), and more generally \(\rho^k\) can be any nondecreasing positive sequence.
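As a concrete illustration, here is a compact Python sketch of this loop, using SciPy's BFGS for the inner minimization (the helper name `phr_alm_eq`, the sample problem, and all defaults are illustrative, not from the lecture):

```python
import numpy as np
from scipy.optimize import minimize

def phr_alm_eq(f, h, x0, rho=1.0, gamma=1.0, rho_max=1e4, tol=1e-8, max_outer=50):
    """Sketch of PHR-ALM for min f(x) s.t. h(x) = 0."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros_like(np.atleast_1d(h(x)))
    for _ in range(max_outer):
        def L(z):  # augmented Lagrangian, x-independent term dropped
            return f(z) + 0.5 * rho * np.sum((h(z) + lam / rho) ** 2)
        x = minimize(L, x, method="BFGS").x      # primal descent
        lam = lam + rho * h(x)                   # dual ascent
        rho = min((1.0 + gamma) * rho, rho_max)  # nondecreasing penalty
        if np.linalg.norm(h(x)) < tol:
            break
    return x, lam

# Example: min (x1 - 1)^2 + (x2 - 2)^2  s.t.  x1 + x2 = 1; solution (0, 1), lam = 2
x, lam = phr_alm_eq(lambda z: (z[0] - 1) ** 2 + (z[1] - 2) ** 2,
                    lambda z: np.array([z[0] + z[1] - 1.0]),
                    x0=[0.0, 0.0])
```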
Inequality Constraint
Slack Variables Relaxation
Consider the optimization problem with inequality constraints:
\[
\min_xf(x)\quad\text{s.t.}\quad g(x)\le0
\]
We use the equivalent formulation using slack variables:
\[
\min_{x,s} f(x)\quad\text{s.t.}\quad g(x)+[s]^2=0
\]
where \([\cdot]^2\) means element-wise squaring.
We can directly form the augmented Lagrangian as in the equality-constrained case:
\[
\begin{aligned}
&\min_{x,s}\left\{f(x)+\frac\rho2\left\lVert g(x)+[s]^2+\frac\lambda\rho\right\rVert^2\right\}\\
=&\min_x\left\{f(x)+\min_s\frac\rho2\left\lVert g(x)+[s]^2+\frac\lambda\rho\right\rVert^2\right\}\\
=&\min_x f(x)+\frac\rho2\left\lVert\max\left[g(x)+\frac\lambda\rho,0\right]\right\rVert^2
\end{aligned}
\]
Component-wise, the optimal slack is \([s_i^*]^2=\max[-(g_i(x)+\lambda_i/\rho),\,0]\): the residual is driven to zero whenever \(g_i(x)+\lambda_i/\rho\le0\), and \(s_i^*=0\) otherwise. Renaming the multiplier \(\mu=\lambda\) and restoring the \(x\)-independent constant gives the final form:
\[
\mathcal{L}_\rho(x,\mu) = f(x) + \frac\rho2\left\lVert\max[g(x)+\frac\mu\rho,0]\right\rVert^2 - \underbrace{\frac1{2\rho}||\mu||^2}_{x\text{-independent}}
\]
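The elimination of \(s\) can be sanity-checked numerically; this sketch compares a brute-force scan over the slack against the closed-form \(\max\) expression for a scalar constraint (all values chosen arbitrarily):

```python
import numpy as np

rho, lam = 2.0, 0.5
for g in (-1.3, 0.0, 0.7):  # sample constraint values g(x)
    s = np.linspace(-3.0, 3.0, 20001)
    brute = (0.5 * rho * (g + s ** 2 + lam / rho) ** 2).min()  # scan over s
    closed = 0.5 * rho * max(g + lam / rho, 0.0) ** 2          # closed form
    print(f"g={g:+.1f}  brute={brute:.6f}  closed={closed:.6f}")  # they agree
```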
For the dual update, from the optimality condition:
\[
\begin{aligned}
\mu^{k+1} &= \mu^k + \rho\left(g(x^{k+1})+[s^{k+1}]^2\right) \\
&= \max\left[\mu^k+\rho g(x^{k+1}), 0\right]
\end{aligned}
\]
Summary
PHR Augmented Lagrangian Method for the general nonconvex case:
\[
\begin{aligned}
\min_x\; & f(x) \\
\text{s.t. } & h(x) = 0 \\
& g(x) \le 0
\end{aligned}
\]
Its PHR Augmented Lagrangian is defined as
\[
\mathcal{L}_\rho=f(x)+\frac\rho2\left\lVert h(x)+\frac\lambda\rho\right\rVert^2+\frac\rho2\left\lVert\max[g(x)+\frac\mu\rho,0]\right\rVert^2-\frac1{2\rho}\left\{||\lambda||^2+||\mu||^2\right\}
\]
The PHR-ALM is simply repeating the primal descent and dual ascent iterations:
\[
\begin{cases}
x^{k+1} = \arg\min_x\mathcal{L}_{\rho^k}(x,\lambda^k,\mu^k) \\
\lambda^{k+1} = \lambda^k + \rho^k\;h(x^{k+1}) \\
\mu^{k+1} = \max[\mu^k+\rho^k g(x^{k+1}), 0] \\
\rho^{k+1} = \min[(1+\gamma)\rho^k, \rho_\max]
\end{cases}
\]
Courtesy: Z. Wang
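For concreteness, a minimal Python sketch of the full loop above (the function name, sample problem, and parameter defaults are all illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def phr_alm(f, h, g, x0, rho=1.0, gamma=1.0, rho_max=1e4, max_outer=50):
    """Sketch of PHR-ALM for min f(x) s.t. h(x) = 0, g(x) <= 0."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros_like(np.atleast_1d(h(x)))
    mu = np.zeros_like(np.atleast_1d(g(x)))
    for _ in range(max_outer):
        def L(z):  # augmented Lagrangian, x-independent terms dropped
            return (f(z)
                    + 0.5 * rho * np.sum((h(z) + lam / rho) ** 2)
                    + 0.5 * rho * np.sum(np.maximum(g(z) + mu / rho, 0.0) ** 2))
        x = minimize(L, x, method="BFGS").x       # primal descent
        lam = lam + rho * h(x)                    # equality multipliers
        mu = np.maximum(mu + rho * g(x), 0.0)     # inequality multipliers
        rho = min((1.0 + gamma) * rho, rho_max)   # penalty growth
    return x, lam, mu

# Example: min (x1-1)^2 + (x2-2)^2  s.t.  x1 + x2 = 1,  0.5 - x1 <= 0 (active)
# Expected: x = (0.5, 0.5), lam = 3, mu = 2
x, lam, mu = phr_alm(lambda z: (z[0] - 1) ** 2 + (z[1] - 2) ** 2,
                     lambda z: np.array([z[0] + z[1] - 1.0]),
                     lambda z: np.array([0.5 - z[0]]),
                     x0=[0.0, 0.0])
```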
Symmetric Cone Constraint
Extension to Symmetric Cone Constraints
Consider the symmetric cone constrained optimization problem:
\[
\begin{aligned}
\min_x\; & f(x) \\
\text{s.t. } & h(x) = 0 \\
& g(x) \in \mathcal{K}
\end{aligned}
\]
Typical examples are the second-order cone \(\set{(t,x):\lVert x\rVert\le t}\) and the positive semidefinite cone \(\mathbb{S}^n_+\).
Generalized Inequality Constraint
For the symmetric cone constraint \(x \in \mathcal{K}\) , we can equivalently express it as:
\[
g(x) = -x \preceq_{\mathcal{K}} 0
\]
The standard inequality constraint \(g(x)\le0\) corresponds to the nonnegative orthant:
\[
\mathcal{K}=\mathbb{R}^n_+=\set{x\in\mathbb{R}^n:x_i\ge0,\;i=1,\dots,n}
\]
Its projection operator is exactly the element-wise max function:
\[
\Pi_{\mathbb{R}^n_+}(v) = \max[v,0],\quad(\mathbb{R}^n_+)^*=\mathbb{R}^n_+
\]
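Projection operators are the only cone-specific ingredient of the method. Below are sketches for the nonnegative orthant and the second-order cone \(\set{(t,x):\lVert x\rVert\le t}\); the latter uses the standard three-case formula (interior, polar cone, boundary):

```python
import numpy as np

def proj_orthant(v):
    """Projection onto the nonnegative orthant R^n_+: element-wise max with 0."""
    return np.maximum(v, 0.0)

def proj_soc(v):
    """Projection onto the second-order cone {(t, x) : ||x|| <= t}, with v = (t, x)."""
    t, x = v[0], v[1:]
    nx = np.linalg.norm(x)
    if nx <= t:                   # already inside the cone
        return v.copy()
    if nx <= -t:                  # inside the polar cone: project to the origin
        return np.zeros_like(v)
    a = 0.5 * (1.0 + t / nx)      # otherwise project onto the boundary
    return a * np.concatenate(([nx], x))
```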
Slack Variables Relaxation
Consider the optimization problem with inequality constraints:
\[
\min_xf(x)\quad\text{s.t.}\quad g(x)\in\mathcal{K}
\]
By the theory of Euclidean Jordan algebras, in which the cone of squares \(\set{s\circ s}\) is exactly \(\mathcal{K}\), the conic program is equivalent to
\[
\min_{x,s} f(x)\quad\text{s.t.}\quad g(x)=s\circ s
\]
We can directly form the augmented Lagrangian as in the equality-constrained case:
\[
\begin{aligned}
&\min_{x,s}\left\{f(x)+\frac\rho2\left\lVert g(x)-s\circ s+\frac\lambda\rho\right\rVert^2\right\}\\
=&\min_x\left\{f(x)+\min_s\frac\rho2\left\lVert g(x)-s\circ s+\frac\lambda\rho\right\rVert^2\right\}\\
=&\min_x f(x)+\frac\rho2\left\lVert\Pi_{\mathcal{K}}\left(-g(x)-\frac\lambda\rho\right)\right\rVert^2
\end{aligned}
\]
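The last step uses the Moreau decomposition \(v=\Pi_{\mathcal{K}}(v)-\Pi_{\mathcal{K}}(-v)\) for a self-dual cone, which implies \(\lVert v-\Pi_{\mathcal{K}}(v)\rVert=\lVert\Pi_{\mathcal{K}}(-v)\rVert\). A quick numerical check on the second-order cone, reusing the `proj_soc` sketch from above:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.normal(size=4)                      # random point (t, x) in R^4
    d_cone = np.linalg.norm(v - proj_soc(v))    # distance from v to the cone
    d_neg = np.linalg.norm(proj_soc(-v))        # norm of the projection of -v
    assert np.isclose(d_cone, d_neg)            # Moreau: the two coincide
```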
Letting \(\mu=-\lambda\) and using that symmetric cones are self-dual (\(\mathcal{K}^*=\mathcal{K}\)), we get the final form:
\[
\mathcal{L}_\rho(x,\mu) = f(x) + \frac\rho2\left\lVert\Pi_{\mathcal{K^*}}\left(-g(x)+\frac\mu\rho\right)\right\rVert^2 - \underbrace{\frac1{2\rho}||\mu||^2}_{x\text{-independent}}
\]
For the dual update, from the optimality condition:
\[
\begin{aligned}
\mu^{k+1} &= -\lambda^{k+1} = \mu^k - \rho\left[g(x^{k+1})-s^{k+1}\circ s^{k+1}\right] \\
&= \Pi_{\mathcal{K}^*}\left[\mu^k-\rho\, g(x^{k+1})\right]
\end{aligned}
\]
where \(\mathcal{K}^*\) is the dual cone of \(\mathcal{K}\) .
Summary
PHR Augmented Lagrangian Method for the general nonconvex case:
\[
\begin{aligned}
\min_x\; & f(x) \\
\text{s.t. } & h(x) = 0 \\
& g(x) \in \mathcal{K}
\end{aligned}
\]
Its PHR Augmented Lagrangian is defined as
\[
\mathcal{L}_\rho=f(x)+\frac\rho2\left\lVert h(x)+\frac\lambda\rho\right\rVert^2+\frac\rho2\left\lVert\Pi_{\mathcal{K}^*}\left(-g(x)+\frac\mu\rho\right)\right\rVert^2-\frac1{2\rho}\left\{||\lambda||^2+||\mu||^2\right\}
\]
The PHR-ALM is simply repeating the primal descent and dual ascent iterations:
\[
\begin{cases}
x^{k+1} = \arg\min_x\mathcal{L}_{\rho^k}(x,\lambda^k,\mu^k) \\
\lambda^{k+1} = \lambda^k + \rho^k\;h(x^{k+1}) \\
\mu^{k+1} = \Pi_{\mathcal{K}^*}[\mu^k-\rho^k\,g(x^{k+1})] \\
\rho^{k+1} = \min[(1+\gamma)\rho^k, \rho_\max]
\end{cases}
\]
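Finally, a minimal sketch of the conic loop, parameterized by the cone's projection operator (reusing `proj_soc` from above; the helper `phr_alm_conic` and the example are illustrative, and self-duality \(\mathcal{K}^*=\mathcal{K}\) is assumed):

```python
import numpy as np
from scipy.optimize import minimize

def phr_alm_conic(f, h, g, proj, x0, rho=1.0, gamma=1.0, rho_max=1e4, max_outer=50):
    """Sketch of conic PHR-ALM for min f(x) s.t. h(x) = 0, g(x) in K (self-dual K)."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros_like(np.atleast_1d(h(x)))
    mu = np.zeros_like(np.atleast_1d(g(x)))
    for _ in range(max_outer):
        def L(z):  # augmented Lagrangian, x-independent terms dropped
            return (f(z)
                    + 0.5 * rho * np.sum((h(z) + lam / rho) ** 2)
                    + 0.5 * rho * np.sum(proj(-g(z) + mu / rho) ** 2))
        x = minimize(L, x, method="BFGS").x      # primal descent
        lam = lam + rho * h(x)                   # equality multipliers
        mu = proj(mu - rho * g(x))               # conic dual update
        rho = min((1.0 + gamma) * rho, rho_max)  # penalty growth
    return x, lam, mu

# Example: project p onto the SOC by solving min ||x - p||^2 s.t. x in K;
# x approaches the Euclidean projection of p onto the cone.
p = np.array([0.2, 1.0, -1.0])
x, _, mu = phr_alm_conic(lambda z: np.sum((z - p) ** 2),
                         lambda z: np.zeros(0),   # no equality constraints
                         lambda z: z, proj_soc, x0=p)
```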