Selected Topics on Convexity

Published: 2020-10-01 · Updated: 2020-10-01

We are going to take a look at some advanced topics on convexity, gradient descent, and other related material.

Additional Mathematics Background

Fréchet Differentiability

Fréchet differentiation provides a way to extend differentiability to general normed vector spaces.

A mapping $F : \mathbb R^n \to \mathbb R^m$ is said to be Fréchet differentiable at $x$ if there is a linear map $\mathcal L : \mathbb R^n \to \mathbb R^m$ such that

$$\lim_{h \to 0} \frac{\left\|F(x + h) - F(x) - \mathcal L(h)\right\|}{\|h\|} = 0$$

The map $\mathcal L$ is the derivative of $F$ at $x$, and it holds that

$$\mathcal L(h) = DF(x)\, h$$

where $DF(x) \in \mathbb R^{m \times n}$ is the Jacobian matrix.
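As a quick numerical sanity check (a minimal sketch in NumPy; the toy map $F(x_1, x_2) = (x_1^2, x_1 x_2)$ and all names are my own choices, not from the source), the ratio in the limit above should vanish as $\|h\| \to 0$:

```python
import numpy as np

def F(x):
    # Toy map F : R^2 -> R^2, chosen only for illustration.
    return np.array([x[0] ** 2, x[0] * x[1]])

def DF(x):
    # Jacobian of F, computed by hand.
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]]])

x = np.array([1.0, 2.0])
rng = np.random.default_rng(0)
for scale in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = scale * rng.standard_normal(2)
    remainder = np.linalg.norm(F(x + h) - F(x) - DF(x) @ h)
    # Fréchet differentiability: this ratio tends to 0 with ||h||.
    print(scale, remainder / np.linalg.norm(h))
```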

Optimality Conditions for Unconstrained Optimization Problems

Basic facts about FONC, SONC, and SOSC were already covered in Optimization Learning Notes 5; here I only provide additional proofs and remarks.

Quadratic Growth Condition

The SOSC turns out to have an equivalent form.

Let $f : X \to \mathbb R$ be twice continuously differentiable on the open set $X \subset \mathbb R^n$. Then the SOSC conditions for the local optimum $x^\ast$ are equivalent to the quadratic growth condition (QGC):

$$\exists\, \alpha > 0,\ \epsilon > 0 \text{ s.t. } f(x) - f(x^\ast) \ge \frac{\alpha}{2}\|x - x^\ast\|^2 \quad \forall x \in B_\epsilon(x^\ast) \cap X.$$

Notice that in the proof of the SOSC we have already shown that, for $x$ sufficiently close to $x^\ast$,

$$f(x) - f(x^\ast) \ge \frac{\lambda_{\min}(\nabla^2 f(x^\ast))}{4}\|x - x^\ast\|^2,$$

where $\lambda_{\min}(\nabla^2 f(x^\ast)) > 0$ by the SOSC, so the QGC holds.

Now we prove the other direction.

Since the QGC implies that $x^\ast$ is a local minimizer, the FONC gives $\nabla f(x^\ast) = 0$.

By Taylor Expansion,

$$f(x) - f(x^\ast) = \nabla f(x^\ast)^\top (x - x^\ast) + \frac{1}{2}(x - x^\ast)^\top \nabla^2 f(x^\ast)(x - x^\ast) + o(\|x - x^\ast\|^2)$$

Hence,

$$\begin{aligned} (x - x^\ast)^\top \nabla^2 f(x^\ast)(x - x^\ast) &= 2\left[f(x) - f(x^\ast)\right] - 2\,o(\|x - x^\ast\|^2) \\ &\ge \|x - x^\ast\|^2\left(\alpha - \frac{2\,o(\|x - x^\ast\|^2)}{\|x - x^\ast\|^2}\right) \end{aligned}$$

Therefore, dividing by $\|x - x^\ast\|^2$ and letting $x \to x^\ast$ along a fixed direction $d = (x - x^\ast)/\|x - x^\ast\|$, the $o$-term vanishes and

$$d^\top \nabla^2 f(x^\ast)\, d \ge \alpha > 0.$$

Since $X$ is open, $d$ can be an arbitrary unit direction, so $\nabla^2 f(x^\ast)$ must be positive definite and the SOSC holds.
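To see the equivalence at work numerically (a small sketch; the one-dimensional examples $f(x) = x^2$ and $f(x) = x^4$ are my own choices): $x^2$ satisfies the SOSC at $0$ and grows quadratically, while $x^4$ has a strict local minimum at $0$ but fails both the SOSC and the QGC:

```python
import numpy as np

xs = np.linspace(-1e-2, 1e-2, 1001)
xs = xs[xs != 0.0]          # look at points near (but not at) x* = 0

# f(x) = x^2: f''(0) = 2 > 0, and (f(x) - f(0)) / (||x||^2 / 2) == 2,
# so the QGC holds with alpha = 2.
print(np.min(xs ** 2 / (0.5 * xs ** 2)))

# f(x) = x^4: f''(0) = 0, and the same ratio tends to 0 as x -> 0,
# so no alpha > 0 can work and the QGC fails.
print(np.min(xs ** 4 / (0.5 * xs ** 2)))
```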

Convexity

Extended Definitions for Convexity

Besides the standard definitions of convex functions in Optimization Learning Notes 6, we now introduce two stronger notions of convexity (a numerical sanity check follows the definitions):

Let $f : X \to \mathbb R$ be given, where $X \subset \mathbb R^n$ is a convex set. The function $f$ is

  • strictly convex if for all $x, y \in X$ with $x \ne y$ and $\lambda \in (0, 1)$ we have

$$f(\lambda x + (1 - \lambda)y) < \lambda f(x) + (1 - \lambda) f(y)$$

  • strongly convex (with parameter $\mu$) if there exists $\mu > 0$ such that for all $x, y \in X$ and all $\lambda \in [0, 1]$ we have

$$f(\lambda x + (1 - \lambda)y) + \frac{\mu}{2}\lambda(1 - \lambda)\|y - x\|^2 \le \lambda f(x) + (1 - \lambda) f(y)$$
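Here is the promised sanity check (a minimal sketch; the choice $f(x) = \|x\|^2$, which is strongly convex with $\mu = 2$, is mine). For this particular $f$ the strong convexity inequality in fact holds with equality:

```python
import numpy as np

def f(x):
    # f(x) = ||x||^2 is strongly convex with parameter mu = 2.
    return np.dot(x, x)

mu = 2.0
rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = f(lam * x + (1 - lam) * y) \
        + 0.5 * mu * lam * (1 - lam) * np.dot(y - x, y - x)
    rhs = lam * f(x) + (1 - lam) * f(y)
    # lhs <= rhs must hold; for ||x||^2 the difference is ~0 (equality).
    print(lhs <= rhs + 1e-12, lhs - rhs)
```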

First Order Characterization

Let $f : X \to \mathbb R$ be continuously differentiable on an open set that contains the convex set $X \subset \mathbb R^n$. Then the following hold (a worked example appears after the list):

  1. The function $f$ is convex if and only if $$f(y) - f(x) \ge \nabla f(x)^\top (y - x) \quad \forall x, y \in X.$$

  2. The mapping $f$ is strictly convex if and only if $$f(y) - f(x) > \nabla f(x)^\top (y - x) \quad \forall x, y \in X,\ x \ne y.$$

  3. The function $f$ is strongly convex (with parameter $\mu$) if and only if there exists $\mu > 0$ such that $$f(y) - f(x) \ge \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2 \quad \forall x, y \in X.$$
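As the promised worked example (a one-dimensional instance of part 3, chosen by me): for $f(x) = x^2$ we have $f'(x) = 2x$ and

$$f(y) - f(x) = y^2 - x^2 = 2x(y - x) + (y - x)^2 = f'(x)(y - x) + \frac{2}{2}(y - x)^2,$$

so the strong convexity inequality holds with equality for $\mu = 2$.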

Proof of 1

From the definition of convexity, $f(x + \lambda(y - x)) \le (1 - \lambda)f(x) + \lambda f(y)$, so

$$f(y) - f(x) \ge \frac{f(x + \lambda(y - x)) - f(x)}{\lambda}$$

Letting $\lambda \downarrow 0$, we get

$$f(y) - f(x) \ge \nabla f(x)^\top (y - x) \quad \forall x, y \in X.$$

Conversely, denote $x_\lambda = (1 - \lambda)x + \lambda y$ for $x, y \in X$ and $\lambda \in [0, 1]$. The inequality just proved gives

$$f(x) - f(x_\lambda) \ge \nabla f(x_\lambda)^\top (x - x_\lambda) = \lambda\, \nabla f(x_\lambda)^\top (x - y) \quad (1)$$

$$f(y) - f(x_\lambda) \ge \nabla f(x_\lambda)^\top (y - x_\lambda) = (1 - \lambda)\, \nabla f(x_\lambda)^\top (y - x) \quad (2)$$

Computing $(1 - \lambda)\cdot(1) + \lambda\cdot(2)$, we get

$$0 \le (1 - \lambda)(f(x) - f(x_\lambda)) + \lambda(f(y) - f(x_\lambda))$$

This implies $(1 - \lambda)f(x) + \lambda f(y) \ge f(x_\lambda)$, i.e., $f$ is convex.

Proof of 2

First, suppose $f$ is strictly convex. Letting $z = \frac{1}{2}(x + y)$ for arbitrary $x, y \in X$ with $x \ne y$, we have

$$f(z) < \frac{1}{2}f(x) + \frac{1}{2}f(y) \;\Rightarrow\; f(z) - f(x) < \frac{1}{2}(f(y) - f(x))$$

Using part 1, it holds that

$$f(z) - f(x) \ge \nabla f(x)^\top (z - x) = \frac{1}{2}\nabla f(x)^\top (y - x)$$

This yields,

$$\nabla f(x)^\top (y - x) \le 2(f(z) - f(x)) < f(y) - f(x)$$

The other direction can be proved in the same way as in part 1: for $\lambda \in (0, 1)$ and $x \ne y$ we have $x_\lambda \ne x$ and $x_\lambda \ne y$, so inequalities (1) and (2) become strict.

Proof of 3

Following the same approach as in part 1,

$$\begin{aligned} \nabla f(x)^\top (y - x) &= \lim_{\lambda \downarrow 0} \frac{f(x_\lambda) - f(x)}{\lambda} \\ &\le \lim_{\lambda \downarrow 0} \frac{(1 - \lambda)f(x) + \lambda f(y) - \frac{\mu}{2}\lambda(1 - \lambda)\|y - x\|^2 - f(x)}{\lambda} \\ &= f(y) - f(x) - \frac{\mu}{2}\|y - x\|^2 \end{aligned}$$

For the other direction, consider

$$\|x - x_\lambda\| = \lambda \|y - x\|, \qquad \|y - x_\lambda\| = (1 - \lambda)\|y - x\|$$

We then obtain

$$\begin{aligned} (1 - \lambda)f(x) + \lambda f(y) - f(x_\lambda) &= (1 - \lambda)(f(x) - f(x_\lambda)) + \lambda(f(y) - f(x_\lambda)) \\ &\ge (1 - \lambda)\left[\nabla f(x_\lambda)^\top (x - x_\lambda) + \frac{\mu}{2}\|x - x_\lambda\|^2\right] + \lambda\left[\nabla f(x_\lambda)^\top (y - x_\lambda) + \frac{\mu}{2}\|y - x_\lambda\|^2\right] \\ &= \nabla f(x_\lambda)^\top \left[(1 - \lambda)x + \lambda y - x_\lambda\right] + \frac{\mu}{2}\left[\lambda^2(1 - \lambda) + \lambda(1 - \lambda)^2\right]\|y - x\|^2 \\ &= \frac{\mu}{2}\lambda(1 - \lambda)\|y - x\|^2 \end{aligned}$$

which is exactly the strong convexity inequality.

Second Order Characterization

Let $f : X \to \mathbb R$ be twice continuously differentiable on an open and convex set $X \subset \mathbb R^n$. Then the following hold (a numerical illustration appears after the list):

  1. The function $f$ is convex if and only if the Hessian $\nabla^2 f(x)$ is positive semidefinite for all $x \in X$: $$h^\top \nabla^2 f(x) h \ge 0 \quad \forall h \in \mathbb R^n,\ \forall x \in X.$$

  2. If the Hessian $\nabla^2 f(x)$ is PD for all $x \in X$, i.e., if we have $$h^\top \nabla^2 f(x) h > 0 \quad \forall h \in \mathbb R^n \setminus \{0\},\ \forall x \in X,$$ then $f$ is strictly convex.

  3. The mapping $f$ is strongly convex (with parameter $\mu$) iff $\nabla^2 f(x)$ is uniformly positive definite (with parameter $\mu$), i.e., there is $\mu > 0$ s.t. $$h^\top \nabla^2 f(x) h \ge \mu \|h\|^2 \quad \forall h \in \mathbb R^n,\ \forall x \in X.$$
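Here is the promised numerical illustration (a sketch; the random convex quadratic $f(x) = \frac{1}{2}x^\top Q x$ with $Q \succ 0$ is my own choice). For such $f$, $\nabla^2 f \equiv Q$, and the uniform PD parameter is $\mu = \lambda_{\min}(Q)$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A.T @ A + 0.1 * np.eye(4)     # symmetric PD by construction

def f(x):
    # Convex quadratic; its Hessian is Q at every point.
    return 0.5 * x @ Q @ x

mu = np.linalg.eigvalsh(Q).min()  # uniform positive definiteness parameter
print("mu =", mu)

# Spot-check the first-order strong convexity inequality from earlier:
x, y = rng.standard_normal(4), rng.standard_normal(4)
lhs = f(y) - f(x)
rhs = (Q @ x) @ (y - x) + 0.5 * mu * np.dot(y - x, y - x)
print(lhs >= rhs - 1e-12)         # True
```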

Proof of 1

Suppose the Hessian is PSD. By the second-order mean value theorem (Taylor's theorem with Lagrange remainder), we can find $t \in (0, 1)$ s.t.

$$f(y) - f(x) = \nabla f(x)^\top (y - x) + \frac{1}{2}(y - x)^\top \nabla^2 f(x + t(y - x))(y - x) \ge \nabla f(x)^\top (y - x)$$

This gives us the first-order characterization of convexity.

To prove the other direction, fix $x \in X$ and $h \in \mathbb R^n$ and consider $x + th$. Since $X$ is open, we can find an $\epsilon > 0$ s.t.

$$x + th \in X \quad \forall t \in [0, \epsilon]$$

Now consider the following Taylor Expansion:

$$f(x + th) = f(x) + t\nabla f(x)^\top h + \frac{t^2}{2}h^\top \nabla^2 f(x) h + o(t^2)$$

for $t$ sufficiently small. Since $f$ is convex, the first-order characterization gives $f(x + th) - f(x) \ge t\nabla f(x)^\top h$, hence

$$\frac{t^2}{2}h^\top \nabla^2 f(x) h + o(t^2) \ge 0$$

Hence,

$$h^\top \nabla^2 f(x) h \ge -\frac{2\,o(t^2)}{t^2}$$

Letting $t \to 0$, we infer that $\nabla^2 f(x)$ is PSD.

Proof of 2

The same approach as the first direction of the proof of part 1. Note that the converse is false: $f(x) = x^4$ is strictly convex, yet $f''(0) = 0$, so the Hessian need not be PD everywhere.

Proof of 3

Similarly to the proof of part 1, we use the MVT to get

$$\begin{aligned} f(y) - f(x) &= \nabla f(x)^\top (y - x) + \frac{1}{2}(y - x)^\top \nabla^2 f(x + t(y - x))(y - x) \\ &\ge \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2 \end{aligned}$$

This already gives us the first order characterization.

Conversely, we start with the first order characterization:

$$f(x + th) - f(x) \ge t\nabla f(x)^\top h + \frac{\mu t^2}{2}\|h\|^2$$

Reusing the Taylor expansion,

$$f(x + th) - f(x) = t\nabla f(x)^\top h + \frac{t^2}{2}h^\top \nabla^2 f(x) h + o(t^2)$$

We now get

$$h^\top \nabla^2 f(x) h \ge \mu\|h\|^2 - \frac{2\,o(t^2)}{t^2}$$

Letting $t \to 0$ finishes the proof.

Convex Optimization Problems

In Optimization Learning Notes 6, we already showed that for a typical convex minimization problem any local minimum must be a global minimum. Two things remain here:

  • If $f$ is strictly convex, then the minimization problem possesses at most one local minimizer, which (if it exists) is also the strict global minimizer of $f$ on $X$.

  • If the feasible set $X$ is open, $f$ is continuously differentiable, and $x^\ast$ is a stationary point of $f$, then $x^\ast$ is a global minimizer of $f$ on $X$. (We provide a proof below.)

Proof of the first part

Suppose $f$ possesses two local minimizers $x^\ast \ne y^\ast$. Since every local minimum of a convex function is global, they are both global solutions:

$$f(x^\ast) = f(y^\ast)$$

Consider

$$z^\ast = \frac{1}{2}x^\ast + \frac{1}{2}y^\ast$$

We have

$$f(z^\ast) < \frac{1}{2}f(x^\ast) + \frac{1}{2}f(y^\ast) = f(x^\ast)$$

This contradicts the global optimality of $x^\ast$.

Proof of the second part

Just use the first order characterization:

$$f(y) \ge f(x^\ast) + \nabla f(x^\ast)^\top (y - x^\ast) = f(x^\ast) \quad \forall y \in X,$$

where the equality uses $\nabla f(x^\ast) = 0$.

Hence, xx^\ast is a global solution.
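For instance (a one-line example of my own): $f(x) = (x - 1)^2$ on $X = \mathbb R$ is convex with the single stationary point $x^\ast = 1$, which is therefore the global minimizer; $f(x) = e^x$ is convex with no stationary point and, consistently, attains no minimum on the open set $\mathbb R$.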

Some Additional Properties

Convexity of $f(x) - \frac{\mu}{2}\|x\|^2$

Consider

$$g(x) = f(x) - \frac{\mu}{2}\|x\|^2$$

We show that $f$ is strongly convex with parameter $\mu$ iff $g$ is convex.

First suppose $f$ is strongly convex. Using $\|\lambda a + (1 - \lambda)b\|^2 = \lambda^2\|a\|^2 + (1 - \lambda)^2\|b\|^2 + 2\lambda(1 - \lambda)\langle a, b\rangle$, we have

$$\begin{aligned} g(\lambda a + (1 - \lambda)b) &= f(\lambda a + (1 - \lambda)b) - \frac{\mu}{2}\|\lambda a + (1 - \lambda)b\|^2 \\ &\le \lambda f(a) + (1 - \lambda)f(b) - \frac{\mu}{2}\lambda(1 - \lambda)\|a - b\|^2 - \frac{\mu}{2}\lambda^2\|a\|^2 - \frac{\mu}{2}(1 - \lambda)^2\|b\|^2 - \mu\lambda(1 - \lambda)\langle a, b\rangle \\ &= \lambda f(a) + (1 - \lambda)f(b) - \frac{\mu}{2}\left[\lambda(1 - \lambda)\left(\|a\|^2 - 2\langle a, b\rangle + \|b\|^2\right) + \lambda^2\|a\|^2 + (1 - \lambda)^2\|b\|^2 + 2\lambda(1 - \lambda)\langle a, b\rangle\right] \\ &= \lambda f(a) + (1 - \lambda)f(b) - \frac{\mu}{2}\left[\lambda(1 - \lambda)\left(\|a\|^2 + \|b\|^2\right) + \lambda^2\|a\|^2 + (1 - \lambda)^2\|b\|^2\right] \\ &= \lambda f(a) + (1 - \lambda)f(b) - \frac{\mu}{2}\left[\lambda\|a\|^2 + (1 - \lambda)\|b\|^2\right] \\ &= \lambda g(a) + (1 - \lambda)g(b) \end{aligned}$$

Therefore, $g$ is convex.

Conversely, if $g$ is convex,

$$\begin{aligned} \lambda f(a) + (1 - \lambda)f(b) &= \lambda\left[g(a) + \frac{\mu}{2}\|a\|^2\right] + (1 - \lambda)\left[g(b) + \frac{\mu}{2}\|b\|^2\right] \\ &= \lambda g(a) + (1 - \lambda)g(b) + \frac{\mu}{2}\left[\lambda\|a\|^2 + (1 - \lambda)\|b\|^2\right] \\ &\ge g(\lambda a + (1 - \lambda)b) + \frac{\mu}{2}\left[\lambda\|a\|^2 + (1 - \lambda)\|b\|^2\right] \\ &= f(\lambda a + (1 - \lambda)b) - \frac{\mu}{2}\|\lambda a + (1 - \lambda)b\|^2 + \frac{\mu}{2}\left[\lambda\|a\|^2 + (1 - \lambda)\|b\|^2\right] \end{aligned}$$

Then, by the same norm identity as in the proof above,

$$-\frac{\mu}{2}\|\lambda a + (1 - \lambda)b\|^2 + \frac{\mu}{2}\left[\lambda\|a\|^2 + (1 - \lambda)\|b\|^2\right] = \frac{\mu}{2}\lambda(1 - \lambda)\|a - b\|^2,$$

which is exactly the strong convexity inequality for $f$. This finishes the proof.
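A quick numerical check of this equivalence (a sketch; the random convex quadratic is again my own choice): for $f(x) = \frac{1}{2}x^\top Q x$ with $\mu = \lambda_{\min}(Q)$, the Hessian of $g$ is $Q - \mu I \succeq 0$, so $g$ is convex by the second-order characterization:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q = A.T @ A + 0.1 * np.eye(4)      # positive definite Hessian of f
mu = np.linalg.eigvalsh(Q).min()   # strong convexity parameter of f

# g(x) = f(x) - (mu/2)||x||^2 has constant Hessian Q - mu * I.
H_g = Q - mu * np.eye(4)
print(np.linalg.eigvalsh(H_g))     # all eigenvalues >= 0, so g is convex
```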

Convexity of Epigraph

The epigraph of a function f:RnRf : \mathbb R^n \to \mathbb R is defined as

$$\operatorname{epi} f := \{(x, t) : x \in \mathbb R^n,\ t \in \mathbb R,\ t \ge f(x)\} \subseteq \mathbb R^{n + 1}$$

$f$ is convex iff its epigraph is convex. To see this, take

$$(x_1, t_1), (x_2, t_2) \in \operatorname{epi} f$$

Define

$$(\tilde x, \tilde t) = \lambda(x_1, t_1) + (1 - \lambda)(x_2, t_2)$$

We then have

$$\begin{aligned} \tilde t &= \lambda t_1 + (1 - \lambda)t_2 \\ &\ge \lambda f(x_1) + (1 - \lambda)f(x_2) \\ &\ge f(\lambda x_1 + (1 - \lambda)x_2) \\ &= f(\tilde x) \end{aligned}$$

Therefore, $(\tilde x, \tilde t)$ is in the epigraph, and the epigraph is convex.

To prove the other direction, note that

$$\lambda(x_1, f(x_1)) + (1 - \lambda)(x_2, f(x_2))$$

is in the epigraph, since each $(x_i, f(x_i)) \in \operatorname{epi} f$ and the epigraph is convex. The defining inequality of the epigraph then yields

$$f(\lambda x_1 + (1 - \lambda)x_2) \le \lambda f(x_1) + (1 - \lambda)f(x_2)$$

Therefore, the function is convex as well.
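A sampling-based sanity check (a sketch; $f(x) = x^2$ is my own choice): convex combinations of points in the epigraph stay in the epigraph:

```python
import numpy as np

f = lambda x: x ** 2               # a convex function on R
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x1, x2 = rng.uniform(-2, 2, 2)
    # Points on or above the graph, i.e., in epi f.
    t1, t2 = f(x1) + rng.uniform(0, 1), f(x2) + rng.uniform(0, 1)
    lam = rng.uniform()
    xt, tt = lam * x1 + (1 - lam) * x2, lam * t1 + (1 - lam) * t2
    ok &= bool(tt >= f(xt) - 1e-12)
print(ok)                          # True: epi f is convex
```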

Existence of Directional Derivative

Sometimes the objective function is not differentiable. However, if it is convex, the directional derivative always exists.

Suppose $f : \mathbb R^n \to \mathbb R$ is convex. We then have:

  1. The difference quotient $$\frac{f(x + td) - f(x)}{t}$$ is a non-decreasing function of $t$ on $(0, +\infty)$.

  2. For every $x, d \in \mathbb R^n$ the directional derivative $f'(x; d)$ always exists and is given by $$f'(x; d) = \inf_{t > 0} \frac{f(x + td) - f(x)}{t}$$

Proof of 1

Let $x, d \in \mathbb R^n$ and let $0 < t_1 < t_2$. Then

$$\begin{aligned} f(x + t_1 d) &= f\left(x + \frac{t_1}{t_2}\, t_2 d\right) \\ &= f\left[\left(1 - \frac{t_1}{t_2}\right)x + \frac{t_1}{t_2}(x + t_2 d)\right] \\ &\le \left(1 - \frac{t_1}{t_2}\right)f(x) + \frac{t_1}{t_2}f(x + t_2 d) \end{aligned}$$

Hence,

$$\frac{f(x + t_1 d) - f(x)}{t_1} \le \frac{f(x + t_2 d) - f(x)}{t_2}$$

Proof of 2

The definition of the directional derivative is

$$f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}$$

Since the difference quotient

$$\frac{f(x + td) - f(x)}{t}$$

is non-decreasing in $t$, it decreases (or stays constant) as $t \downarrow 0$; moreover, the same convexity argument applied across $t = 0$ shows it is bounded below by $f(x) - f(x - d)$. A monotone, bounded quantity converges, so the limit exists and is necessarily given by the infimum of these quotients over $t > 0$.
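The monotone quotients are easy to observe numerically (a minimal sketch; the convex, non-differentiable example $f(x) = |x| + x^2$ and the evaluation point are my own choices):

```python
def f(x):
    # Convex, but not differentiable at x = 0: the directional
    # derivatives f'(0; 1) = f'(0; -1) = 1 are not linear in d.
    return abs(x) + x ** 2

x = 0.0
for d in (1.0, -1.0):
    # Difference quotients (f(x + t d) - f(x)) / t for decreasing t > 0.
    quotients = [(f(x + t * d) - f(x)) / t for t in (1.0, 0.5, 0.1, 0.01)]
    print(d, quotients)   # non-increasing as t decreases, limit f'(x; d) = 1
```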

Local Lipschitz Continuity

Let $f : \mathbb R^n \to (-\infty, \infty]$ be a convex function such that the interior of its domain is not empty. Then $f$ is locally Lipschitz continuous on the interior of its domain: for every compact $K \subseteq \operatorname{int}(\operatorname{dom} f)$,

$$\exists L > 0 \text{ s.t. } |f(x) - f(y)| \le L\|x - y\| \quad \forall x, y \in K.$$

[Proof for Local Lipschitz Continuity may be provided later]
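Meanwhile, here is a small numerical illustration (a sketch; the example $f(x) = x^2$, which is locally but not globally Lipschitz on $\mathbb R$, is my own choice). The empirical Lipschitz ratio stays bounded on each compact interval but grows with the interval, which is exactly the local (not global) behavior:

```python
import numpy as np

f = lambda x: x ** 2   # convex; locally but not globally Lipschitz on R

rng = np.random.default_rng(0)
for radius in (1.0, 10.0, 100.0):
    xs = rng.uniform(-radius, radius, 2000)
    ys = rng.uniform(-radius, radius, 2000)
    ratios = np.abs(f(xs) - f(ys)) / np.abs(xs - ys)
    # On K = [-r, r] one can take L = 2r, so the ratio is at most 2r.
    print(radius, ratios.max())
```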


Tags: convexity, math, optimization
