# Selected Topics on Convexity

Publish Date: 2020-10-01 18:17:45.994530

Update Date: 2020-10-01 18:26:56.209455

We are going to have a look at some advanced topics on convexity, gradient descent, and other related material.

### Fréchet Differentiability

Fréchet differentiation provides a way to extend differentiability to general vector spaces.

A mapping $F : \mathbb R^n \to \mathbb R^m$ is said to be Fréchet Differentiable at $x$ if there is a linear function $\mathcal L: \mathbb R^n \to \mathbb R^m$ such that

$\lim_{h \to 0} \frac{\left|F(x + h) - F(x) - \mathcal L(h)\right|}{| h |} = 0$

The function $\mathcal L$ is the derivative of $F$ at $x$ and it holds that

$\mathcal L(h) = DF(x) \cdot h$

where $DF(x) \in \mathbb R^{m \times n}$ is the Jacobian matrix.
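As a quick numerical sketch of this definition (with an assumed example map $F(x, y) = (xy,\ x + y)$ and its Jacobian written by hand), we can watch the quotient in the limit shrink as $h \to 0$:

```python
import numpy as np

# Assumed example: F(x, y) = (x*y, x + y), with Jacobian DF(x, y) = [[y, x], [1, 1]].
def F(v):
    x, y = v
    return np.array([x * y, x + y])

def DF(v):
    x, y = v
    return np.array([[y, x], [1.0, 1.0]])

# Fréchet differentiability: |F(x+h) - F(x) - DF(x) h| / |h| -> 0 as h -> 0.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
d = rng.standard_normal(2)          # fixed direction; h = t * d shrinks with t
for t in [1e-1, 1e-2, 1e-3]:
    h = t * d
    ratio = np.linalg.norm(F(x + h) - F(x) - DF(x) @ h) / np.linalg.norm(h)
    print(t, ratio)
```

The second component of $F$ is linear, so the only error comes from the bilinear term $h_1 h_2$ of the first component; the printed ratio decays roughly linearly in $t$, consistent with Fréchet differentiability.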

## Optimality Conditions for Unconstrained Optimization Problems

Basic knowledge on FONC, SONC, and SOSC has already been covered in Optimization Learning Notes 5. Here we only provide additional proofs and remarks.

SOSC turns out to have an equivalent form.

Let $f : X \to \mathbb R$ be twice continuously differentiable on the open set $X \subset \mathbb R^n$. Then, the SOSC conditions for the local optimum $x^\ast$ are equivalent to

$\exists \alpha \gt 0, \epsilon \gt 0 \text{ s.t. } f(x) - f(x^\ast) \ge \frac{\alpha}{2} |x - x^\ast|^2, \forall x \in B_\epsilon(x^\ast) \cap X.$

Notice in the proof of SOSC, we have already shown that

$f(x) - f(x^\ast) \ge \frac{\lambda_\text{min} (\nabla^2 f(x^\ast))}{4} | x - x^\ast|^2$

Now we prove the other direction.

Since the quadratic growth condition (QGC) implies FONC, we have $\nabla f(x^\ast) = 0$.

By Taylor Expansion,

\begin{aligned} f(x) - f(x^\ast) = \nabla f(x^\ast)^\top (x - x^\ast) + \frac{1}{2} (x - x^\ast)^\top\nabla^2f(x^\ast) (x - x^\ast) + o(| x - x^\ast|^2) \end{aligned}

Hence,

\begin{aligned} (x - x^\ast)^\top\nabla^2f(x^\ast) (x - x^\ast) &= 2[f(x) - f(x^\ast)] - 2o(| x - x^\ast|^2)\\ & \ge | x - x^\ast|^2\left(\alpha - \frac{2\,o(| x - x^\ast|^2)}{| x - x^\ast|^2} \right) \end{aligned}

Therefore, taking $x = x^\ast + td$ for a direction $d$ and letting $t \downarrow 0$,

$d^\top\nabla^2f(x^\ast)\, d \ge \alpha | d|^2, \quad \forall d \in \mathbb R^n$

Since $X$ is open, $x^\ast + td \in X$ for every direction $d$ and all sufficiently small $t$, so $\nabla^2 f(x^\ast)$ must be positive definite.
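To make the equivalence concrete, here is a minimal sketch with an assumed quadratic $f(x) = \frac{1}{2}x^\top Q x$, $Q$ positive definite, whose minimizer is $x^\ast = 0$: the quadratic growth condition holds with $\alpha = \lambda_\text{min}(Q)$.

```python
import numpy as np

# Assumed example: f(x) = 0.5 x^T Q x with Q positive definite, minimizer x* = 0.
# Quadratic growth holds with alpha = lambda_min(Q):
#   f(x) - f(x*) >= (alpha/2) |x - x*|^2.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
alpha = np.linalg.eigvalsh(Q)[0]     # smallest eigenvalue of Q

def f(x):
    return 0.5 * x @ Q @ x

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(2)
    # x^T Q x >= lambda_min(Q) * |x|^2 for every x
    assert f(x) >= 0.5 * alpha * (x @ x) - 1e-12
print("quadratic growth verified with alpha =", alpha)
```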

## Convexity

### Extended Definitions for Convexity

Besides the standard definitions of convex functions in Optimization Learning Notes 6, we now introduce two stronger notions of convexity:

Let $f: X \to \mathbb R$ be given and let $X \subset \mathbb R^n$ be a convex set. The function $f$ is

• strictly convex if for all $x, y \in X$ with $x \ne y$ and $\lambda \in (0, 1)$ we have

$f(\lambda x + (1 - \lambda)y) \lt \lambda f(x) + (1 - \lambda) f(y)$

• strongly convex (with parameter $\mu$) if there exists $\mu \gt 0$ such that for all $x, y \in X$ and all $\lambda \in [0, 1]$, we have

$f(\lambda x + (1 - \lambda)y) + \frac{\mu}{2}\lambda(1-\lambda)|y - x|^2 \le \lambda f(x) + (1 - \lambda) f(y)$
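As a sanity check of the second definition, $f(x) = |x|^2$ is strongly convex with parameter $\mu = 2$; for this particular $f$ the defining inequality even holds with equality, which a short numerical sketch confirms:

```python
import numpy as np

# Assumed example: f(x) = |x|^2 is strongly convex with mu = 2. For this f,
#   f(lx + (1-l)y) + (mu/2) l (1-l) |y - x|^2 = l f(x) + (1-l) f(y)
# holds with exact equality for every l in [0, 1].
def f(x):
    return x @ x

mu = 2.0
rng = np.random.default_rng(2)
x, y = rng.standard_normal(3), rng.standard_normal(3)
for lam in [0.0, 0.25, 0.5, 0.9, 1.0]:
    lhs = f(lam * x + (1 - lam) * y) + 0.5 * mu * lam * (1 - lam) * ((y - x) @ (y - x))
    rhs = lam * f(x) + (1 - lam) * f(y)
    assert abs(lhs - rhs) < 1e-12
print("strong convexity inequality holds with equality for |x|^2, mu = 2")
```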

### First Order Characterization

Let $f : X \to \mathbb R$ be continuously differentiable on an open set that contains the convex set $X \subset \mathbb R^n$. Then, it follows

1. The function $f$ is convex if and only if $f(y) - f(x) \ge \nabla f(x)^\top (y - x), \forall x, y \in X.$

2. The mapping $f$ is strictly convex if and only if $f(y) - f(x) \gt \nabla f(x)^\top (y - x), \forall x, y \in X , x \ne y$

3. The function is strongly convex (with parameter $\mu$) if and only if there exists $\mu \gt 0$ such that $f(y) - f(x) \ge \nabla f(x)^\top (y - x) + \frac{\mu}{2} | y - x|^2, \forall x, y \in X$
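A small numerical illustration of item 1, using the assumed convex example $f(x) = \sum_i e^{x_i}$ (whose gradient is $e^{x_i}$ componentwise): the gradient inequality should hold for arbitrary pairs of points.

```python
import numpy as np

# Assumed example: f(x) = sum(exp(x_i)) is convex, so the gradient inequality
#   f(y) - f(x) >= grad f(x)^T (y - x)
# must hold for all x, y.
def f(x):
    return np.sum(np.exp(x))

def grad_f(x):
    return np.exp(x)

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert f(y) - f(x) >= grad_f(x) @ (y - x) - 1e-10
print("gradient inequality verified on random pairs")
```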

#### Proof of 1

Suppose $f$ is convex. From the definition, for $\lambda \in (0, 1]$,

$f(y) - f(x) \ge \frac{f(x + \lambda(y - x)) - f(x)}{\lambda}$

Taking $\lambda \to 0$, we get

$f(y) - f(x) \ge \nabla f(x)^\top (y - x), \forall x, y \in X.$

For the converse, denote $x_\lambda = (1 - \lambda)x + \lambda y$ for $x, y \in X, \lambda \in [0, 1]$. Then

$f(x) - f(x_\lambda) \ge \nabla f(x_\lambda)^\top (x - x_\lambda) = \lambda\nabla f(x_\lambda)^\top (x - y) ~~~(1)$

$f(y) - f(x_\lambda) \ge \nabla f(x_\lambda)^\top (y - x_\lambda) = (1 - \lambda)\nabla f(x_\lambda)^\top (y - x) ~~~(2)$

Computing $(1 - \lambda)\cdot(1) + \lambda\cdot(2)$, we get

$0 \le (1 - \lambda)(f(x) - f(x_\lambda)) + \lambda(f(y) - f(x_\lambda))$

This implies $(1-\lambda)f(x) + \lambda f(y) \ge f(x_\lambda)$, i.e., $f$ is convex.

#### Proof of 2

First, suppose $f$ is strictly convex. Letting $z = \frac{1}{2}(x + y)$ for $x, y \in X, x \ne y$, we have

$f(z) \lt \frac{1}{2} f(x) + \frac{1}{2}f(y) \Rightarrow f(z) - f(x) \lt \frac{1}{2}(f(y) - f(x))$

Using 1, it holds that

$f(z) - f(x) \ge \nabla f(x)^\top(z - x) = \frac{1}{2}\nabla f(x)^\top (y - x)$

This yields,

$\nabla f(x)^\top (y - x) \le 2(f(z) - f(x)) \lt f(y) - f(x)$

The other half (that the strict gradient inequality implies strict convexity) can be proved in the same way as in 1.

#### Proof of 3

Following the same approach as in 1,

\begin{aligned} \nabla f(x)^\top(y - x) &= \lim_{\lambda \downarrow 0} \frac{f(x_\lambda) - f(x)}{\lambda} \\ &\le \lim_{\lambda \downarrow 0} \frac{(1 - \lambda)f(x) + \lambda f(y) - \frac{\mu}{2}\lambda(1 - \lambda)| y - x|^2 - f(x)}{\lambda} \\ &= f(y) - f(x) - \frac{\mu}{2} | y - x|^2 \end{aligned}

For the other direction, consider

$| x - x_\lambda| = \lambda | y - x|, \qquad | y - x_\lambda| = (1 - \lambda) | y - x|$

We then obtain

\begin{aligned} (1 - \lambda)f(x) + \lambda f(y) - f(x_\lambda) &= (1 - \lambda)(f(x) - f(x_\lambda)) + \lambda(f(y) - f(x_\lambda))\\ &\ge (1 - \lambda)\left[\nabla f(x_\lambda)^\top(x - x_\lambda) + \frac{\mu}{2}| x - x_\lambda|^2\right] + \lambda \left[\nabla f(x_\lambda)^\top(y - x_\lambda) + \frac{\mu}{2}|y - x_\lambda|^2\right]\\ & = \nabla f(x_\lambda)^\top [(1 - \lambda) x + \lambda y - x_\lambda] + \frac{\mu}{2}[\lambda^2(1 - \lambda) + \lambda(1 - \lambda)^2]| y - x|^2\\ & = \frac{\mu}{2}\lambda(1-\lambda)| y - x|^2 \end{aligned}

### Second Order Characterization

Let $f : X \to \mathbb R$ be twice continuously differentiable on an open and convex set $X \subset \mathbb R^n$.

1. The function $f$ is convex if and only if the Hessian $\nabla^2f(x)$ is positive semidefinite for all $x \in X$. $h^\top \nabla^2f(x)h \ge 0, \forall h \in \mathbb R^n, \forall x \in X.$

2. If the Hessian $\nabla^2f(x)$ is PD for all $x \in X$, i.e., if we have $h^\top \nabla^2f(x) h \gt 0, \forall h \in \mathbb R^n \backslash \{0\}, \forall x \in X$, then $f$ is strictly convex.

3. The mapping $f$ is strongly convex (with parameter $\mu$) iff $\nabla^2 f(x)$ is uniformly positive definite (with parameter $\mu$), i.e., there is $\mu \gt 0$ s.t. $h^\top \nabla^2f(x)h \ge \mu | h|^2, \forall h \in \mathbb R^n, \forall x \in X.$
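As a sketch of item 3, consider an assumed quadratic $f(x) = \frac{1}{2}x^\top Q x + b^\top x$. Its Hessian is the constant matrix $Q$, so uniform positive definiteness holds with $\mu = \lambda_\text{min}(Q)$, and the corresponding first order characterization can be checked on random points:

```python
import numpy as np

# Assumed example: f(x) = 0.5 x^T Q x + b^T x has constant Hessian Q, so by
# item 3 it is strongly convex with mu = lambda_min(Q). We verify the first
# order characterization with this mu.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
mu = np.linalg.eigvalsh(Q)[0]        # smallest eigenvalue of the Hessian

def f(x):
    return 0.5 * x @ Q @ x + b @ x

def grad_f(x):
    return Q @ x + b

rng = np.random.default_rng(4)
for _ in range(1000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    assert f(y) - f(x) >= grad_f(x) @ (y - x) + 0.5 * mu * ((y - x) @ (y - x)) - 1e-10
print("strong convexity verified with mu =", mu)
```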

#### Proof of 1

Suppose the Hessian is PSD. By the second order mean value theorem, we can find $t \in [0, 1]$ s.t.

$f(y) - f(x) = \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top \nabla^2f(x + t(y - x))(y - x) \ge \nabla f(x)^\top (y - x)$

This is the first order characterization, so $f$ is convex by 1.

To prove the other direction, consider $x + th$ for arbitrary $h \in \mathbb R^n$. Since $X$ is open, we can find an $\epsilon \gt 0$ s.t.

$x + th \in X, \forall t \in [0, \epsilon]$

Now consider the following Taylor Expansion:

$f(x + th) = f(x) + t\nabla f(x)^\top h + \frac{t^2}{2}h^\top \nabla^2f(x)h + o(t^2)$

for $t$ sufficiently small $(t \to 0)$. Since $f$ is convex, we can use the first order characterization:

$\frac{t^2}{2}h^\top \nabla^2f(x) h + o(t^2) \ge 0, (t \to 0)$

Hence,

$h^\top \nabla^2f(x) h \ge - 2\frac{o(t^2)}{t^2}, (t \to 0)$

We can infer that $\nabla^2f(x)$ is PSD.

#### Proof of 2

Use the same approach as the first part of the proof of 1: since the Hessian is PD, the MVT step yields a strict inequality for $y \ne x$, and the first order characterization 2 gives strict convexity.

#### Proof of 3

Similarly to the proof of 1, we use MVT to get

\begin{aligned} f(y) - f(x) &= \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top \nabla^2f(x + t(y - x))(y - x) \\ &\ge \nabla f(x)^\top (y - x) + \frac{\mu}{2}| y - x |^2 \end{aligned}

This already gives us the first order characterization, hence strong convexity. For the other direction, applying the first order characterization to $y = x + th$ gives

$f(x + th) - f(x) \ge t \nabla f(x)^\top h + \frac{\mu t^2}{2} |h|^2$

Reuse the Taylor Expansion:

$f(x + th) - f(x) = t\nabla f(x)^\top h + \frac{t^2}{2}h^\top \nabla^2f(x)h + o(t^2)$

We now get

$h^\top \nabla^2f(x) h \ge - 2\frac{o(t^2)}{t^2} + \mu | h|^2, (t \to 0)$

This finishes the proof.

### Convex Optimization Problems

In Optimization Learning Notes 6, we already showed that for a typical convex minimization problem, any local minimum must be a global minimum. Two things remain here:

• If $f$ is strictly convex, then the minimization problem possesses at most one local minimizer which (if it exists) is also the strict global minimum of $f$ on $X$.

• If the feasible set $X$ is open, $f$ is continuously differentiable, and $x^\ast$ is a stationary point of $f$, then $x^\ast$ is a global minimizer of $f$ on $X$. (We provide a proof here.)

#### Proof of the first part

Suppose $f$ possesses two local minimizers $x^\ast \ne y^\ast$. We then know that they are both global solutions:

$f(x^\ast) = f(y^\ast)$

Consider

$z^\ast = \frac{1}{2}x^\ast + \frac{1}{2}y^\ast$

By strict convexity, we have

$f(z^\ast) \lt \frac{1}{2}f(x^\ast) + \frac{1}{2}f(y^\ast) = f(x^\ast)$

which contradicts the global optimality of $x^\ast$. Hence there is at most one minimizer.

#### Proof of the second part

Just use the first order characterization:

$f(y) \ge f(x^\ast) + \nabla f(x^\ast)^\top(y - x^\ast) = f(x^\ast), \forall y \in X$

Hence, $x^\ast$ is a global solution.

#### Convexity of $f(x) - \frac{\mu}{2}| x|^2$

Consider

$g(x) = f(x) - \frac{\mu}{2}| x|^2$

We show that $f(x)$ is strongly convex with parameter $\mu$ iff $g(x)$ is convex.

First suppose $f(x)$ is strongly convex. Then we have

\begin{aligned} g(\lambda a + (1 - \lambda)b) &= f(\lambda a + (1 - \lambda)b) - \frac{\mu}{2} |\lambda a + (1 - \lambda) b|^2 \\ &\le \lambda f(a) + (1 - \lambda )f(b) - \frac{\mu}{2}\lambda (1 - \lambda) |a - b |^2 - \frac{\mu}{2}\lambda^2| a |^2 - \frac{\mu}{2}(1 - \lambda)^2| b |^2 - \mu\lambda(1-\lambda)\, a^\top b\\ &= \lambda f(a) + (1 - \lambda )f(b) - \frac{\mu}{2} \left[\lambda(1 - \lambda) (| a|^2 - 2\, a^\top b + | b|^2) + \lambda^2| a |^2 + (1 - \lambda)^2| b |^2 + 2\lambda(1-\lambda)\, a^\top b\right] \\ &=\lambda f(a) + (1 - \lambda )f(b) - \frac{\mu}{2} \left[\lambda(1 - \lambda) (| a|^2 + | b|^2) + \lambda^2| a |^2 + (1 - \lambda)^2| b |^2\right] \\ &=\lambda f(a) + (1 - \lambda )f(b) - \frac{\mu}{2} \left[\lambda | a|^2 + (1 - \lambda) | b|^2\right] \\ &= \lambda g(a) + (1 - \lambda)g(b) \end{aligned}

Therefore, $g(x)$ is convex.

Conversely, if $g(x)$ is convex,

\begin{aligned} \lambda f(a) + (1 - \lambda) f(b)& = \lambda \left[g(a) + \frac{\mu}{2} | a|^2 \right] + (1 - \lambda)\left[g(b) + \frac{\mu}{2}| b|^2\right] \\ & = \lambda g(a) + (1 - \lambda) g(b) + \frac{\mu}{2} [\lambda |a |^2 + (1 - \lambda) | b|^2] \\ & \ge g(\lambda a +(1 - \lambda)b) + \frac{\mu}{2} [\lambda |a |^2 + (1 - \lambda) | b|^2] \\ & = f(\lambda a + (1 - \lambda)b) - \frac{\mu}{2} | \lambda a + (1 - \lambda)b|^2 + \frac{\mu}{2} [\lambda |a |^2 + (1 - \lambda) | b|^2] \end{aligned}

Then, by the same algebraic identity as in the above proof,

$-\frac{\mu}{2} | \lambda a + (1 - \lambda)b|^2 + \frac{\mu}{2} [\lambda |a |^2 + (1 - \lambda) | b|^2] = \frac{\mu}{2}\lambda(1 - \lambda)| a - b|^2$

so $\lambda f(a) + (1 - \lambda) f(b) \ge f(\lambda a + (1 - \lambda)b) + \frac{\mu}{2}\lambda(1-\lambda)|a - b|^2$, which is exactly strong convexity of $f$. This finishes the proof.
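The algebra in this subsection hinges on the exact identity $\lambda |a|^2 + (1 - \lambda)|b|^2 - |\lambda a + (1 - \lambda) b|^2 = \lambda(1 - \lambda)|a - b|^2$, which can be verified numerically on assumed random data:

```python
import numpy as np

# Exact norm identity used in the strong-convexity proof:
#   lam|a|^2 + (1-lam)|b|^2 - |lam a + (1-lam) b|^2 = lam (1-lam) |a - b|^2
# (the cross terms a^T b cancel exactly).
rng = np.random.default_rng(5)
a, b = rng.standard_normal(5), rng.standard_normal(5)
for lam in [0.1, 0.5, 0.8]:
    lhs = lam * (a @ a) + (1 - lam) * (b @ b) - np.linalg.norm(lam * a + (1 - lam) * b) ** 2
    rhs = lam * (1 - lam) * np.linalg.norm(a - b) ** 2
    assert abs(lhs - rhs) < 1e-10
print("norm identity verified")
```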

#### Convexity of Epigraph

The epigraph of a function $f : \mathbb R^n \to \mathbb R$ is defined as

$\text{epi } f := \{(x, t) : x \in \mathbb R^n, t \in \mathbb R, t \ge f(x)\} \subseteq \mathbb R^{n + 1}$

$f$ is convex iff its epigraph is convex. Consider we have

$(x_1, t_1), (x_2, t_2) \in \text{epi } f$

Define

$(\tilde x, \tilde t) = \lambda (x_1, t_1) + (1 - \lambda)(x_2, t_2)$

We then have

\begin{aligned} \tilde t &= \lambda t_1 + (1 - \lambda)t_2 \\ &\ge \lambda f(x_1) + (1 - \lambda)f(x_2) \\ &\ge f(\lambda x_1 + (1 - \lambda)x_2) \\ & = f(\tilde x) \end{aligned}

Therefore, $(\tilde x, \tilde t)$ is in the epigraph and the epigraph is convex.

To prove the other direction, suppose the epigraph is convex. We know that

$\lambda (x_1, f(x_1)) + (1 - \lambda)(x_2, f(x_2))$

is in the epigraph. By the convexity of the epigraph,

$f(\lambda x_1 + (1 - \lambda)x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$

Therefore, the function is convex.
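A quick numerical check of the forward direction, using the assumed example $f(x) = x^2$: convex combinations of epigraph points stay in the epigraph.

```python
import numpy as np

# Assumed example: f(x) = x^2 is convex, so its epigraph
#   epi f = {(x, t) : t >= f(x)}
# is a convex subset of R^2.
def f(x):
    return x ** 2

rng = np.random.default_rng(6)
for _ in range(1000):
    x1, x2 = rng.uniform(-3, 3, size=2)
    t1 = f(x1) + rng.uniform(0, 1)      # points on or above the graph
    t2 = f(x2) + rng.uniform(0, 1)
    lam = rng.uniform()
    xt = lam * x1 + (1 - lam) * x2      # convex combination of the two points
    tt = lam * t1 + (1 - lam) * t2
    assert tt >= f(xt) - 1e-12          # (xt, tt) is still in epi f
print("epigraph of x^2 is convex on the sampled points")
```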

#### Existence of Directional Derivative

Sometimes, the objective function is not differentiable. However, if it is convex, we can see that the directional derivative always exists.

Suppose $f : \mathbb R^n \to \mathbb R$ is convex. We then have

1. The difference quotient $\frac{f(x + td) - f(x)}{t}$ is a non-decreasing function of $t$ on $(0, +\infty)$.

2. For every $x, d \in \mathbb R^n$, the directional derivative $f'(x; d)$ always exists and is given by $f'(x; d) = \inf_{t \gt 0} \frac{f(x + td) - f(x)}{t}$

##### Proof of 1

Let $x, d \in \mathbb R^n$ and let $0 < t_1 < t_2$. Then

\begin{aligned} f(x + t_1 d) &= f\left(x + \left(\frac{t_1}{t_2}\right)t_2d\right) \\ &= f\left[\left(1 - \frac{t_1}{t_2}\right)x + \frac{t_1}{t_2} (x + t_2 d)\right] \\ &\le \left(1 - \frac{t_1}{t_2}\right) f(x) + \frac{t_1}{t_2}f(x + t_2 d) \end{aligned}

Hence,

$\frac{f(x + t_1 d) - f(x)}{t_1} \le \frac{f(x + t_2 d) - f(x)}{t_2}$

##### Proof of 2

The definition of the directional derivative is

$f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}$

Since the difference quotient $\frac{f(x + td) - f(x)}{t}$ is non-decreasing in $t$ by 1, the limit as $t \downarrow 0$ exists and is necessarily given by the infimum of these quotients over $t \gt 0$.
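A concrete nonsmooth illustration (assumed example: $f(x) = |x|$ at the point $x = 1$ with direction $d = -1$): the difference quotient equals $-1$ for $t \le 1$ and then increases, so its infimum, and hence $f'(1; -1)$, is $-1$.

```python
# Assumed example: the nonsmooth convex f(x) = |x|. At x = 1 with direction
# d = -1, the difference quotient q(t) = (f(x + t d) - f(x)) / t is
# non-decreasing in t: it is constant at -1 for t <= 1 (before the kink at 0
# is crossed) and increases afterwards, so inf_{t>0} q(t) = f'(1; -1) = -1.
def f(x):
    return abs(x)

def q(t, x=1.0, d=-1.0):
    return (f(x + t * d) - f(x)) / t

ts = [0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
vals = [q(t) for t in ts]
assert all(v1 <= v2 + 1e-12 for v1, v2 in zip(vals, vals[1:]))  # non-decreasing
print("quotients:", vals)
```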

#### Local Lipschitz Continuity

Let $f: \mathbb R^n \to (-\infty, \infty]$ be a convex function such that the interior of its domain is not empty. Then, $f$ is locally Lipschitz continuous on the interior of its domain:

$\forall \text{ compact } K \subseteq \text{Int}(\text{Dom } f), \exists L \gt 0 : | f(x) - f(y)| \le L |x - y|, \forall x, y \in K$

[Proof for Local Lipschitz Continuity may be provided later]