Starting from this note, we will talk about non-linear programs (NLPs). To get a first glance of NLPs, let us first check out how the Interior Point Method works.
Interior Point Method
The Interior Point Method takes care of the primal and the dual problem at the same time. Recall that we have the following conditions for LPs:
- Primal Feasibility: $Ax = b,\ x \ge 0$
- Dual Feasibility: $A^\top y + s = c,\ s \ge 0$
In the simplex method, we search along basic feasible solutions, which means we always maintain primal feasibility. Since we are searching among combinations of basis columns, at each iteration we can simply define:
$$y^\top = c_B^\top B^{-1},$$
such that the reduced costs are represented by:
$$\bar{c}^\top = c^\top - y^\top A.$$
Therefore, for each non-basic variable, the corresponding $x_j$ is zero, while for each basic variable the reduced cost $\bar{c}_j$ is zero. With $s = \bar{c}$, this naturally yields the Complementarity Conditions $x_j s_j = 0$ for all $j$.
While the Complementarity Conditions hold during the whole process, there are cases when dual feasibility is violated. However, notice that when we finally reach the stopping condition, we will have the reduced costs greater than or equal to zero:
$$s = \bar{c} = c - A^\top y \ge 0,$$
which means dual feasibility is then achieved.
Proposition: Optimality Properties of the Simplex Method
During each iteration, the simplex method maintains primal feasibility and the complementarity conditions. It seeks a solution that is dual feasible.
It then strikes us: what if we maintain a different pair of these conditions during the process? In fact, this results in the dual simplex method and the interior point method. The dual simplex method:
- It maintains dual feasibility.
- It maintains the complementarity conditions.
- However, primal feasibility does not need to be satisfied during the iterations. It seeks a primal feasible BFS.
- The tableau can be seen as a rotated variant of the primal one.
There are cases where using the dual simplex method can be more convenient:
- If a dual BFS is available (but we don’t have a primal BFS).
- We have mentioned such a scenario in the discussion of sensitivity analysis (when $b$ is changed by a large amount or a constraint is added).
The Interior Point Method
The interior point method maintains both primal feasibility and dual feasibility during its iterations and seeks a pair of solutions that satisfies the complementarity conditions.
Looking at all constraints, an optimal pair $(x, y, s)$ satisfies:
$$Ax = b,\quad x \ge 0, \qquad A^\top y + s = c,\quad s \ge 0, \qquad x_i s_i = 0 \ \ \forall i.$$
The complementarity conditions are a set of nonlinear equations, which makes the system hard to solve directly. As we have mentioned, the IPM does not maintain the complementarity conditions during the process; this means we can relax the problem to:
$$Ax = b,\quad x > 0, \qquad A^\top y + s = c,\quad s > 0, \qquad x_i s_i = \mu \ \ \forall i,$$
for some $\mu > 0$. We call $\mu$ the complementarity gap.
So the IPM goes like the following: if we have found a solution for a certain $\mu$, then it might be possible to find a solution for a smaller $\mu$. Then we keep decreasing $\mu \to 0$ until we can find a solution of the LP. The essential step in the interior point method is to show that this is indeed doable – the approach is similar to Newton’s method.
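To make the iteration concrete, here is a minimal numerical sketch (our own, not the lecture's exact algorithm) of one Newton step on the relaxed system above, for a standard-form LP $\min c^\top x$ s.t. $Ax = b,\ x \ge 0$; the helper name `ipm_newton_step` and the 0.99 fraction-to-boundary rule are illustrative choices.

```python
import numpy as np

def ipm_newton_step(A, b, c, x, y, s, mu):
    """One Newton step on the relaxed optimality system:
    Ax = b,  A^T y + s = c,  x_i s_i = mu,  with x, s > 0."""
    m, n = A.shape
    r_p = A @ x - b            # primal feasibility residual
    r_d = A.T @ y + s - c      # dual feasibility residual
    r_c = x * s - mu           # relaxed complementarity residual
    # Jacobian of the residual map, in the variables (dx, dy, ds)
    J = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    d = np.linalg.solve(J, -np.concatenate([r_p, r_d, r_c]))
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    # Fraction-to-boundary step keeps x and s strictly positive (interior!)
    alpha = 1.0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
    return x + alpha * dx, y + alpha * dy, s + alpha * ds
```

Decreasing $\mu$ between such steps traces the so-called central path toward an optimal pair.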
Again, we need to resolve the issue of finding an initial (strictly feasible) interior solution for the interior point method. This can be resolved by solving an auxiliary problem (called the homogeneous self-dual problem).
Overall, the IPM runs in polynomial time; standard path-following variants reach an $\varepsilon$-optimal solution in $O(\sqrt{n}\,\log(1/\varepsilon))$ iterations.
Theorem: Quality of Solutions
The interior point method will always find the optimal solution with the maximum possible number of non-zeros.
- If we want a high-rank solution (with the maximum possible number of non-zeros), then choose the interior point method.
- If we want a low-rank solution (with a small number of non-zeros), then choose the simplex method.
- The optimal solution output of the simplex method may depend on the initial solution (as well as on the pivoting rules)
In Optimization Learning Notes 1, we talked about the definitions of optimizers (local/global minimizers/maximizers). In non-linear programming, we will still discuss how we can find these optimizers for our problems.
In order to do so, we need to first introduce some math tools.
- Gradient and First-Order Taylor Expansion: Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Then we denote the gradient of $f$ by:
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)^\top.$$
And the First-Order Taylor Expansion yields:
$$f(x + d) = f(x) + \nabla f(x)^\top d + o(\|d\|).$$
- Hessian and Second-Order Taylor Expansion: If $f$ is twice continuously differentiable, then the Hessian of $f$ is given by:
$$\nabla^2 f(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right]_{i,j=1}^{n}.$$
And the Second-Order Taylor Expansion yields:
$$f(x + d) = f(x) + \nabla f(x)^\top d + \frac{1}{2} d^\top \nabla^2 f(x)\, d + o(\|d\|^2).$$
By Schwarz's theorem, we know that every twice continuously differentiable function has the following property:
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x).$$
So the Hessian matrix is symmetric.
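As a quick sanity check (the function below is an arbitrary choice of ours), sympy confirms that the mixed partials commute, so the Hessian is symmetric:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.exp(x1 * x2)        # any twice continuously differentiable f

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
H = sp.hessian(f, (x1, x2))
print(sp.simplify(H - H.T))             # zero matrix: the Hessian is symmetric
```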
Optimality Conditions for Unconstrained Problems
W.l.o.g., we focus our discussion on minimizers.
To obtain a local minimizer $x^*$, it is necessary to have:
$$\nabla f(x^*) = 0.$$
Otherwise, by First-Order Taylor Expansion,
$$f(x^* + t d) = f(x^*) + t\, \nabla f(x^*)^\top d + o(t),$$
we can find $d$ such that $\nabla f(x^*)^\top d < 0$ (e.g., $d = -\nabla f(x^*)$), and when $t > 0$ is small enough, we can then make $f(x^* + t d) < f(x^*)$, contradicting local optimality.
First-Order Necessary Conditions (FONC)
If $x^*$ is a local minimizer of the unconstrained problem $\min_{x \in \mathbb{R}^n} f(x)$, then we must have $\nabla f(x^*) = 0$.
Example: Least Squares Problem
In the least squares problem, we want to use a linear combination of features $a_1, \dots, a_n$ to approximate a target value $b$ such that
$$b \approx a_1 x_1 + \dots + a_n x_n.$$
Given $m$ observations (which are denoted by $(a^{(i)}, b_i)$, $i = 1, \dots, m$), we now want to find a suitable $x$ that solves the following problem:
$$\min_x\ \sum_{i=1}^{m} \left( (a^{(i)})^\top x - b_i \right)^2.$$
In matrix form, we have:
$$\min_x\ \|Ax - b\|^2,$$
where the $i$-th row of $A$ is $(a^{(i)})^\top$. So, the FONC for the problem is
$$\nabla f(x) = 2 A^\top (Ax - b) = 0 \quad \Longleftrightarrow \quad A^\top A\, x = A^\top b,$$
the normal equations.
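A small numerical check of the normal equations on synthetic data (all names and sizes below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))                     # 50 observations, 3 features
b = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

# FONC: 2 A^T (Ax - b) = 0  <=>  A^T A x = A^T b  (the normal equations)
x_fonc = np.linalg.solve(A.T @ A, A.T @ b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]         # library least-squares solution
print(np.allclose(x_fonc, x_ref))                    # True
```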
FONC is not sufficient, so we need more conditions to make sure a point is a local minimizer. Another important necessary condition is the SONC.
By Taylor Expansion, we already have:
$$f(x^* + t d) = f(x^*) + t\, \nabla f(x^*)^\top d + \frac{t^2}{2} d^\top \nabla^2 f(x^*)\, d + o(t^2).$$
Suppose FONC holds; then,
$$f(x^* + t d) = f(x^*) + \frac{t^2}{2} d^\top \nabla^2 f(x^*)\, d + o(t^2).$$
If $x^*$ is a local minimizer, then we have, for all $d$,
$$d^\top \nabla^2 f(x^*)\, d \ge 0.$$
In other words, $\nabla^2 f(x^*)$ is positive semidefinite.
Second-Order Necessary Conditions (SONC)
If $x^*$ is a local minimizer of $f$, then it holds that:
- $\nabla f(x^*) = 0$;
- For all $d \in \mathbb{R}^n$, $d^\top \nabla^2 f(x^*)\, d \ge 0$.
We call a (symmetric) matrix $A$ positive (negative) semidefinite (PSD/NSD) if and only if for all $x \in \mathbb{R}^n$ we have $x^\top A x \ge 0$ ($x^\top A x \le 0$).
- Usually we only talk about PSD for symmetric matrices.
- For non-symmetric ones, we use $\frac{1}{2}(A + A^\top)$ to define PSD. (We always have $x^\top A x = x^\top \frac{A + A^\top}{2}\, x$.)
- A symmetric matrix is PSD if and only if all of its eigenvalues are nonnegative.
- A symmetric matrix is PD if and only if all of its leading principal minors are positive; for PSD, all principal minors (not only the leading ones) must be nonnegative.
- For any matrix $B$, $B B^\top$ (and $B^\top B$) is always symmetric and PSD.
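These characterizations are easy to test numerically; a small sketch (the helper name is ours) using the eigenvalue criterion:

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """PSD test for a symmetric matrix via its (real) eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
print(is_psd(B @ B.T))                   # True: B B^T is always symmetric PSD
print(is_psd(np.array([[1.0, 2.0],
                       [2.0, 1.0]])))    # False: eigenvalues are 3 and -1
```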
Definition: Stationary Points and Saddle Points
- A point $x$ satisfying $\nabla f(x) = 0$ is called a critical point or stationary point.
- A stationary point is called a saddle point if it is neither a local minimizer nor a local maximizer.
Corollary: Saddle Points
Suppose that $x^*$ is a stationary point ($\nabla f(x^*) = 0$) and that the Hessian $\nabla^2 f(x^*)$ is indefinite; then $x^*$ is a saddle point.
SONC is still not sufficient, but it can be modified into a sufficient condition.
Second-Order Sufficient Conditions (SOSC)
Let $f$ be twice continuously differentiable. If $x^*$ satisfies:
- $\nabla f(x^*) = 0$;
- For all $d \ne 0$, $d^\top \nabla^2 f(x^*)\, d > 0$ (i.e., $\nabla^2 f(x^*)$ is PD).
Then $x^*$ is a strict local minimizer.
We call a (symmetric) matrix $A$ positive (negative) definite (PD/ND) if and only if for all $x \ne 0$ we have $x^\top A x > 0$ ($x^\top A x < 0$).
PD implies PSD, and correspondingly PD has similar properties to PSD, but in strict versions.
Remark: the above conditions can be transformed into maximization conditions if we change PSD to NSD and PD to ND.
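Putting FONC, SONC/SOSC and the saddle-point corollary together, here is a small sketch (the example function is our own choice) that finds the stationary points of $f(x_1, x_2) = x_1^3 - 3x_1 + x_2^2$ and classifies them via the eigenvalues of the Hessian:

```python
import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 - 3*x1 + x2**2                 # stationary points at (1, 0) and (-1, 0)

grad = [sp.diff(f, v) for v in (x1, x2)]
for sol in sp.solve(grad, (x1, x2), dict=True):      # FONC: grad = 0
    H = np.array(sp.hessian(f, (x1, x2)).subs(sol), dtype=float)
    ev = np.linalg.eigvalsh(H)
    if np.all(ev > 0):
        kind = "strict local minimizer (SOSC)"
    elif np.all(ev < 0):
        kind = "strict local maximizer"
    elif ev.min() < 0 < ev.max():
        kind = "saddle point (indefinite Hessian)"
    else:
        kind = "inconclusive (semidefinite Hessian)"
    print(sol, "->", kind)
```

Here $(1, 0)$ is classified as a strict local minimizer and $(-1, 0)$ as a saddle point.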
To prove SOSC, we first need to prove a lemma:
Lemma: Bounds and Eigenvalues
Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then
$$\lambda_{\min}(A)\, \|x\|^2 \le x^\top A x \le \lambda_{\max}(A)\, \|x\|^2 \qquad \forall x,$$
where $\lambda_{\min}(A)$, $\lambda_{\max}(A)$ are the smallest and the largest among all eigenvalues of $A$.
Proof of the Lemma
Remember the eigenvalue decomposition (EVD) of a symmetric matrix $A$ is written as:
$$A = U \Lambda U^\top,$$
where the columns of $U$ are orthonormal eigenvectors of $A$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix with the eigenvalues.
Then we can prove the lemma. For any $x$, setting $z = U^\top x$, we have
$$x^\top A x = z^\top \Lambda z = \sum_i \lambda_i z_i^2.$$
It is easy to see:
$$\lambda_{\min} \sum_i z_i^2 \le \sum_i \lambda_i z_i^2 \le \lambda_{\max} \sum_i z_i^2.$$
Since $U$ is orthonormal, $\|z\| = \|U^\top x\| = \|x\|$. Hence, we get
$$\lambda_{\min} \|x\|^2 \le x^\top A x \le \lambda_{\max} \|x\|^2.$$
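A quick numerical verification of the lemma on a random symmetric matrix (sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # symmetrize
lam = np.linalg.eigvalsh(A)             # eigenvalues, sorted ascending
x = rng.standard_normal(4)
q = x @ A @ x
print(lam[0] * (x @ x) <= q <= lam[-1] * (x @ x))   # True
```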
Proof of SOSC
By Taylor Expansion, with $\nabla f(x^*) = 0$,
$$f(x^* + d) = f(x^*) + \frac{1}{2} d^\top \nabla^2 f(x^*)\, d + o(\|d\|^2)$$
when $\|d\|$ is small enough.
By the definition of the little-$o$ notation, $o(\|d\|^2)/\|d\|^2 \to 0$ as $d \to 0$. Therefore, for all $x^* + d$ close enough to $x^*$, we have
$$|o(\|d\|^2)| \le \frac{\lambda_{\min}}{4} \|d\|^2,$$
where $\lambda_{\min} > 0$ is the smallest eigenvalue of $\nabla^2 f(x^*)$. Moreover, since SOSC requires the Hessian to be PD, the lemma gives $d^\top \nabla^2 f(x^*)\, d \ge \lambda_{\min} \|d\|^2$, which yields
$$f(x^* + d) \ge f(x^*) + \frac{\lambda_{\min}}{2} \|d\|^2 - \frac{\lambda_{\min}}{4} \|d\|^2 > f(x^*)$$
for sufficiently small $d \ne 0$. This indicates that $x^*$ is truly a strict local minimum.
Existence of Solutions
When we talked about Linear Programs, we never considered cases where the feasible region of the problem is bounded but no finite optimum is attained. Actually, there is a theorem protecting us from being bothered by such situations:
Theorem: Weierstrass Theorem
Let $f$ be a continuous function and let $F \subseteq \mathbb{R}^n$ be a bounded, closed and nonempty set. Then $f$ attains a global maximum and a global minimum on $F$.
Here, we also clarify some definitions:
The set $F$ is closed if for every convergent sequence $(x^k)$ with $x^k \in F$ for all $k$ and $x^k \to x$, it holds that $x \in F$.
Closedness is a topological property: if $f$ is a continuous function and $C$ is a closed set, then the preimage $f^{-1}(C)$ is closed.
$F$ is bounded if there is $R > 0$ such that $\|x\| \le R$ for all $x \in F$.
A closed and bounded set is also called compact.
For unconstrained problems, however, we cannot directly use the Weierstrass Theorem because the feasible region may not be bounded. Fortunately, there is another tool, coercivity, that can help us in unconstrained cases.
A continuous function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be coercive if $f(x) \to +\infty$ whenever $\|x\| \to \infty$.
Geometrically, coercivity means that $f$ increases as $x$ moves away from the origin in any possible direction.
Mathematically, coercivity means
$$\lim_{\|x\| \to \infty} f(x) = +\infty.$$
It turns out that coercivity guarantees the existence of solutions:
Theorem: Coercivity and Existence of Solutions
Let $f$ be a continuous and coercive function. Then for every $c \in \mathbb{R}$, the level set
$$F_c = \{x \in \mathbb{R}^n : f(x) \le c\}$$
is compact, and $f$ has at least one global minimizer.
We already know $f$ is continuous, so $F_c = f^{-1}((-\infty, c])$ is closed (preimages of closed sets under continuous mappings are closed).
Assume that the level set is not bounded. Then we can find a sequence $(x^k)$ in $F_c$ with $\|x^k\| \to \infty$.
Since $f$ is coercive, we know
$$f(x^k) \to +\infty.$$
However, this is a contradiction to $f(x^k) \le c$. So every level set of a coercive function is compact. Finally, picking any $c$ with $F_c$ nonempty and applying the Weierstrass Theorem to $f$ on $F_c$ yields a global minimizer.
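As a quick worked example combining coercivity with the eigenvalue lemma from before: any quadratic $f(x) = x^\top Q x + b^\top x$ with $Q$ symmetric positive definite is coercive, since
$$f(x) \ge \lambda_{\min}(Q)\, \|x\|^2 - \|b\|\, \|x\| \longrightarrow +\infty \quad \text{as } \|x\| \to \infty,$$
so such problems always possess a global minimizer.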
Optimality Conditions for Constrained Problems
The key difference between unconstrained and constrained problems is that, for the latter, the possible moving directions at a point can be limited. This brings more difficulty in finding the solution.
Given $x \in F$, we call a vector $d$ a feasible direction at $x$ if there exists $\bar{t} > 0$ such that $x + t d \in F$ for all $t \in [0, \bar{t}]$.
Let $x^*$ be a local minimum of $f$ over $F$. Then for any feasible direction $d$ at $x^*$, we must have $\nabla f(x^*)^\top d \ge 0$.
Let $f$ be continuously differentiable. Then $d$ is called a descent direction at $x$ if and only if $\nabla f(x)^\top d < 0$.
If we denote the set of feasible directions at $x^*$ by $F(x^*)$ and the set of descent directions at $x^*$ by $D(x^*)$, then the first-order necessary condition can be written as
$$F(x^*) \cap D(x^*) = \emptyset.$$
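For example, consider $\min\ f(x) = x$ over $F = [0, 1]$. At the minimizer $x^* = 0$ we have $f'(x^*) = 1 \ne 0$, yet every feasible direction at $0$ satisfies $d \ge 0$, so $\nabla f(x^*)^\top d \ge 0$ and indeed $F(x^*) \cap D(x^*) = \emptyset$: the unconstrained condition $\nabla f(x^*) = 0$ is replaced by this directional statement.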
General Optimality Conditions
From now on, let us consider the general form of nonlinear programs:
$$\min_x\ f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m, \qquad h_j(x) = 0,\ j = 1, \dots, p.$$
The feasible set is $F = \{x : g_i(x) \le 0\ \forall i,\ h_j(x) = 0\ \forall j\}$.
Definition: Active and Inactive Set
At a feasible point $x$, the set $A(x) = \{i : g_i(x) = 0\}$ denotes the set of active constraints. The set of inactive constraints is given by $I(x) = \{i : g_i(x) < 0\}$.
Let us start our journey with the linearly constrained problems.
For the model:
$$\min_x\ f(x) \quad \text{s.t.} \quad Ax \le b,$$
with rows $a_i^\top$ of $A$, the FONC can be expressed as follows.
If $x^*$ is a local minimum of $f$ over $\{x : Ax \le b\}$, then
$$F(x^*) \cap D(x^*) = \emptyset.$$
Now we show this actually guarantees that there exists $\lambda \ge 0$ with $\nabla f(x^*) + A^\top \lambda = 0$ and $\lambda_i = 0$ for all inactive $i$.
Since inactive indices provide no limit to the moving direction (for small steps), the feasible directions are exactly
$$F(x^*) = \{d : a_i^\top d \le 0,\ i \in A(x^*)\},$$
so the condition is equivalent to: there is no $d$ with $a_i^\top d \le 0$ for all $i \in A(x^*)$ and $\nabla f(x^*)^\top d < 0$.
Rewrite the condition as
$$\min_d\ \nabla f(x^*)^\top d \quad \text{s.t.} \quad a_i^\top d \le 0,\ i \in A(x^*)$$
having optimal value $0$ (attained at $d = 0$). This is a linear program and the dual is
$$\max_\lambda\ 0 \quad \text{s.t.} \quad \sum_{i \in A(x^*)} \lambda_i a_i = -\nabla f(x^*),\quad \lambda \ge 0.$$
The primal problem is bounded and feasible, so according to strong duality, the dual is also feasible.
Consequently, we can find $\lambda_i \ge 0$, $i \in A(x^*)$, such that
$$\nabla f(x^*) + \sum_{i \in A(x^*)} \lambda_i a_i = 0.$$
Setting $\lambda_i = 0$ for $i \notin A(x^*)$, this is exactly:
$$\nabla f(x^*) + A^\top \lambda = 0, \qquad \lambda \ge 0, \qquad \lambda_i (a_i^\top x^* - b_i) = 0 \ \ \forall i.$$
The last constraint enforces $\lambda_i = 0$ for the inactive constraints (complementary slackness).
So if the optimization model changes to:
$$\min_x\ f(x) \quad \text{s.t.} \quad Ax = b,$$
then there are no inactive constraints anymore (every constraint is active), so the FONC changes to: there exists $\mu \in \mathbb{R}^m$ (free in sign) such that
$$\nabla f(x^*) + A^\top \mu = 0, \qquad A x^* = b.$$
General Inequality Constraints
For general cases, consider the following nonlinear program:
$$\min_x\ f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m.$$
Assumption: $f$, $g_i$ are continuously differentiable.
When $f$ reaches a local minimum at $x^*$, we have the following lemma:
Lemma: No Feasible Descent for Inequality Constraints
Let $x^*$ be a local minimum of the above problem. Then, there does not exist a vector $d$ s.t.
$$\nabla f(x^*)^\top d < 0 \quad \text{and} \quad \nabla g_i(x^*)^\top d < 0 \ \ \forall i \in A(x^*).$$
Suppose we have found a $d$ such that
$$\nabla f(x^*)^\top d < 0, \qquad \nabla g_i(x^*)^\top d < 0 \ \ \forall i \in A(x^*).$$
Then, by Taylor Expansion, we can find $\bar{t} > 0$ s.t. for all $t \in (0, \bar{t}]$,
$$f(x^* + t d) < f(x^*),$$
and meanwhile,
$$g_i(x^* + t d) < g_i(x^*) = 0 \ \ \forall i \in A(x^*), \qquad g_i(x^* + t d) < 0 \ \ \forall i \in I(x^*)$$
(the latter by continuity, shrinking $\bar{t}$ if necessary). So we can see that $x^* + t d$ is feasible and strictly better than the local minimum $x^*$, which is a contradiction.
Theorem: Fritz-John Conditions
Extending the above idea further, we come across the Fritz-John Conditions:
Let $x^*$ be a local minimum of the above problem. Then there exist $\lambda_0, \lambda_1, \dots, \lambda_m \ge 0$, which are not all zero, such that
$$\lambda_0 \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0, \qquad \lambda_i g_i(x^*) = 0 \ \ \forall i.$$
We prove this theorem using the last lemma: we cannot find $d$ such that
$$\nabla f(x^*)^\top d < 0, \qquad \nabla g_i(x^*)^\top d < 0 \ \ \forall i \in A(x^*).$$
Consider the problem
$$\min_{d, z}\ z \quad \text{s.t.} \quad \nabla f(x^*)^\top d \le z,\ \ \nabla g_i(x^*)^\top d \le z \ \ \forall i \in A(x^*);$$
by the lemma, the optimal value must be non-negative (it is $0$, attained at $d = 0$, $z = 0$).
Again, we define $G$ as the matrix whose rows are $\nabla f(x^*)^\top$ and $\nabla g_i(x^*)^\top$, $i \in A(x^*)$.
Rewriting the problem with $G$, we get
$$\min_{d, z}\ z \quad \text{s.t.} \quad G d - z \mathbf{1} \le 0.$$
The dual problem is a feasibility problem:
$$\max\ 0 \quad \text{s.t.} \quad G^\top \lambda = 0,\ \ \mathbf{1}^\top \lambda = 1,\ \ \lambda \ge 0.$$
Since strong duality holds (the primal is feasible and bounded), the dual problem is feasible.
So we can find $\lambda_0$ and $\lambda_i$, $i \in A(x^*)$, with
$$\lambda_0 \nabla f(x^*) + \sum_{i \in A(x^*)} \lambda_i \nabla g_i(x^*) = 0, \qquad \lambda \ge 0.$$
(Rescale to obtain $\lambda_0 + \sum_i \lambda_i = 1$, which guarantees that not all $\lambda_i$ are zero; setting $\lambda_i = 0$ for $i \in I(x^*)$ gives $\lambda_i g_i(x^*) = 0$ for all $i$.)
However, the Fritz-John Conditions are not very useful: $\lambda_0 = 0$ can happen, and in that case the objective $f$ drops out of the conditions entirely; we would then need to check all possible combinations that make the set of gradients $\{\nabla g_i(x^*) : i \in A(x^*)\}$ linearly dependent.
And, without the constraint from $\nabla f$, many of these combinations may not be minimizers.
KKT-Conditions for Inequality Constraints
KKT-Conditions use an assumption to overcome the major drawbacks of Fritz-John Conditions.
Let $x^*$ be a local minimum and suppose the gradients
$$\{\nabla g_i(x^*) : i \in A(x^*)\}$$
are linearly independent. Then there exist $\lambda_1, \dots, \lambda_m \ge 0$ such that
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0, \qquad \lambda_i g_i(x^*) = 0 \ \ \forall i.$$
The proof is simple. We can find $\lambda_0, \lambda_1, \dots, \lambda_m$ as required by the Fritz-John Conditions. Since the gradients $\{\nabla g_i(x^*) : i \in A(x^*)\}$ are linearly independent, $\lambda_0 \ne 0$ (otherwise $\sum_{i \in A(x^*)} \lambda_i \nabla g_i(x^*) = 0$ with not all $\lambda_i$ zero would contradict independence). Then we can divide all $\lambda_i$ by $\lambda_0$ and obtain the required multipliers for the KKT Conditions.
KKT Conditions: The General Setting
Now it is high time for the general nonlinear optimization problem:
$$\min_x\ f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m, \qquad h_j(x) = 0,\ j = 1, \dots, p.$$
To begin with, we associate each constraint with a Lagrangian multiplier: $\lambda_i$ for $g_i(x) \le 0$ and $\mu_j$ for $h_j(x) = 0$. We define the Lagrangian of this problem by:
$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x).$$
Until now you may still feel confused about what we can do with the Lagrangian. However, if you look at the form carefully, you may get a similar feeling as when we derived a dual form. Indeed, here we require
$$\lambda_i \ge 0 \ \ \forall i \qquad (\mu_j \text{ free}).$$
These conditions are exactly called dual feasibility.
And we also have the complementarity conditions:
$$\lambda_i g_i(x) = 0 \ \ \forall i.$$
Our main condition comes from the Lagrangian (we move all the constraints into the objective and interpret them as reward/penalty terms, so the following condition does make sense):
$$\nabla_x L(x, \lambda, \mu) = \nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) + \sum_{j=1}^{p} \mu_j \nabla h_j(x) = 0.$$
To sum up, the formal definition of the KKT conditions is:
Theorem: KKT Conditions
If $x^*$ is a local minimizer and if a regularity condition holds, then there exist $\lambda$ and $\mu$ such that:
- Stationarity: $\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) + \sum_j \mu_j \nabla h_j(x^*) = 0$
- Primal Feasibility (extended to both constraint types): $g_i(x^*) \le 0 \ \forall i$, $h_j(x^*) = 0 \ \forall j$
- Dual Feasibility: $\lambda_i \ge 0 \ \forall i$
- Complementarity: $\lambda_i g_i(x^*) = 0 \ \forall i$
Again, the regularity condition requires the collection of gradients $\{\nabla g_i(x^*) : i \in A(x^*)\} \cup \{\nabla h_j(x^*) : j = 1, \dots, p\}$ to be linearly independent (full rank).
- This condition is a constraint qualification (CQ) and is called Linear Independence Constraint Qualification (LICQ).
- A feasible point satisfying the LICQ is called regular.
- There are more CQs: ACQ, GCQ, MFCQ, PLICQ, Slater’s condition. See Wikipedia.
- A (feasible) point satisfying the KKT conditions is called a KKT point
- KKT points are candidates for local optimal solutions – just like stationary points.
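As a small worked illustration (a toy problem of our own, not from the lecture), the following sympy sketch enumerates the KKT points of $\min\, x_1^2 + x_2^2$ s.t. $x_1 + x_2 \ge 1$, written with $g(x) = 1 - x_1 - x_2 \le 0$:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + x2**2
g = 1 - x1 - x2                          # constraint g(x) <= 0

L = f + lam * g                          # Lagrangian
stationarity = [sp.diff(L, v) for v in (x1, x2)]
complementarity = [lam * g]

for sol in sp.solve(stationarity + complementarity, (x1, x2, lam), dict=True):
    feasible = bool(g.subs(sol) <= 0)    # primal feasibility
    dual_ok = bool(sol[lam] >= 0)        # dual feasibility
    if feasible and dual_ok:
        print("KKT point:", sol)         # {x1: 1/2, x2: 1/2, lambda: 1}
```

The case $\lambda = 0$ leads to $x = 0$, which is infeasible, so the only KKT point is $x^* = (1/2, 1/2)$ with $\lambda = 1$.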
Second-Order Optimality Conditions for Constrained Problems
KKT conditions are necessary first-order optimality conditions, and KKT points are potential candidates for local minimizers. We still need second-order optimality conditions to check whether a point is truly a local minimizer or not.
We assume that $f$, $g_i$, and $h_j$ are twice continuously differentiable. Therefore, we can calculate the Hessian of our Lagrangian:
$$\nabla_{xx}^2 L(x, \lambda, \mu) = \nabla^2 f(x) + \sum_{i=1}^{m} \lambda_i \nabla^2 g_i(x) + \sum_{j=1}^{p} \mu_j \nabla^2 h_j(x).$$
We define the so-called critical cone:
$$C(x^*) = \left\{ d : \begin{array}{l} \nabla g_i(x^*)^\top d = 0 \ \ \forall i \in A(x^*) \text{ with } \lambda_i > 0, \\ \nabla g_i(x^*)^\top d \le 0 \ \ \forall i \in A(x^*) \text{ with } \lambda_i = 0, \\ \nabla h_j(x^*)^\top d = 0 \ \ \forall j \end{array} \right\}.$$
Then, the second-order necessary conditions take the following form:
Theorem: SONC for Constrained Problems
Let $x^*$ be a regular point (LICQ holds) and a local minimum. Then the KKT conditions hold, there are unique multipliers $\lambda$ and $\mu$, and we have
$$d^\top \nabla_{xx}^2 L(x^*, \lambda, \mu)\, d \ge 0 \qquad \forall d \in C(x^*).$$
The uniqueness of $\lambda$ and $\mu$ in the SONC follows from the LICQ (this can be helpful in calculations). There do exist variants of the SONC without LICQ, but normally they are hard to use and their form is unfriendly.
Theorem: SOSC for Constrained Problems
Let $x^*$ be a KKT point with multipliers $\lambda$ and $\mu$, and suppose that the condition
$$d^\top \nabla_{xx}^2 L(x^*, \lambda, \mu)\, d > 0 \qquad \forall d \in C(x^*) \setminus \{0\}$$
is satisfied. Then $x^*$ is a strict local minimizer.
Unlike our SONC, the SOSC does not require uniqueness of the multipliers. If there are multiple multipliers, then as long as there exists one pair $(\lambda, \mu)$ that fulfills the SOSC, $x^*$ is already a strict local minimizer.
Suppose that $x^*$ is not a strict local minimizer. Then, there exists a sequence $(x^k)$ within $F$, such that $x^k \to x^*$, $x^k \ne x^*$, and $f(x^k) \le f(x^*)$. Define $t_k = \|x^k - x^*\|$ and $d^k = (x^k - x^*)/t_k$. Then, we have $\|d^k\| = 1$. Therefore, $(d^k)$ is bounded. Every bounded sequence has a convergent subsequence, so we can find a subsequence with $d^k \to d$ such that $\|d\| = 1$. Obviously, $d \ne 0$.

Then by Taylor Expansion, we get
$$f(x^k) = f(x^*) + t_k \nabla f(x^*)^\top d^k + o(t_k);$$
when $k \to \infty$, by $f(x^k) \le f(x^*)$ we know
$$\nabla f(x^*)^\top d \le 0.$$
Similarly, for $i \in A(x^*)$, we have $g_i(x^k) \le 0 = g_i(x^*)$, so
$$g_i(x^k) = g_i(x^*) + t_k \nabla g_i(x^*)^\top d^k + o(t_k) \le 0.$$
As $k \to \infty$,
$$\nabla g_i(x^*)^\top d \le 0 \qquad \forall i \in A(x^*).$$
With similar tricks, for the equality constraints $h_j(x^k) = 0 = h_j(x^*)$. So, as $k \to \infty$,
$$\nabla h_j(x^*)^\top d = 0 \qquad \forall j.$$
Since $x^*$ is a KKT point, we have
$$\nabla f(x^*)^\top d = -\sum_{i \in A(x^*)} \lambda_i \nabla g_i(x^*)^\top d - \sum_j \mu_j \nabla h_j(x^*)^\top d \ge 0.$$
With $\nabla f(x^*)^\top d \le 0$, we know $\nabla f(x^*)^\top d = 0$, which means
$$\lambda_i \nabla g_i(x^*)^\top d = 0 \qquad \forall i \in A(x^*).$$
By the definition of the critical cone, $d \in C(x^*) \setminus \{0\}$.

By Second-Order Taylor Expansion of the Lagrangian (note that $L(x^k, \lambda, \mu) = f(x^k) + \sum_i \lambda_i g_i(x^k) + \sum_j \mu_j h_j(x^k) \le f(x^k) \le f(x^*) = L(x^*, \lambda, \mu)$ and $\nabla_x L(x^*, \lambda, \mu) = 0$),
$$L(x^k, \lambda, \mu) = L(x^*, \lambda, \mu) + \frac{t_k^2}{2} (d^k)^\top \nabla_{xx}^2 L(x^*, \lambda, \mu)\, d^k + o(t_k^2).$$
As $k \to \infty$,
$$d^\top \nabla_{xx}^2 L(x^*, \lambda, \mu)\, d \le 0,$$
which leads to a contradiction with the SOSC.
Solving Nonlinear Programs: Strategy
- Check LICQ (if required).
- Derive KKT-conditions.
- Discuss the different easy cases via the complementarity conditions (set the multipliers $\lambda_i$ or the constraints $g_i(x)$ to $0$) to find all KKT points.
- Calculate $\nabla_{xx}^2 L$ and the critical cone $C(x^*)$ at the KKT points.
- Check second-order conditions (see the sketch after this list).
- Check if $f$ is coercive or if $F$ is bounded. If so, the problem has global solutions (which must be KKT points)!
- If the LICQ holds, then $\lambda$ and $\mu$ are always unique!
- Finding maximizers: apply all steps to $-f$.
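For the equality-constrained special case, the critical cone is just the null space of the constraint Jacobian, so the second-order check reduces to testing the reduced Hessian. A minimal sketch on our toy problem $\min\, x_1^2 + x_2^2$ s.t. $x_1 + x_2 = 1$ (where $x^* = (1/2, 1/2)$ and $\mu^* = -1$):

```python
import numpy as np
from scipy.linalg import null_space

H_L = np.array([[2.0, 0.0],
                [0.0, 2.0]])            # Hessian of L = f + mu * h at (x*, mu*)
Dh = np.array([[1.0, 1.0]])             # Jacobian of h(x) = x1 + x2 - 1 at x*
Z = null_space(Dh)                      # basis of the critical cone (equality case)
reduced = Z.T @ H_L @ Z                 # reduced Hessian on the critical cone
print(bool(np.all(np.linalg.eigvalsh(reduced) > 0)))  # True: SOSC holds, strict local min
```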
Visualization and Interpretation of the KKT Conditions
At a KKT point, we have
$$-\nabla f(x^*) = \sum_{i \in A(x^*)} \lambda_i \nabla g_i(x^*) + \sum_j \mu_j \nabla h_j(x^*).$$
This means $-\nabla f(x^*)$ is a linear combination of the $\nabla g_i(x^*)$ (with nonnegative weights) and the $\nabla h_j(x^*)$. One can visualize the gradients using this property.
The Dual Perspective
The KKT conditions imply that the LP (in the multipliers)
$$\sum_{i \in A(x^*)} \lambda_i \nabla g_i(x^*) + \sum_j \mu_j \nabla h_j(x^*) = -\nabla f(x^*), \qquad \lambda \ge 0$$
is feasible. The dual problem is
$$\min_d\ \nabla f(x^*)^\top d \quad \text{s.t.} \quad \nabla g_i(x^*)^\top d \le 0 \ \ \forall i \in A(x^*), \qquad \nabla h_j(x^*)^\top d = 0 \ \ \forall j.$$
By strong duality, the optimal value is $0$. Define
$$T_{\mathrm{lin}}(x^*) = \{d : \nabla g_i(x^*)^\top d \le 0\ \forall i \in A(x^*),\ \nabla h_j(x^*)^\top d = 0\ \forall j\}.$$
So the KKT conditions are equivalent to
$$\nabla f(x^*)^\top d \ge 0 \qquad \forall d \in T_{\mathrm{lin}}(x^*).$$
The set $T_{\mathrm{lin}}(x^*)$ is called the linearized tangent set.
- Lecture Notes 11-14 for MAT3007 (Optimization) by Dr. Andre Milzarek, Institute for Data and Decision Analytics, The Chinese University of Hong Kong, Shenzhen.
- Amir Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB, MOS-SIAM Series on Optimization, 2014.