Sandwich inequality of optimal gap and distance to solutions of LP

Suppose we have a general optimization problem.

\displaystyle \begin{aligned} & \underset{x \in \mathbb{R}^n}{\text{minimize}} & & f(x)\\ & {\text{subject to}} & & x \in \Omega. \\ \end{aligned} \ \ \ \ \ (1)

Also suppose problem (1) has a minimum and the minimum can be achieved by an unique minimizer {x^*}.

Now if I have a point {x} such that {f(x) - f(x^*) } is very small, then how small is the distance {\|x-x^*\|}. We might expect that {f(x) - f(x^*) \rightarrow 0} will imply that {x \rightarrow x^*}. This is true if {\Omega} is compact and {f} is continuous. But this does not tell what is the quantitative relationship between the optimal gap, i.e., {f(x) -f(x^*)} and the distance to the solution, i.e., {\|x-x^*\|}.

In this post, I am going to show that for linear programming (LP) problem, the optimal gap and distance to solutions are the same up to a multiplicative constant which only depend on the problem data.

To start, consider an LP in the standard form, i.e.,

\displaystyle \begin{aligned} & \underset{x \in \mathbb{R}^n}{\text{minimize}} & & c^Tx\\ & {\text{subject to}} & & Ax =b \\ & & & x\geq 0. \end{aligned} \ \ \ \ \ (2)

where the decision variable is {x} and problem data are {A\in \mathbb{R}^{m\times n},b\in \mathbb{R}^m,c\in \mathbb{}R^n}. {x\geq 0} means each coordinate of {x} is nonnegative.

Denote the solution set of problem (1) to be {X^* = \arg \min\{ c^Tx| Ax = b, x\geq 0\} }, and the distance to the solution set {X^*} to be {\text{dist}(x,X^*) = \inf_{x^* \in X^*} \|x-x^*\|}. Note that the norm here is arbitrary, not necessarily the Euclidean norm.

We are now ready to state the theorem.

Theorem 1 (Sandwich inequality of optimal gap and distance to solutions of LP) For problem (1), theres exist constants {C_0, c_0>0} which depends only on {A,b,c} and {\|\cdot\|} such that for all feasible {x}, i.e., {Ax= b, x\geq 0},

\displaystyle C_0 \text{dist}(x,X^*)\geq c^Tx - c^Tx^* \geq c_0 \text{dist}(x,X^*).

The above theorem shows that the role of optimal gap, i.e., {c^Tx - c^Tx^*}, and the distance to the solution set, i.e. {\text{dist}(x,X^*)}, are the same up to a multiplicative constant. The right inequality  of the theorem is usually referred as linear growth  in the optimization literature.

The proof below is constructive and we can in fact take {c_0 = \frac{\epsilon}{\max\{2B,1\}}} where { B = \max_{ 1\leq i \leq l} \|x_i\|} and {\epsilon = \min_{ m+1\leq i\leq l, n+1\leq j \leq s} \{ c^Tx_i-c^Tx^*, c^T\gamma_j\}} for {x^*\in X^*} and {C_0 = \|c\|_*}. Here {\| y\|_* = \sup \{ u^Ty\mid \|u\|\leq 1\}} is the dual norm and x_i,\gamma_j are extreme points and extreme rays. We assume there are l many extreme points and s many extreme rays with first m\; x_is and first n \; \gamma_j are in the optimal set X^*. See the proof for more detail.

The idea of the proof mainly relies on the extreme point and extreme rays representation of the feasible region and the optimal set,i.e., {\{x : Ax = b, x\geq 0\}} and X^*.

Proof: The feasible region can be written as

\displaystyle \{x|Ax=b,x\geq 0\} = \{ \sum_{i=1}^l \alpha_i x_i + \sum_{j=1}^s \beta_j \gamma_j | \sum_i \alpha_i = 1, \alpha_i\geq 0, \beta_j\geq 0\}.

Here {x_i}s are extreme points of the feasible region and {\gamma_j}s are extreme rays. By scaling the extreme rays, we can assume that {\|\gamma_j\| =1} for all {j}.

The optimal set can also be written as

\displaystyle X^* = \{ \sum_{i=1}^m \alpha_i x_i + \sum_{j=1}^n \beta_j \gamma_j | \sum_i \alpha_i = 1, \alpha_i\geq 0, \beta_j\geq 0\}.

We assume here the first {m} many {x_i} and {n} many {\gamma_j} are in the optimal set and the rest of {x_i} and {\gamma_j}s are not for notation simplicity.

We denote {B = \max_{ 1\leq i \leq l} \|x_i\|} and {\epsilon = \min_{ m+1\leq i\leq l, n+1\leq j \leq s} \{ c^Tx_i-c^Tx^*, c^T\gamma_j\}} where {x^*\in X^*}. Note {\epsilon>0} since the {\gamma_j}s not in the optimal set should have inner product with {c} to be positive.

We first prove the second inquality, i.e., {c^Tx - c^Tx^* \geq c_0 \text{dist}(x,X^*)}.

Now take an arbitrary feasible {x}, it can be written as

\displaystyle x = \sum_{i=1}^m a_i x_i + \sum_{i=m+1}^l a_i x_i + \sum _{j=1}^n b_j \gamma_j + \sum_{j=n+1}^s b_j \gamma_j

for some {a_i \geq 0, \sum_i a_i =1} and {b_j\geq 0}.

The objective value of {x} is then

\displaystyle c^Tx = \sum_{i=1}^m a_i c^T x_i + \sum_{i=m+1}^l a_i c^T x_i +\sum_{j=n+1}^s b_j c^T \gamma_j.

We use the fact that {c^T\gamma_j=0} for all {j\leq m} here.

Subtract the above by {c^Tx^*}. We have

\displaystyle \begin{aligned} c^Tx -c^Tx^*& = (\sum_{i=1}^m a_i-1)c^T x_i + \sum_{i=m+1}^l a_i c^T x_i +\sum_{j=n+1}^s b_j c^T \gamma_j \\ & = (-\sum_{i=m+1}^l a_i)c^T x_i + \sum_{i=m+1}^l a_i c^T x_i +\sum_{j=n+1}^s b_j c^T \gamma_j \\ &\geq (\sum_{i=m+1}^l a_i) \epsilon + (\sum_{j=n+1}^s b_j )\epsilon \end{aligned} \ \ \ \ \ (3)

The second equality is due to {\sum_i a_i =1} and the inequality is because of the definition of {\epsilon} and the {b_j,a_i}s are positive.

The distance between {x} and {X^*} is the infimum of

\displaystyle \| \sum_{i=1}^m( a_i-\alpha_i)x_i + \sum_{i=m+1}^l a_i x_i + \sum _{j=1}^n (b_j-\beta_j) \gamma_j + \sum_{j=n+1}^s b_j \gamma_j \|.

By taking {\alpha_i \geq a_i} with {\sum_i^m \alpha_i =1} and {\beta_j = b_j} and apply triangular inequality to the above quantity, we have

\displaystyle \begin{aligned} &\| \sum_{i=1}^m( a_i-\alpha_i)x_i + \sum_{i=m+1}^l a_i x_i + \sum _{j=1}^n (b_j-\beta_j) \gamma_j + \sum_{j=n+1}^s b_j \gamma_j \| \\ &\leq \sum_{i=1}^m( \alpha_i-a_i)\|x_i \|+ \sum_{i=m+1}^l a_i\|x_i \|+\sum_{j=n+1}^s b_j \|\gamma_j \|\\ &\leq \sum_{i=1}^m (\alpha_i -a_i) B + \sum_{i=m+1}^l a_i B + \sum_{j=n+1}^s b_j\\ &= (1-\sum_{i=1}^m a_i)B + (\sum_{i=m+1}^l a_i)B + \sum_{j=n+1}^s b_j\\ &= 2B (\sum_{i=m+1}^l a_i) +\sum_{j=n+1}^s b_j. \end{aligned} \ \ \ \ \ (4)

The first inequality is the triangular inequality and {\alpha_i\geq a_i, a_i\geq 0, b_j\geq 0}. The second inequality is applying the definition of {B} and {\|\gamma_j\|=1}. The first equality is due to {\sum_i \alpha_i =1} and the second equality is due to {\sum_i a_i =1}.

Thus the distance between {x} and {X^*} is bounded above by

\displaystyle \text{dist}(x,X^*) \leq 2B(\sum_{i=m+1}^l a_i) +\sum_{j=n+1}^s b_j.

Since {c^Tx -c^Tx^* \geq( \sum_{i = m+1}^l a_i + \sum_{j=n+1}^s b_j)\epsilon} by our previous argument, we see that setting

\displaystyle c_0 = \frac{\epsilon}{\max\{2B,1\}}

should give us

\displaystyle c^Tx - c^Tx^* \geq c_0 \text{dist}(x,X^*).

We now prove the inequality

\displaystyle C_0 \text{dist}(x,X^*)\geq c^Tx - c^Tx^*.

Note that the infimum in {\text{dist}(x,X^*) = \inf_{x^* \in X^*} \|x-x^*\|} is actually achieved by some {x^*}. The reason is that we can first pick a {x'\in X^*}, then

\displaystyle \inf_{x^* \in X^*} \|x-x^*\| = \inf_{x^* \in X^*, \|x-x^*\| \leq \|x-x'\|} \|x-x^*\|.

But the set {X^* \cap \{x^* | \|x-x^*\| \leq \|x-x'\| \}} is actually bounded and closed ({X^*} is closed as it is a convex combination of finite points plus a conic combination of extreme vectors), thus Weierstrass theorems tells us that the infimum is actually achieved by some {x^*\in X}.

Now take {x^*} such that {\|x-x^*\| =\text{dist}(x,X)}. We have

\displaystyle c^Tx -c^Tx^* \leq \|c\|_* \|x-x^*\|=\|c\|_*\text{dist}(x,X^*)

where {\|\cdot\|_*} is the dual norm of {\|\cdot\|}. Thus letting {C_0 = \|c\|_*} finishes the proof. \Box

From the proof, we see that two possible choice of {c_0} and {C_0} are {c_0 = \frac{\epsilon}{\max\{2B,1\}}} where { B = \max_{ 1\leq i \leq l} \|x_i\|} and {\epsilon = \min_{ m+1\leq i\leq l, n+1\leq j \leq s} \{ c^Tx_i-c^Tx^*, c^T\gamma_j\}} for {x^*\in X^*} and {C_0 = \|c\|_*}. These are not optimal and can be sharpened. I probably will give a sharper constant in a future post.