March | 2018 | Three Squirrels

We consider global conditioning property of Linear Program (LP) here. Let’s first define the problem. The standard form of LP is

$\displaystyle \begin{aligned} & \underset{x \in \mathbb{R}^n}{\text{minimize}} & & c^Tx\\ & {\text{subject to}} & & Ax =b ,\\ & & & x\geq 0 \end{aligned} \ \ \ \ \ (1)$

where ${A \in \mathbb{R}^{n\times m}, c\in \mathbb{R}^n}$ and ${b \in\mathbb{R}^m}$ . The dual of the standard form is

$\displaystyle \begin{aligned} & \underset{y \in \mathbb{R}^m}{\text{maximize}} & & b^Ty\\ & {\text{subject to}} & & A^Ty\leq c. \\ \end{aligned} \ \ \ \ \ (2)$

Now suppose we add perturbation ${u\in \mathbb{R}^{n}}$ to Program (2) in the ${c}$ term, we have

$\displaystyle \begin{aligned} & \underset{y \in \mathbb{R}^n}{\text{maximize}} & & b^Ty\\ & {\text{subject to}} & & A^Ty\leq c+u.\\ \end{aligned} \ \ \ \ \ (3)$

Let ${v(u)}$ be the optimal value of Program (3) and ${y(u)}$ be an optimal solution to the same program. We would like to ask the following question:

$\displaystyle \begin{aligned} \textbf{Q}&: \text{Is}\; v(u)\; \text{Lipschitz continuous?} \;\text{Is} \;y(u)\;\text{Lipschitz}? \end{aligned} \ \ \ \ \ (4)$

Such questions are called the conditioning of Program (2). It asks under some perturbation, how does the solution and optimal value changes accordingly. In general, we hope that Program (1) and (2) to be stable in the sense that small perturbation in the data, i.e., ${A,b,c}$ , only induces small changes in the optimal value and solution. The reason is that in real application, there might be some error in the coding of ${b,c}$ or even ${A}$ . If Program (1) and (2) are not stable with respect to the perturbation, then possibly either more costs need to be paid to the encoding of the data or the LP is not worth solving. At first sight, it is actually not even clear that

Whether ${v(u)}$ will be always finite?
Whether ${y(u)}$ is actually unique? If ${y(u)}$ is not unique, then the Lipschitz property of ${y(u)}$ does not even make sense.

To address the first issue, we make our first assumption that

A1: Program (2) has a solution and the optimal value is finite.

Under A1, by strong duality of LP, we have that Program (1) also has a solution and the optimal value is the same as the optimal value of Program (2). This, in particular, implies that (1) is feasible. We now characterize the region where ${v(u)}$ is finite. Let the feasible region of Program (1) be denoted as

$\displaystyle \mathcal{F}_p :\,= \{x\mid Ax = b,x\geq 0\},$

the set of vertices of ${\mathcal{F}_p}$ be ${\mathcal{V}_p=\{x_i\}_{i=1}^d}$ and the set of extreme rays be ${\mathcal{R}_p=\{\gamma_j\}_{j=1}^l}$ . Using our assumption A1, we know

$\displaystyle \mathcal{F}_p \not = \emptyset.$

Now according to the Resolution Theorem of the primal polyhedron, we have

$\displaystyle \mathcal{F}_p = \{ x\mid x = \sum_{i=1}^d \alpha _ix_i + \sum_{j=1}^l\beta_j \gamma_j, x_i \in \mathcal{V}_p, \gamma _j \in \mathcal{R}_p,\sum_{i=1}^d \alpha_i =1, \alpha_i\geq 0, \beta_j \geq 0, \forall i,j\}.$

Using this representation, the fact that the primal of Program (3) is

$\displaystyle \begin{aligned} & \underset{x \in \mathbb{R}^n}{\text{minimize}} & & (c+u)^Tx\\ & {\text{subject to}} & & Ax =b \\ & & & x\geq 0, \end{aligned} \ \ \ \ \ (5)$

and strong duality continue to hold for (3) and (5) as (5) is feasible by A1, the value of ${v(u)}$ is

$\displaystyle v(u) = \min \{ c^Tx_i +u^Tx_i\mid x_ i \in \mathcal{V}_p\} + \min\{c^T\gamma_j +\beta_ju^T\gamma_j \mid \gamma_j \in \mathcal{R}_p,\beta_j\geq 0\}.$

Thus it is immediate that the region where ${v(u)}$ is finite is

$\displaystyle \begin{aligned} \mathcal{F}_u = \{ u\mid \gamma_j ^Tu\geq 0, \forall \gamma_j \in \mathcal{R}_p\}. \end{aligned} \ \ \ \ \ (6)$

In particular, if ${\mathcal{F}_p}$ is bounded, then we have ${v(u)}$ to be always finite.

To address the issue whether ${y(u)}$ is unique, we make the following nondegenerate assumption on vertices.

A2: All the basic feasible solution of (1) are non-degenerate, i.e., all vertices of the feasible region of Program (1) have exactly ${m}$ positive entries.

Under this condition, and the definition of basic feasible solution, we know if ${x}$ is a basic feasible solution, then ${A_{\cdot V(x)}\in \mathbb{R}^{m\times m}}$ where ${V(x)=\{i\mid x_i \text{ is nonzero}\}}$ has all column independent. Here ${A_{\cdot V(x)}}$ is the submatrix of ${A}$ having columns with indices in ${\mathcal{V}(x)}$ . By complementarity of LP (3) and (5), we thus know that if ${x}$ is a baisc feasible solution and it is a solution to Program (5), then

$\displaystyle y(u) = (A_{\cdot V(x)})^{-T}(c+u)$

is the unique solution to Program (3) where ${(A_{\cdot V(x)})^{-T} =((A_{\cdot V(x)})^{T})^{-1} }$ . What we have done is that under A1 and A2,

$\displaystyle v(u) = \min \{ c^Tx_i +u^Tx_i, x_ i \in \mathcal{V}_p\}, u \in \mathcal{F}_u$

and

$\displaystyle y(u) = (A_{\cdot V(x)})^{-T}(c+u)$

where ${x = \arg\min_{x_i}\{ c^Tx_i +u^Tx_i, x_ i \in \mathcal{V}_p\}}$ .

By defining

$\displaystyle \Omega_i = \{ u \mid c^Tx_i +u^Tx_i \leq c^Tx_j +u^Tx_j,\forall j\not =i \}, \forall\, 1\leq i\leq d$

which is a polyhedron, we may write the above compactly as

$\displaystyle v(u) = c^Tx_i +u^Tx_i, \quad y(u) = (A_{\cdot V(x_i)})^{-T}(c+u), u \in \Omega_i.$

Then it is easy to see that the Lipschitz condition of ${v(u)}$ is

$\displaystyle |v(u_1)-v(u_2)| \leq \max\{\|x_i\|_*,x_i \in \mathcal{V}_p\|\} \|u_1-u_2\|$

where ${\|\cdot\|_*}$ is the dual norm of ${\|\cdot\|}$ . We note ${\|\cdot \|}$ here is arbitrary.

Similarly, by using the theorem in this post, we see that ${y(u)}$ is also Lipschitz continuous with Lipschitz property

$\displaystyle \| y(u_1) - y(u_2)\|_\alpha \leq\max \{ \|(A_{\cdot V(x)})^{-T}\|_{\alpha,\beta }, x \in \mathcal{V}_p\}\|u_1-u_2\|_{\beta }$

where ${\|(A_{\cdot V(x)})^{-T}\|_{\alpha,\beta } = \max_{\|x\|_{\beta }=1}\|(A_{\cdot V(x)})^{-T}\|_\alpha}$ and ${\|\cdot\|_\alpha, \|\cdot\|_\beta}$ are arbitrary norms. We conclude the above discussion in the following theorem.

Theorem 1 Recall ${\mathcal{V}_p}$ is the set of extreme points of the feasible region of (1), ${\mathcal{R}_p}$ is the set of extreme points of the feasible region of (1) and ${\mathcal{F}_u = \{ u\mid \gamma_j ^Tu\geq 0, \forall \gamma_j \in \mathcal{R}_p\}}$ . Under assumption A1 and A2, we have for all ${x\in \mathcal{F}_u}$ ,

$\displaystyle |v(u_1)-v(u_2)| \leq \max\{\|x_i\|_*,x_i \in \mathcal{V}_p\|\} \|u_1-u_2\|$

and

$\displaystyle \| y(u_1) - y(u_2)\|_\alpha \leq\max \{ \|(A_{\cdot V(x)})^{-T}\|_{\alpha,\beta }, x \in \mathcal{V}_p\}\|u_1-u_2\|_\beta$

where ${\|\cdot\|,\|\cdot\|_{\alpha},\|\cdot\|_\beta }$ are arbitrary norms. Here ${A_{\cdot V(x)}}$ is the submatrix of ${A}$ having columns with indices in ${\mathcal{V}(x)=\{i\mid x_i \text{ is nonzero}\}}$ .

How might one bound the Lipschitz constants

$\displaystyle \max\{\|x_i\|_*,x_i \in \mathcal{V}_p\|\} \text{ and } \max \{ \|(A_{\cdot V(x)})^{-T}\|_{\alpha,\beta }, x \in \mathcal{V}_p\}?$

The former might be done by giving a bound of the diameter of the feasible region ${\mathcal{F}_p}$ of Program (1) and the later might be done through some restricted isometry property of ${A}$ .

Suppose we have a collection of sets ${\{\Omega_i \subset \mathbb{R}^{n}\}_{i=1}^d}$ and a set of ${d}$ linear functions ${\{f_i(x) = A_ix +c_i\mid x\in \Omega_i, i=1,\dots, d\}}$ where each ${\Omega_i}$ being closed polyhedrons in ${\mathbb{R}^{n}}$ , i.e., ${\Omega_i = \{x\mid a_{i_j}^Tx + b_{i_j} \leq 0,\forall j\leq k(i)\}}$ for some ${a_{i_j} \in \mathbb{R}^{n},b_{i_j}\in \mathbb{R},k(i)\in \mathbb{N}}$ , and ${A_i\in \mathbb{R}^{m\times n}}$ and ${c_i \in \mathbb{R}^{m}}$ . Moreover, for any ${i\not =j}$ , and ${x\in \Omega_i \cap \Omega_j}$ , we have ${f_i(x) = f_j(x)}$ . Then we can define function ${f}$ on ${\Omega =\cup_{i=1}^d \Omega_i}$ such that

$\displaystyle f(x) = f_i(x) \quad \text{if} \; x\in \Omega_i.$

We further assume $\Omega$ is convex. We now state a theorem concerning the Lipschitz constant of ${f}$ .

Theorem 1 Under the above setting, given arbitrary norms ${\|\cdot\|_\alpha,\|\cdot\|_\beta }$ , ${f}$ is ${\max_{i\leq d}\{\|A_i\|_{\alpha,\beta}\}}$ -Lipschitz continuous with respect to ${\|\cdot\|_\alpha}$ and ${\|\cdot \|_\beta }$ , i.e., for all ${x_1,x_2 \in \Omega}$ ,

$\displaystyle \begin{aligned} \| f(x_2) - f(x_1)\|_\alpha \leq \max_{i\leq d}\{\|A_i\|_{\alpha,\beta }\}\|x_2-x_1\|_\beta . \end{aligned} \ \ \ \ \ (1)$

Here ${\|A_i\|}$ is ${\|A_i\|_{\alpha,\beta } = \max_{\|x\|_\beta \leq 1}\|A_ix\|_{\alpha}}.$

Proof: By letting ${g(t) = x_1 + t(x_2 - x_1)}$ , ${L = \max_{i\leq d}\{\|A_i\|_{\alpha,\beta }\}\|x_2-x_1\|_\beta }$ and ${h= f\circ g}$ . we see the inequality (1) is equivalent to

$\displaystyle \begin{aligned} \|h(1)- h(0)\|_\alpha \leq L. \end{aligned} \ \ \ \ \ (2)$

Using the property that ${f}$ is well-defined on ${\Omega_i\cap \Omega_j}$ for any ${i,j}$ , we have that there exists ${0=t_0\leq \dots \leq t_l\leq \dots \leq t_K\leq t_{K+1}=1, l=1,\dots,K}$ and corresponding ${i_1,\dots ,i_K}$ such that

$\displaystyle \begin{aligned} h(t_l)= A_{i_l}x^{(l)}+c_{i_l} = A_{i_{l-1}}x^{(l)}+c_{i_{l-1}} \end{aligned} \ \ \ \ \ (3)$

for all ${1\leq l\leq K}$ where ${x^{(l)} = g(t_l)}$ . We also let ${x^{(0)} =g(t_0) =x_1}$ and ${x^{(K+1)}=g(t_{{K+1}}) =x_2}$ . These ${x_l}$ s have the property that

$\displaystyle \begin{aligned} \sum_{l=0}^{K}\|x_{i_{l+1}} -x_{i_l}\|_\beta = \sum_{l=0}^{K}(t_{l+1}-t_l)\|x_2 -x_1\|_\beta = \|x_2-x_1\|_\beta . \end{aligned} \ \ \ \ \ (4)$

Thus we can bound the term ${\|h(1)- h(0)\|_\alpha }$ by

$\displaystyle \begin{aligned} \|h(1) - h(0)\|_\alpha &\leq \sum_{l=0}^{K} \|h(t_{l+1}) -h(t_{l})\|_\alpha \\ & \leq \sum_{l=0}^{K} \|A_{i_{l+1}}x_{l+1} +c_{i_{l+1}} - A_{i_l}x_{l} -c_{i_l}\|_\alpha \\ &\overset{(i)}{\leq } \sum_{l=0}^{K-1} \|A_{i_{l}}x_{l+1} +c_{i_{l}} - A_{i_l}x_{l} -c_{i_l}\|_\alpha \\ & = \sum_{l=0}^{K} \|A_{i_{l}}x_{l+1} - A_{i_l}x_{l} \|_\alpha \\ &\overset{(ii)}{\leq}\sum_{l=0}^{K}\|A_{i_l}\|_{\alpha,\beta } \|x_{l+1}-x_{l}\|_\beta \\ & \leq \max_{i}\{\|A_i\|_{\alpha,\beta },i\leq d\}\sum_{l=0}^{K} \|x_{l+1}-x_{l}\|_\beta \\ & \overset{(iii)}{\leq}\max_{i}\{\|A_i\|_{\alpha,\beta },i\leq d\} \|x_2-x_1\|_\beta \\ & = L, \end{aligned} \ \ \ \ \ (5)$

where ${(i)}$ is due to inequality (3), ${(ii)}$ is due to the definition of operator norm and ${(iii)}$ is using equality (4). This proves inequality (1). $\Box$

Three Squirrels

Little Discoveries of Optimization, Statistics, and Probability

Month: March 2018

LP Conditioning

Lipschitz constant of piecewise linear (vector valued) functions