May | 2018 | Three Squirrels

This post studies a specific chain rule of composition of convex functions. Specifically, we have the following theorem.

Theorem 1 For a continuously differentiable increasing function ${\phi: \mathbb{R} \rightarrow \mathbb{R}}$ , a convex function ${h: U \rightarrow \mathbb{R}}$ where ${U\in \mathbb{R}^n}$ is a convex set and an ${x\in U}$ , if ${\phi'(h(x))>0}$ or ${x\in \mbox{int}(U)}$ , then

$\displaystyle \begin{aligned} \partial (\phi \circ h) (x) = \phi' (h(x)) \cdot[ \partial h (x)], \end{aligned} \ \ \ \ \ (1)$

where ${\partial }$ is the operator of taking subdifferentials of a function, i.e., ${\partial h (x) = \{ g\mid h(x)\geq h(y) +\langle{g},{y-x}\rangle,\forall y\in U\}}$ for any ${x\in U}$ , and ${\mbox{int}(U)}$ is the interior of ${U}$ with respect to the standard topology in ${\mathbb{R}^n}$ .

A negative example. We note that if our condition fails, the equality may not hold. For example, let ${\phi(x) =1}$ for all ${x\in \mathbb{R}}$ and ${h(x) = 1 }$ defined on ${[0,1]}$ . Then ${0}$ is a point which is not in the interior of ${[0,1],\phi'(0) = 0}$ , ${\partial h(0) =(-\infty,0]}$ . However, in this case ${\partial (\phi\circ h)(0)= (-\infty,0]}$ and ${\phi' (h(0)) \cdot[ \partial h (0)] =0}$ . Thus, the equality fails.

It should be noted that if ${U}$ is open and ${h}$ is also differentiable, then the above reduces to the common chain rule of smooth functions.

Proof: We first prove that ${\partial (\phi \circ h) (x) \supset \phi' (h(x)) \cdot[ \partial h (x)]}$ . We have for all ${x\in U , g \in \partial h(x)}$ ,

$\displaystyle \begin{aligned} \phi (h(y)) &\overset{(a)}{\geq} \phi(h(x)) + \phi' (h(x))(h(y)-h(x))\\ & \overset{(b)}{\geq} \phi (h(x)) + \phi '(h(x)) \langle g,y-x\rangle \\ & = \phi (h(x)) + \langle{\phi' (h(x))g},{y-x}\rangle \end{aligned} \ \ \ \ \ (2)$

where ${(a),(b)}$ are just the definition of subdifferential of ${\phi}$ at ${h(x)}$ and ${h}$ at ${x}$ . We also use the fact that ${\phi(x)\geq 0}$ in the inequality ${(b)}$ as ${\phi}$ is increasing.

Now we prove the other direction. Without lost of generality, suppose ${0\in U}$ such that ${\partial (\phi \circ h)(0)}$ is not empty. Let ${g\in \partial (\phi \circ h)(0)}$ , we wish to show that ${g}$ is in the set ${ \phi' (h(0)) \cdot[ \partial h (0)]}$ . First according to the definition of subdifferential, we have

$\displaystyle \begin{aligned} (\phi \circ h) (x)\geq (\phi\circ h)(0) + \langle{ g},x\rangle, \forall x \in U \end{aligned} \ \ \ \ \ (3)$

This gives

$\displaystyle \begin{aligned} (\phi \circ h) (\gamma x)\geq (\phi\circ h)(0) + \langle{ g} ,{\gamma x}\rangle, \forall x \in U, \gamma \in [0,1]. \end{aligned} \ \ \ \ \ (4)$

Rearranging the above inequality gives

$\displaystyle \begin{aligned} &\frac{(\phi \circ h) (\gamma x)- (\phi\circ h)(0)}{\gamma}\geq \langle{ g},{ x}\rangle \\ \implies & \phi'(s)\cdot \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle \end{aligned} \ \ \ \ \ (5)$

for some ${s}$ between ${h(\gamma x)}$ and ${h(0)}$ by mean value theorem. Now, by letting ${f(\gamma )=h(\gamma x)}$ , if ${f}$ is right continuous at ${0}$ , then by Lemma 1 in this post and ${\phi}$ is continuously differentiable, we know ${l(\gamma) =\frac{h(\gamma x)-h(0)}{\gamma}}$ is nondecreasing in ${\gamma}$ and we have

$\displaystyle \begin{aligned} \phi'(h(0)) (h(x)-h(0))\geq \phi'(h(0))\cdot \lim_{\gamma \downarrow 0} \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (6)$

If ${\phi' (h(0))>0}$ , then dividing both sides of the above inequality by ${\phi'(h(0))}$ gives

$\displaystyle \begin{aligned} h(x)-h(0)\geq \langle{\frac{g}{\phi'(h(0))}},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (7)$

This shows that ${\frac{g}{\phi'(h(0))}}$ is indeed a member of ${\partial h(x)}$ and thus ${\partial (\phi \circ h) (x) \subset \phi' (h(x)) \cdot[ \partial h (x)]}$ . In this case, we only need to verify why ${f(\gamma ) = h(\gamma x) }$ must be right continuous at ${0}$ .

If ${0\in \mbox{int}(U)}$ , then ${h}$ is definitely continuous at ${x}$ and so is ${f}$ by standard result in convex analysis. If ${\phi' (h(0))>0}$ , then we are done by inequality (7). If ${\phi' (h(0))=0}$ , using the inequality (6), we have

$\displaystyle \begin{aligned} 0 \geq \langle{g},{x}\rangle, \forall x\in U \end{aligned} \ \ \ \ \ (8)$

Since ${0 \in \mbox{int}(U)}$ , then ${x}$ can take a small positive and negative multiple of ${n}$ standard basis vectors in ${\mathbb{R}^n}$ in the inequality (8). This shows ${g =0}$ and it indeed belongs to the set ${\phi' (h(0)) \cdot[ \partial h (x)] = \{0\}}$ as ${\partial (h(0))\not=\emptyset}$ for ${0\in \mbox{int}(U)}$ by standard convex analysis result.

Thus our task now is to argue why ${f(\gamma ) = h(\gamma x)}$ is indeed right continuous at ${0}$ . Using Lemma 4 in this post, we know the limit ${\lim_{\gamma \downarrow 0 }f(\gamma) = f(0^+)}$ exists and ${f(0^+)\leq f(0)}$ . Now if ${f(0^+) = f(0) = h(0)}$ , then ${f}$ is indeed right continuous at ${0}$ and our conclusion holds. So we may assume ${f(0^+) <f(0) =h(0)}$ . But in this case ${l(\gamma) = \frac{h(\gamma x)-h(0)}{\gamma} = \frac{f(\gamma)-f(0)}{\gamma}}$ is going to be negative infinity as ${\gamma \downarrow 0}$ . Recall from inequality (5), we have

$\displaystyle \phi'(s)l(\gamma )\geq \langle{g},{x}\rangle$

where ${s}$ is between ${h(0)}$ and ${f(\gamma )=h(\gamma x)}$ . We claim that as ${\gamma\downarrow 0}$ , ${\phi'(s)}$ approaches a positive number. If this claim is true, then from the above inequality, we will have

$\displaystyle -\infty \geq \langle{g},{x}\rangle$

which cannot hold. Thus we must have ${f}$ right continuous at ${0}$ .

Finally, we prove our claim that ${\phi'(s)}$ is approaching a positive number if ${f(0^+) <f(0)}$ . Using mean value theorem, we have for some ${s_0 \in [ f(0^+), f(0)]}$

$\displaystyle \begin{aligned} \phi(s_0)(f(0^+)-f(0)) & = \phi(f(0^+)) -\phi(f(0))\\ & = \lim_{ \gamma \downarrow 0 } \phi(f(\gamma))-\phi(f(0))\\ & = \lim_{\gamma \downarrow 0}\phi(s)(f(\gamma)-f(0))\\ & = (f(0^+)-f(0))\lim_{\gamma\downarrow 0}\phi(s). \end{aligned} \ \ \ \ \ (9)$

Now cancel the term ${f(0^+) -f(0)<0}$ above, we see that ${\phi(s_0) = \lim_{\gamma\downarrow}\phi(s)}$ . We claim ${\phi(s_0)>0}$ . If ${\phi(s_0)=0}$ , then because ${\phi}$ is increasing, we have that ${\phi}$ is constant in ${[f(0^+),f(0)]}$ as ${\phi(f(0^+)) -\phi(f(0)) = \phi(s_0)(f(0^+)-f(0)) =0}$ . This contradicts our assumption that ${\phi'(f(0))>0}$ and our proof is complete. $\Box$

We discuss a few special properties of one-dimensional function convex functions. The first says something about the slope of one-dimensional convex functions.

Lemma 1 Let ${f:[0,1]\rightarrow \mathbb{R}}$ be a (strictly) convex function, then the function ${g(x) = \frac{f(x)-f(0)}{x}, x\in(0,1]}$ is non-decreasing (strictly increasing).

Proof: By the definition of convex function, we have for any ${0<s<t\leq 1}$

$\displaystyle f(s) = f(\frac{s}{t} \times t + 0 \times \frac{t-s}{t}) \leq \frac{s}{t}f(t) +\frac{t-s}{t}f(0).$

Rearrange the above inequality gives

$\displaystyle \frac{f(s) -f(0)}{s}\leq \frac{f(t)-f(0)}{t}.$

The case for strict convexity is simply replacing the above ${\leq}$ by ${<}$ . $\Box$

The second asserts that the slope should always increase.

Lemma 2 If ${f:I\rightarrow \mathbb{R}}$ where ${I}$ is any connected set in ${\mathbb{R}}$ ,i.e., an interval which can be half open half closed or open or closed. The for any ${x<y<z\in I}$ , we have

$\displaystyle \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}$

Proof: This is just a twist of algebra.

$\displaystyle \begin{aligned} & \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}\\ \iff& \frac{y-z}{x-y}f(x) +f(z)\geq (1+ \frac{y-z}{x-y})f(y)\\ \iff & \frac{y-z}{x-y}f(x) +f(z)\geq \frac{x-z}{x-y}f(y)\\ \iff & \frac{y-z}{x-z}f(x) +\frac{x-y}{x-z}f(z)\geq f(y)\\ \iff & \frac{z-y}{z-x}f(x) +\frac{y-x}{z-x}f(z)\geq f(\frac{z-y}{z-x}x +\frac{y-x}{z-x}z)\\ \end{aligned} \ \ \ \ \ (1)$

where the last one holds due to the definition of convexity. $\Box$

The next lemma shows that convex functions in a real line must always be increasing or always decreasing or first decreasing and then increasing.

Lemma 3 Let ${f:I \rightarrow \mathbb{R}}$ where ${I}$ is an interval in ${\mathbb{R}}$ . The interval ${I}$ can be any one kind of ${[a,b],(a,b),(a,b],[a,b)}$ where ${-\infty \leq a<b\leq \infty}$ . Then ${f}$ is either always increasing or always decreasing or first decreasing and then increasing.

Proof: We may suppose ${f}$ is not always a constant since in this case, the assertion is true. Now suppose that ${f}$ is not always increasing and is also not always decreasing. Then there exists ${x<y <z}$ all in ${I}$ such that

$\displaystyle f(x) > f(y) , f(y)< f(z)$

$\displaystyle f(x) <f(y), f(y) > f(z).$

The latter case is not possible because it implies that

$\displaystyle \frac{f(x)-f(y)}{x-y}> 0 > \frac{f(y) - f(z)}{y-z}$

which contradicts the convexity of ${f}$ . Thus only the first case is possible.

Now if ${x,z}$ are interior point of ${I}$ , Then ${f}$ is continuous on the interior of ${I}$ and so is continuous on ${[x,z]}$ . This means that ${f}$ attains a minimum on ${[x,z]}$ at some ${x_0\in (x,z)}$ as ${f(y)<\min \{f(x),f(z)\}}$ . Now for any ${x_1<x_2<x<x_3<x_4<x_0}$ , by convexity and monotonicity of slope, we have

$\displaystyle \begin{aligned} \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(x)}{x_2-x_3}\leq \frac{f(x)-f(x_3)}{x-x_3}\leq \frac{f(x_3)-f(x_4)}{x_3-x_4}\leq \frac{f(x_4)-f(x_0)}{x_4-x_0}\leq 0. \end{aligned} \ \ \ \ \ (2)$

which implies ${f(x_1)\geq f(x_2)\geq f(x)\geq f(x_3)\geq f(x_4)}$ . Thus ${f}$ is decreasing on ${I\cap (-\infty,x_0]}$ . Similarly, using monotonicity of slope, we have ${f}$ is increasing on ${I\cap(x_0,\infty)}$ . Now if suppose either ${x}$ or ${z}$ is the end point of ${I}$ . We take the half points ${a_1 = \frac{x+y}{2},b_1 = \frac{y+z}{2}}$ . Then there are only three possible cases

${f(x)\geq f(a_1)\geq f(y)\geq f(b_1)\leq f(z)}$
${f(x)\geq f(a_1)\geq f(y)\leq f(b_1)\leq f(z)}$
${f(x)\geq f(a_1)\leq f(y)\leq f(b_1)\leq f(z)}$

The second case is ideal as ${a_1}$ and ${b_1}$ are interior point and we can proceed our previous argument and show ${f}$ is first decreasing and the then increasing. In the first case, we can take ${b_2 = \frac{b_1+z}{2}}$ and ask the relation between ${f(y),f(b_1),f(b_2)}$ and ${f(z)}$ . The only non-ideal case is that ${f(b_1)\geq f(b_2)\leq f(z)}$ (the other case gives three interior point such that ${f(y)\geq f(b_1)\leq f(b_2)}$ and we can employ our previous argument). We can then further have ${b_3 = \frac{b_2+z}{2}}$ . Again there will be only one non-ideal case that ${f(b_1)\geq f(b_2)\geq f(b_3)}$ .

Continuing this process, we either stop at some point to obtain a pattern ${f(y)\geq f(b_1)\leq f(b_i)}$ or the sequence ${b_n}$ satisfies ${f(b_i)\geq f(b_{i+1})}$ for all ${i}$ and ${\lim_{i\rightarrow \infty} b_i=z}$ . If ${z}$ is an interior point, then continuity of ${f}$ implies that ${\lim_{i\rightarrow \infty}f(b_i) = f(z)}$ . But this is not possible because ${f(z)> f(y)\geq f(b_i)}$ . Thus ${z}$ must be the end point of ${I}$ . Moreover, ${f}$ is actually decreasing on ${I/\{z\}}$ . Indeed, for any ${x_1>x_2, x_i\in I/\{z\}}$ for ${i=1,2}$ , there is a ${b_i}$ such that ${x_2>b_i}$ and so

$\displaystyle \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(b_i)}{x_2-b_i}\leq \frac{f(b_i)-f(b_{i+1})}{b_i-b_{i+1}}\leq 0.$

But since ${f(z)>f(y)}$ , we see ${f}$ is indeed decreasing on ${I/\{z\}}$ and increase/jump at the point ${z}$ .

The third case is similar, we will either have ${f}$ always increasing and only jump down at the end point ${x}$ or ${f}$ is just first decreases for some interval and then increases for some interval. $\Box$

Utilizing the above lemma, we can say 1) one-dimensional convex function is almost everywhere differentiable and 2) something about the boundary points of ${f}$ on a finite interval.

Lemma 4 Suppose ${f:[a,b] \rightarrow \mathbb{R}}$ is convex for some ${a\in \mathbb{R},b \in \mathbb{R}}$ . Then ${f(a) \geq \lim _{x\downarrow a}f(a) = f(a^+), f(b)\geq \lim_{x\uparrow b}f(x) =f(b^+)}$ .

The above lemma shows that a convex function on a finite open interval is uniformly continuous as it can be extended to the boundary. However, this property does not hold in higher dimension by considering ${f(x,y) = \frac{x^2}{y}}$ on ${x^2 \leq y, 0<y <1}$ and the point ${(0,0)}$ .

Three Squirrels

Little Discoveries of Optimization, Statistics, and Probability

Month: May 2018

Chain rule of convex function

Special Properties of 1-D convex function