Chain rule of convex function

This post studies a specific chain rule of composition of convex functions. Specifically, we have the following theorem.

Theorem 1 For a continuously differentiable increasing function {\phi: \mathbb{R} \rightarrow \mathbb{R}}, a convex function {h: U \rightarrow \mathbb{R}} where {U\in \mathbb{R}^n} is a convex set and an {x\in U}, if {\phi'(h(x))>0} or {x\in \mbox{int}(U)}, then

\displaystyle \begin{aligned} \partial (\phi \circ h) (x) = \phi' (h(x)) \cdot[ \partial h (x)], \end{aligned} \ \ \ \ \ (1)

where {\partial } is the operator of taking subdifferentials of a function, i.e., {\partial h (x) = \{ g\mid h(x)\geq h(y) +\langle{g},{y-x}\rangle,\forall y\in U\}} for any {x\in U}, and {\mbox{int}(U)} is the interior of {U} with respect to the standard topology in {\mathbb{R}^n}.

A negative example. We note that if our condition fails, the equality may not hold. For example, let {\phi(x) =1} for all {x\in \mathbb{R}} and {h(x) = 1 } defined on {[0,1]}. Then {0} is a point which is not in the interior of {[0,1],\phi'(0) = 0}, {\partial h(0) =(-\infty,0]}. However, in this case {\partial (\phi\circ h)(0)= (-\infty,0]} and {\phi' (h(0)) \cdot[ \partial h (0)] =0}. Thus, the equality fails.

It should be noted that if {U} is open and {h} is also differentiable, then the above reduces to the common chain rule of smooth functions.

Proof: We first prove that {\partial (\phi \circ h) (x) \supset \phi' (h(x)) \cdot[ \partial h (x)]}. We have for all {x\in U , g \in \partial h(x)},

\displaystyle \begin{aligned} \phi (h(y)) &\overset{(a)}{\geq} \phi(h(x)) + \phi' (h(x))(h(y)-h(x))\\ & \overset{(b)}{\geq} \phi (h(x)) + \phi '(h(x)) \langle g,y-x\rangle \\ & = \phi (h(x)) + \langle{\phi' (h(x))g},{y-x}\rangle \end{aligned} \ \ \ \ \ (2)

where {(a),(b)} are just the definition of subdifferential of {\phi} at {h(x)} and {h} at {x}. We also use the fact that {\phi(x)\geq 0} in the inequality {(b)} as {\phi} is increasing.

Now we prove the other direction. Without lost of generality, suppose {0\in U} such that {\partial (\phi \circ h)(0)} is not empty. Let {g\in \partial (\phi \circ h)(0)}, we wish to show that {g} is in the set { \phi' (h(0)) \cdot[ \partial h (0)]}. First according to the definition of subdifferential, we have

\displaystyle \begin{aligned} (\phi \circ h) (x)\geq (\phi\circ h)(0) + \langle{ g},x\rangle, \forall x \in U \end{aligned} \ \ \ \ \ (3)

This gives

\displaystyle \begin{aligned} (\phi \circ h) (\gamma x)\geq (\phi\circ h)(0) + \langle{ g} ,{\gamma x}\rangle, \forall x \in U, \gamma \in [0,1]. \end{aligned} \ \ \ \ \ (4)

Rearranging the above inequality gives

\displaystyle \begin{aligned} &\frac{(\phi \circ h) (\gamma x)- (\phi\circ h)(0)}{\gamma}\geq \langle{ g},{ x}\rangle \\ \implies & \phi'(s)\cdot \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle \end{aligned} \ \ \ \ \ (5)

for some {s} between {h(\gamma x)} and {h(0)} by mean value theorem. Now, by letting {f(\gamma )=h(\gamma x)}, if {f} is right continuous at {0}, then by Lemma 1 in this post and {\phi} is continuously differentiable, we know {l(\gamma) =\frac{h(\gamma x)-h(0)}{\gamma}} is nondecreasing in {\gamma} and we have

\displaystyle \begin{aligned} \phi'(h(0)) (h(x)-h(0))\geq \phi'(h(0))\cdot \lim_{\gamma \downarrow 0} \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (6)

If {\phi' (h(0))>0}, then dividing both sides of the above inequality by {\phi'(h(0))} gives

\displaystyle \begin{aligned} h(x)-h(0)\geq \langle{\frac{g}{\phi'(h(0))}},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (7) 

This shows that {\frac{g}{\phi'(h(0))}} is indeed a member of {\partial h(x)} and thus {\partial (\phi \circ h) (x) \subset \phi' (h(x)) \cdot[ \partial h (x)]}. In this case, we only need to verify why {f(\gamma ) = h(\gamma x) } must be right continuous at {0}.

If {0\in \mbox{int}(U)}, then {h} is definitely continuous at {x} and so is {f} by standard result in convex analysis. If {\phi' (h(0))>0}, then we are done by inequality (7). If {\phi' (h(0))=0}, using the inequality (6), we have

\displaystyle \begin{aligned} 0 \geq \langle{g},{x}\rangle, \forall x\in U \end{aligned} \ \ \ \ \ (8)

Since {0 \in \mbox{int}(U)}, then {x} can take a small positive and negative multiple of {n} standard basis vectors in {\mathbb{R}^n} in the inequality (8). This shows {g =0} and it indeed belongs to the set {\phi' (h(0)) \cdot[ \partial h (x)] = \{0\}} as {\partial (h(0))\not=\emptyset} for {0\in \mbox{int}(U)} by standard convex analysis result.

Thus our task now is to argue why {f(\gamma ) = h(\gamma x)} is indeed right continuous at {0}. Using Lemma 4 in this post, we know the limit {\lim_{\gamma \downarrow 0 }f(\gamma) = f(0^+)} exists and {f(0^+)\leq f(0)}. Now if {f(0^+) = f(0) = h(0)}, then {f} is indeed right continuous at {0} and our conclusion holds. So we may assume {f(0^+) <f(0) =h(0)}. But in this case {l(\gamma) = \frac{h(\gamma x)-h(0)}{\gamma} = \frac{f(\gamma)-f(0)}{\gamma}} is going to be negative infinity as {\gamma \downarrow 0}. Recall from inequality (5), we have

\displaystyle \phi'(s)l(\gamma )\geq \langle{g},{x}\rangle

where {s} is between {h(0)} and {f(\gamma )=h(\gamma x)}. We claim that as {\gamma\downarrow 0}, {\phi'(s)} approaches a positive number. If this claim is true, then from the above inequality, we will have

\displaystyle -\infty \geq \langle{g},{x}\rangle

which cannot hold. Thus we must have {f} right continuous at {0}.

Finally, we prove our claim that {\phi'(s)} is approaching a positive number if {f(0^+) <f(0)}. Using mean value theorem, we have for some {s_0 \in [ f(0^+), f(0)]}

\displaystyle \begin{aligned} \phi(s_0)(f(0^+)-f(0)) & = \phi(f(0^+)) -\phi(f(0))\\ & = \lim_{ \gamma \downarrow 0 } \phi(f(\gamma))-\phi(f(0))\\ & = \lim_{\gamma \downarrow 0}\phi(s)(f(\gamma)-f(0))\\ & = (f(0^+)-f(0))\lim_{\gamma\downarrow 0}\phi(s). \end{aligned} \ \ \ \ \ (9)

Now cancel the term {f(0^+) -f(0)<0} above, we see that {\phi(s_0) = \lim_{\gamma\downarrow}\phi(s)}. We claim {\phi(s_0)>0}. If {\phi(s_0)=0}, then because {\phi} is increasing, we have that {\phi} is constant in {[f(0^+),f(0)]} as {\phi(f(0^+)) -\phi(f(0)) = \phi(s_0)(f(0^+)-f(0)) =0}. This contradicts our assumption that {\phi'(f(0))>0} and our proof is complete. \Box

Special Properties of 1-D convex function

We discuss a few special properties of one-dimensional function convex functions. The first says something about the slope of one-dimensional convex functions.

Lemma 1 Let {f:[0,1]\rightarrow \mathbb{R}} be a (strictly) convex function, then the function {g(x) = \frac{f(x)-f(0)}{x}, x\in(0,1]} is non-decreasing (strictly increasing).

Proof: By the definition of convex function, we have for any {0<s<t\leq 1}

\displaystyle f(s) = f(\frac{s}{t} \times t + 0 \times \frac{t-s}{t}) \leq \frac{s}{t}f(t) +\frac{t-s}{t}f(0).

Rearrange the above inequality gives

\displaystyle \frac{f(s) -f(0)}{s}\leq \frac{f(t)-f(0)}{t}.

The case for strict convexity is simply replacing the above {\leq} by {<}. \Box

The second asserts that the slope should always increase.

Lemma 2 If {f:I\rightarrow \mathbb{R}} where {I} is any connected set in {\mathbb{R}},i.e., an interval which can be half open half closed or open or closed. The for any {x<y<z\in I}, we have

\displaystyle \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}

Proof: This is just a twist of algebra.

\displaystyle \begin{aligned} & \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}\\ \iff& \frac{y-z}{x-y}f(x) +f(z)\geq (1+ \frac{y-z}{x-y})f(y)\\ \iff & \frac{y-z}{x-y}f(x) +f(z)\geq \frac{x-z}{x-y}f(y)\\ \iff & \frac{y-z}{x-z}f(x) +\frac{x-y}{x-z}f(z)\geq f(y)\\ \iff & \frac{z-y}{z-x}f(x) +\frac{y-x}{z-x}f(z)\geq f(\frac{z-y}{z-x}x +\frac{y-x}{z-x}z)\\ \end{aligned} \ \ \ \ \ (1)

where the last one holds due to the definition of convexity. \Box

The next lemma shows that convex functions in a real line must always be increasing or always decreasing or first decreasing and then increasing.

Lemma 3 Let {f:I \rightarrow \mathbb{R}} where {I} is an interval in {\mathbb{R}}. The interval {I} can be any one kind of {[a,b],(a,b),(a,b],[a,b)} where {-\infty \leq a<b\leq \infty}. Then {f} is either always increasing or always decreasing or first decreasing and then increasing.

Proof: We may suppose {f} is not always a constant since in this case, the assertion is true. Now suppose that {f} is not always increasing and is also not always decreasing. Then there exists {x<y <z} all in {I} such that

\displaystyle f(x) > f(y) , f(y)< f(z)

or

\displaystyle f(x) <f(y), f(y) > f(z).

The latter case is not possible because it implies that

\displaystyle \frac{f(x)-f(y)}{x-y}> 0 > \frac{f(y) - f(z)}{y-z}

which contradicts the convexity of {f}. Thus only the first case is possible.

Now if {x,z} are interior point of {I}, Then {f} is continuous on the interior of {I} and so is continuous on {[x,z]}. This means that {f} attains a minimum on {[x,z]} at some {x_0\in (x,z)} as {f(y)<\min \{f(x),f(z)\}}. Now for any {x_1<x_2<x<x_3<x_4<x_0}, by convexity and monotonicity of slope, we have

\displaystyle \begin{aligned} \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(x)}{x_2-x_3}\leq \frac{f(x)-f(x_3)}{x-x_3}\leq \frac{f(x_3)-f(x_4)}{x_3-x_4}\leq \frac{f(x_4)-f(x_0)}{x_4-x_0}\leq 0. \end{aligned} \ \ \ \ \ (2)

which implies {f(x_1)\geq f(x_2)\geq f(x)\geq f(x_3)\geq f(x_4)}. Thus {f} is decreasing on {I\cap (-\infty,x_0]}. Similarly, using monotonicity of slope, we have {f} is increasing on {I\cap(x_0,\infty)}. Now if suppose either {x} or {z} is the end point of {I}. We take the half points {a_1 = \frac{x+y}{2},b_1 = \frac{y+z}{2}}. Then there are only three possible cases

  1. {f(x)\geq f(a_1)\geq f(y)\geq f(b_1)\leq f(z)}
  2. {f(x)\geq f(a_1)\geq f(y)\leq f(b_1)\leq f(z)}
  3. {f(x)\geq f(a_1)\leq f(y)\leq f(b_1)\leq f(z)}

The second case is ideal as {a_1} and {b_1} are interior point and we can proceed our previous argument and show {f} is first decreasing and the then increasing. In the first case, we can take {b_2 = \frac{b_1+z}{2}} and ask the relation between {f(y),f(b_1),f(b_2)} and {f(z)}. The only non-ideal case is that {f(b_1)\geq f(b_2)\leq f(z)} (the other case gives three interior point such that {f(y)\geq f(b_1)\leq f(b_2)} and we can employ our previous argument). We can then further have {b_3 = \frac{b_2+z}{2}}. Again there will be only one non-ideal case that {f(b_1)\geq f(b_2)\geq f(b_3)}.

Continuing this process, we either stop at some point to obtain a pattern {f(y)\geq f(b_1)\leq f(b_i)} or the sequence {b_n} satisfies {f(b_i)\geq f(b_{i+1})} for all {i} and {\lim_{i\rightarrow \infty} b_i=z}. If {z} is an interior point, then continuity of {f} implies that {\lim_{i\rightarrow \infty}f(b_i) = f(z)}. But this is not possible because {f(z)> f(y)\geq f(b_i)}. Thus {z} must be the end point of {I}. Moreover, {f} is actually decreasing on {I/\{z\}}. Indeed, for any {x_1>x_2, x_i\in I/\{z\}} for {i=1,2}, there is a {b_i} such that {x_2>b_i} and so

\displaystyle \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(b_i)}{x_2-b_i}\leq \frac{f(b_i)-f(b_{i+1})}{b_i-b_{i+1}}\leq 0.

But since {f(z)>f(y)}, we see {f} is indeed decreasing on {I/\{z\}} and increase/jump at the point {z}.

The third case is similar, we will either have {f} always increasing and only jump down at the end point {x} or {f} is just first decreases for some interval and then increases for some interval. \Box

Utilizing the above lemma, we can say 1) one-dimensional convex function is almost everywhere differentiable and 2) something about the boundary points of {f} on a finite interval.

Lemma 4 Suppose {f:[a,b] \rightarrow \mathbb{R}} is convex for some {a\in \mathbb{R},b \in \mathbb{R}}. Then {f(a) \geq \lim _{x\downarrow a}f(a) = f(a^+), f(b)\geq \lim_{x\uparrow b}f(x) =f(b^+)}.

The above lemma shows that a convex function on a finite open interval is uniformly continuous as it can be extended to the boundary. However, this property does not hold in higher dimension by considering {f(x,y) = \frac{x^2}{y}} on {x^2 \leq y, 0<y <1} and the point {(0,0)}.