Chain rule of convex function

This post studies a specific chain rule of composition of convex functions. Specifically, we have the following theorem.

Theorem 1 For a continuously differentiable increasing function {\phi: \mathbb{R} \rightarrow \mathbb{R}}, a convex function {h: U \rightarrow \mathbb{R}} where {U\in \mathbb{R}^n} is a convex set and an {x\in U}, if {\phi'(h(x))>0} or {x\in \mbox{int}(U)}, then

\displaystyle \begin{aligned} \partial (\phi \circ h) (x) = \phi' (h(x)) \cdot[ \partial h (x)], \end{aligned} \ \ \ \ \ (1)

where {\partial } is the operator of taking subdifferentials of a function, i.e., {\partial h (x) = \{ g\mid h(x)\geq h(y) +\langle{g},{y-x}\rangle,\forall y\in U\}} for any {x\in U}, and {\mbox{int}(U)} is the interior of {U} with respect to the standard topology in {\mathbb{R}^n}.

A negative example. We note that if our condition fails, the inequality may not hold. For example, let {\phi(x) =1} for all {x\in \mathbb{R}} and {h(x) = 1 } defined on {[0,1]}. Then {0} is a point which is not in the interior of {[0,1],\phi'(0) = 0}, {\partial h(0) =(-\infty,0]}. However, in this case {\partial (\phi\circ h)(0)= (-\infty,0]} and {\phi' (h(0)) \cdot[ \partial h (0)] =0}. Thus, the equality fails.

It should be noted that if {U} is open and {h} is also differentiable, then the above reduces to the common chain rule of smooth functions.

Proof: We first prove that {\partial (\phi \circ h) (x) \supset \phi' (h(x)) \cdot[ \partial h (x)]}. We have for all {x\in U , g \in \partial h(x)},

\displaystyle \begin{aligned} \phi (h(y)) &\overset{(a)}{\geq} \phi(h(x)) + \phi' (h(x))(h(y)-h(x))\\ & \overset{(b)}{\geq} \phi (h(x)) + \phi '(h(x)) \langle g,y-x\rangle \\ & = \phi (h(x)) + \langle{\phi' (h(x))g},{y-x}\rangle \end{aligned} \ \ \ \ \ (2)

where {(a),(b)} are just the definition of subdifferential of {\phi} at {h(x)} and {h} at {x}. We also use the fact that {\phi(x)\geq 0} in the inequality {(b)} as {\phi} is increasing.

Now we prove the other direction. Without lost of generality, suppose {0\in U} such that {\partial (\phi \circ h)(0)} is not empty. Let {g\in \partial (\phi \circ h)(0)}, we wish to show that {g} is in the set { \phi' (h(0)) \cdot[ \partial h (0)]}. First according to the definition of subdifferential, we have

\displaystyle \begin{aligned} (\phi \circ h) (x)\geq (\phi\circ h)(0) + \langle{ g},x\rangle, \forall x \in U \end{aligned} \ \ \ \ \ (3)

This gives

\displaystyle \begin{aligned} (\phi \circ h) (\gamma x)\geq (\phi\circ h)(0) + \langle{ g} ,{\gamma x}\rangle, \forall x \in U, \gamma \in [0,1]. \end{aligned} \ \ \ \ \ (4)

Rearranging the above inequality gives

\displaystyle \begin{aligned} &\frac{(\phi \circ h) (\gamma x)- (\phi\circ h)(0)}{\gamma}\geq \langle{ g},{ x}\rangle \\ \implies & \phi'(s)\cdot \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle \end{aligned} \ \ \ \ \ (5)

for some {s} between {h(\gamma x)} and {h(0)} by mean value theorem. Now, by letting {f(\gamma )=h(\gamma x)}, if {f} is right continuous at {0}, then by Lemma 1 in this post and {\phi} is continuously differentiable, we know {l(\gamma) =\frac{h(\gamma x)-h(0)}{\gamma}} is nondecreasing in {\gamma} and we have

\displaystyle \begin{aligned} \phi'(h(0)) (h(x)-h(0))\geq \phi'(h(0))\cdot \lim_{\gamma \downarrow 0} \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (6)

If {\phi' (h(0))>0}, then dividing both sides of the above inequality by {\phi'(h(0))} gives

\displaystyle \begin{aligned} h(x)-h(0)\geq \langle{\frac{g}{\phi'(h(0))}},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (7) 

This shows that {\frac{g}{\phi'(h(0))}} is indeed a member of {\partial h(x)} and thus {\partial (\phi \circ h) (x) \subset \phi' (h(x)) \cdot[ \partial h (x)]}. In this case, we only need to verify why {f(\gamma ) = h(\gamma x) } must be right continuous at {0}.

If {0\in \mbox{int}(U)}, then {h} is definitely continuous at {x} and so is {f} by standard result in convex analysis. If {\phi' (h(0))>0}, then we are done by inequality (7). If {\phi' (h(0))=0}, using the inequality (6), we have

\displaystyle \begin{aligned} 0 \geq \langle{g},{x}\rangle, \forall x\in U \end{aligned} \ \ \ \ \ (8)

Since {0 \in \mbox{int}(U)}, then {x} can take a small positive and negative multiple of {n} standard basis vectors in {\mathbb{R}^n} in the inequality (8). This shows {g =0} and it indeed belongs to the set {\phi' (h(0)) \cdot[ \partial h (x)] = \{0\}} as {\partial (h(0))\not=\emptyset} for {0\in \mbox{int}(U)} by standard convex analysis result.

Thus our task now is to argue why {f(\gamma ) = h(\gamma x)} is indeed right continuous at {0}. Using Lemma 4 in this post, we know the limit {\lim_{\gamma \downarrow 0 }f(\gamma) = f(0^+)} exists and {f(0^+)\leq f(0)}. Now if {f(0^+) = f(0) = h(0)}, then {f} is indeed right continuous at {0} and our conclusion holds. So we may assume {f(0^+) <f(0) =h(0)}. But in this case {l(\gamma) = \frac{h(\gamma x)-h(0)}{\gamma} = \frac{f(\gamma)-f(0)}{\gamma}} is going to be negative infinity as {\gamma \downarrow 0}. Recall from inequality (5), we have

\displaystyle \phi'(s)l(\gamma )\geq \langle{g},{x}\rangle

where {s} is between {h(0)} and {f(\gamma )=h(\gamma x)}. We claim that as {\gamma\downarrow 0}, {\phi'(s)} approaches a positive number. If this claim is true, then from the above inequality, we will have

\displaystyle -\infty \geq \langle{g},{x}\rangle

which cannot hold. Thus we must have {f} right continuous at {0}.

Finally, we prove our claim that {\phi'(s)} is approaching a positive number if {f(0^+) <f(0)}. Using mean value theorem, we have for some {s_0 \in [ f(0^+), f(0)]}

\displaystyle \begin{aligned} \phi(s_0)(f(0^+)-f(0)) & = \phi(f(0^+)) -\phi(f(0))\\ & = \lim_{ \gamma \downarrow 0 } \phi(f(\gamma))-\phi(f(0))\\ & = \lim_{\gamma \downarrow 0}\phi(s)(f(\gamma)-f(0))\\ & = (f(0^+)-f(0))\lim_{\gamma\downarrow 0}\phi(s). \end{aligned} \ \ \ \ \ (9)

Now cancel the term {f(0^+) -f(0)<0} above, we see that {\phi(s_0) = \lim_{\gamma\downarrow}\phi(s)}. We claim {\phi(s_0)>0}. If {\phi(s_0)=0}, then because {\phi} is increasing, we have that {\phi} is constant in {[f(0^+),f(0)]} as {\phi(f(0^+)) -\phi(f(0)) = \phi(s_0)(f(0^+)-f(0)) =0}. This contradicts our assumption that {\phi'(f(0))>0} and our proof is complete. \Box

Special Properties of 1-D convex function

We discuss a few special properties of one dimensional function convex function. The first asserts about the slope of one dimensional convex function.

Lemma 1 Let {f:[0,1]\rightarrow \mathbb{R}} be a (strict) convex function, then the function {g(x) = \frac{f(x)-f(0)}{x}, x\in(0,1]} is non-decreasing (strictly increasing).

Proof: By the definition of convex function, we have for any {0<s<t\leq 1}

\displaystyle f(s) = f(\frac{s}{t} \times t + 0 \times \frac{t-s}{t}) \leq \frac{s}{t}f(t) +\frac{t-s}{t}f(0).

Rearrange the above inequality gives

\displaystyle \frac{f(s) -f(0)}{s}\leq \frac{f(t)-f(0)}{t}.

The case for strict convexity is simply replacing the above {\leq} by {<}. \Box

The second asserts that slope should always increases.

Lemma 2 If {f:I\rightarrow \mathbb{R}} where {I} is any connected set in {\mathbb{R}},i.e., an interval which can be half open half closed or open or closed. The for any {x<y<z\in I}, we have

\displaystyle \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}

Proof: This is just a twist of algebra.

\displaystyle \begin{aligned} & \frac{f(x)-f(y)}{x-y}\leq \frac{f(y)-f(z)}{y-z}\\ \iff& \frac{y-z}{x-y}f(x) +f(z)\geq (1+ \frac{y-z}{x-y})f(y)\\ \iff & \frac{y-z}{x-y}f(x) +f(z)\geq \frac{x-z}{x-y}f(y)\\ \iff & \frac{y-z}{x-z}f(x) +\frac{x-y}{x-z}f(z)\geq f(y)\\ \iff & \frac{z-y}{z-x}f(x) +\frac{y-x}{z-x}f(z)\geq f(\frac{z-y}{z-x}x +\frac{y-x}{z-x}z)\\ \end{aligned} \ \ \ \ \ (1)

where the last one holds due to the definition of convexity. \Box

The next lemma shows that convex function in a real line must be always increasing or always decreasing or first decreasing and then increasing.

Lemma 3 Let {f:I \rightarrow \mathbb{R}} where {I} is an interval in {\mathbb{R}}. The interval {I} can be any one kind of {[a,b],(a,b),(a,b],[a,b)} where {-\infty \leq a<b\leq \infty}. Then {f} is either always increasing or always decreasing or first decreasing and then increasing.

Proof: We may suppose {f} is not always a constant since in this case, the assertion is true. Now suppose that {f} is not always increasing and is also not always decreasing. Then there exists {x<y <z} all in {I} such that

\displaystyle f(x) > f(y) , f(y)< f(z)

or

\displaystyle f(x) <f(y), f(y) > f(z).

The latter case is not possible because it implies that

\displaystyle \frac{f(x)-f(y)}{x-y}> 0 > \frac{f(y) - f(z)}{y-z}

which contradicts the convexity of {f}. Thus only the first case is possible.

Now if {x,z} are interior point of {I}, Then {f} is continuous on the interior of {I} and so is continuous on {[x,z]}. This means that {f} attains a minimum on {[x,z]} at some {x_0\in (x,z)} as {f(y)<\min \{f(x),f(z)\}}. Now for any {x_1<x_2<x<x_3<x_4<x_0}, by convexity and monotonicity of slope, we have

\displaystyle \begin{aligned} \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(x)}{x_2-x_3}\leq \frac{f(x)-f(x_3)}{x-x_3}\leq \frac{f(x_3)-f(x_4)}{x_3-x_4}\leq \frac{f(x_4)-f(x_0)}{x_4-x_0}\leq 0. \end{aligned} \ \ \ \ \ (2)

which implies {f(x_1)\geq f(x_2)\geq f(x)\geq f(x_3)\geq f(x_4)}. Thus {f} is decreasing on {I\cap (-\infty,x_0]}. Similarly, using monotonicity of slope, we have {f} is increasing on {I\cap(x_0,\infty)}. Now if suppose either {x} or {z} is the end point of {I}. We take the half points {a_1 = \frac{x+y}{2},b_1 = \frac{y+z}{2}}. Then there are only three possible cases

  1. {f(x)\geq f(a_1)\geq f(y)\geq f(b_1)\leq f(z)}
  2. {f(x)\geq f(a_1)\geq f(y)\leq f(b_1)\leq f(z)}
  3. {f(x)\geq f(a_1)\leq f(y)\leq f(b_1)\leq f(z)}

The second case is ideal as {a_1} and {b_1} are interior point and we can proceed our previous argument and show {f} is first decreasing and the then increasing. In the first case, we can take {b_2 = \frac{b_1+z}{2}} and ask the relation between {f(y),f(b_1),f(b_2)} and {f(z)}. The only non-ideal case is that {f(b_1)\geq f(b_2)\leq f(z)} (the other case gives three interior point such that {f(y)\geq f(b_1)\leq f(b_2)} and we can employ our previous argument). We can then further have {b_3 = \frac{b_2+z}{2}}. Again there will be only one non-ideal case that {f(b_1)\geq f(b_2)\geq f(b_3)}.

Continuing this process, we either stop at some point to obtain a pattern {f(y)\geq f(b_1)\leq f(b_i)} or the sequence {b_n} satisfies {f(b_i)\geq f(b_{i+1})} for all {i} and {\lim_{i\rightarrow \infty} b_i=z}. If {z} is an interior point, then continuity of {f} implies that {\lim_{i\rightarrow \infty}f(b_i) = f(z)}. But this is not possible because {f(z)> f(y)\geq f(b_i)}. Thus {z} must be the end point of {I}. Moreover, {f} is actually decreasing on {I/\{z\}}. Indeed, for any {x_1>x_2, x_i\in I/\{z\}} for {i=1,2}, there is a {b_i} such that {x_2>b_i} and so

\displaystyle \frac{f(x_1)-f(x_2)}{x_1-x_2}\leq \frac{f(x_2)-f(b_i)}{x_2-b_i}\leq \frac{f(b_i)-f(b_{i+1})}{b_i-b_{i+1}}\leq 0.

But since {f(z)>f(y)}, we see {f} is indeed decreasing on {I/\{z\}} and increase/jump at the point {z}.

The third case is similar, we will either have {f} always increasing and only jump down at the end point {x} or {f} is just first decreases for some interval and then increases for some interval. \Box

Utilizing the above lemma, we can say 1) one dimensional convex function is almost everywhere differentiable and 2) something about the boundary points of {f} on a finite interval.

Lemma 4 Suppose {f:[a,b] \rightarrow \mathbb{R}} is convex for some {a\in \mathbb{R},b \in \mathbb{R}}. Then {f(a) \geq \lim _{x\downarrow a}f(a) = f(a^+), f(b)\geq \lim_{x\uparrow b}f(x) =f(b^+)}.

The above lemma shows that a convex function on a finite open interval is uniformly continuous as it can be extend to the boundary. However, this property does not hold in higher dimension by considering {f(x,y) = \frac{x^2}{y}} on {x^2 \leq y, 0<y <1} and the point {(0,0)}.