Chain rule of convex functions

This post studies a particular chain rule for the composition of a scalar function with a convex function. Specifically, we prove the following theorem.

Theorem 1 Let ${\phi: \mathbb{R} \rightarrow \mathbb{R}}$ be a continuously differentiable, convex, nondecreasing function, let ${U\subseteq \mathbb{R}^n}$ be a convex set, let ${h: U \rightarrow \mathbb{R}}$ be a convex function, and let ${x\in U}$. If ${\phi'(h(x))>0}$ or ${x\in \mbox{int}(U)}$, then

$\displaystyle \begin{aligned} \partial (\phi \circ h) (x) = \phi' (h(x)) \cdot[ \partial h (x)], \end{aligned} \ \ \ \ \ (1)$

where ${\partial }$ denotes the subdifferential of a function, i.e., ${\partial h (x) = \{ g\mid h(y)\geq h(x) +\langle{g},{y-x}\rangle,\forall y\in U\}}$ for any ${x\in U}$; ${\phi'(h(x))\cdot[\partial h(x)] = \{\phi'(h(x))g\mid g\in\partial h(x)\}}$; and ${\mbox{int}(U)}$ is the interior of ${U}$ with respect to the standard topology of ${\mathbb{R}^n}$.
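To make the definition concrete, here is a minimal one-dimensional Python sketch that tests the subgradient inequality ${h(y)\geq h(x)+g(y-x)}$ on a finite grid of points of ${U}$. The function name `is_subgradient_on_grid` and the example ${h(y)=|y|}$ on ${[-1,1]}$ are illustrative assumptions; a grid can only refute membership in ${\partial h(x)}$, never certify it.

```python
def is_subgradient_on_grid(h, x, g, ys, tol=1e-12):
    """Return True if h(y) >= h(x) + g * (y - x) for every sample y in ys.

    One-dimensional sketch. Passing is only a necessary condition for g to
    lie in the subdifferential: a finite grid cannot check every y in U.
    """
    hx = h(x)
    return all(h(y) >= hx + g * (y - x) - tol for y in ys)


# Hypothetical example: h(y) = |y| on U = [-1, 1]; its subdifferential at 0 is [-1, 1].
ys = [-1.0 + k / 1000 for k in range(2001)]  # grid on [-1, 1]
print(is_subgradient_on_grid(abs, 0.0, 0.5, ys))  # True
print(is_subgradient_on_grid(abs, 0.0, 1.5, ys))  # False (fails at y = 1)
```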

A negative example. If neither condition holds, the equality may fail. For example, let ${\phi(t) =1}$ for all ${t\in \mathbb{R}}$ (constant, hence nondecreasing, with ${\phi'\equiv 0}$) and let ${h(x) = 1}$ on ${U=[0,1]}$. The point ${0}$ is not in the interior of ${[0,1]}$, ${\phi'(h(0)) = \phi'(1) = 0}$, and ${\partial h(0) =(-\infty,0]}$. However, ${\partial (\phi\circ h)(0)= (-\infty,0]}$, while ${\phi' (h(0)) \cdot[ \partial h (0)] =\{0\}}$. Thus, the equality fails.
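The negative example can be checked numerically with a short Python sketch. It is a sampled test on a grid of ${[0,1]}$, not a proof, and the helper names are hypothetical. Every ${g\leq 0}$ passes the subgradient test for ${\phi\circ h}$ at ${0}$, while scaling by ${\phi'(h(0))=0}$ collapses ${\partial h(0)}$ to ${\{0\}}$.

```python
def phi(t):
    return 1.0  # constant, hence nondecreasing, with phi' identically 0


def phi_prime(t):
    return 0.0


def h(y):
    return 1.0  # the constant function h = 1 on U = [0, 1]


ys = [k / 1000 for k in range(1001)]  # sample points of U = [0, 1]


def in_subdiff_of_composition(g, x=0.0, tol=1e-12):
    """Sampled test of phi(h(y)) >= phi(h(x)) + g * (y - x) for y in U."""
    return all(phi(h(y)) >= phi(h(x)) + g * (y - x) - tol for y in ys)


# Left side of (1): every g <= 0 passes, a positive g does not.
print([in_subdiff_of_composition(g) for g in (-5.0, -1.0, 0.0, 0.5)])
# [True, True, True, False]

# Right side of (1): phi'(h(0)) * g is 0 for every g in (-inf, 0].
print(all(phi_prime(h(0.0)) * g == 0.0 for g in (-5.0, -1.0, 0.0)))  # True
```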

It should be noted that if ${U}$ is open and ${h}$ is also differentiable, then ${\partial h(x)=\{\nabla h(x)\}}$, so (1) reduces to the usual chain rule ${\nabla(\phi\circ h)(x)=\phi'(h(x))\nabla h(x)}$ for smooth functions.
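As a sanity check of the smooth case, the following Python sketch compares a central-difference gradient of ${\phi\circ h}$ with ${\phi'(h(x))\nabla h(x)}$. The choices ${\phi=\exp}$, ${h(x)=x_1^2+x_2^2}$ on ${\mathbb{R}^2}$, the point ${x=(0.3,-0.4)}$, and the step size are assumptions made for illustration.

```python
import math


def phi(t):
    return math.exp(t)  # smooth, convex, increasing (assumed example)


def phi_prime(t):
    return math.exp(t)


def h(x):
    return x[0] ** 2 + x[1] ** 2  # smooth convex function on the open set R^2


def grad_h(x):
    return [2.0 * x[0], 2.0 * x[1]]


def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad


x = [0.3, -0.4]
lhs = numerical_grad(lambda z: phi(h(z)), x)      # gradient of phi o h
rhs = [phi_prime(h(x)) * gi for gi in grad_h(x)]  # phi'(h(x)) * grad h(x)
print(all(abs(a - b) < 1e-6 for a, b in zip(lhs, rhs)))  # True
```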

Proof: We first prove that ${\partial (\phi \circ h) (x) \supset \phi' (h(x)) \cdot[ \partial h (x)]}$. For any ${x\in U}$, ${g \in \partial h(x)}$, and ${y\in U}$,

$\displaystyle \begin{aligned} \phi (h(y)) &\overset{(a)}{\geq} \phi(h(x)) + \phi' (h(x))(h(y)-h(x))\\ & \overset{(b)}{\geq} \phi (h(x)) + \phi '(h(x)) \langle g,y-x\rangle \\ & = \phi (h(x)) + \langle{\phi' (h(x))g},{y-x}\rangle \end{aligned} \ \ \ \ \ (2)$

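where ${(a)}$ is the subgradient inequality for the convex function ${\phi}$ at ${h(x)}$ (whose subdifferential is ${\{\phi'(h(x))\}}$ since ${\phi}$ is differentiable), and ${(b)}$ is the subgradient inequality for ${h}$ at ${x}$ multiplied by ${\phi'(h(x))}$. This multiplication preserves the inequality because ${\phi'(h(x))\geq 0}$, as ${\phi}$ is nondecreasing. Hence ${\phi'(h(x))g\in\partial(\phi\circ h)(x)}$.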
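A quick numerical illustration of inequality (2), assuming ${\phi=\exp}$ and ${h(y)=|y|}$ on ${U=[-1,1]}$ (both chosen for illustration). Here ${\partial h(0)=[-1,1]}$ and ${\partial h(0.5)=\{1\}}$, and the sketch checks on a grid that ${\phi'(h(x))g}$ satisfies the subgradient inequality for ${\phi\circ h}$.

```python
import math


def phi(t):
    return math.exp(t)  # convex and increasing (assumed example)


def phi_prime(t):
    return math.exp(t)


h = abs  # convex on U = [-1, 1]
ys = [-1.0 + k / 1000 for k in range(2001)]  # sample points of U


def satisfies_inequality_2(x, g, tol=1e-12):
    """Sampled test of phi(h(y)) >= phi(h(x)) + phi'(h(x)) * g * (y - x)."""
    scaled = phi_prime(h(x)) * g  # the candidate subgradient phi'(h(x)) g
    return all(phi(h(y)) >= phi(h(x)) + scaled * (y - x) - tol for y in ys)


# Subdifferentials of |.|: [-1, 1] at x = 0 and {1} at x = 0.5.
print(all(satisfies_inequality_2(0.0, g) for g in (-1.0, -0.3, 0.7, 1.0)))  # True
print(satisfies_inequality_2(0.5, 1.0))  # True
```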

Now we prove the other direction. By translating ${U}$ if necessary, we may assume without loss of generality that ${x=0\in U}$, and that ${\partial (\phi \circ h)(0)}$ is nonempty (otherwise the inclusion is trivial). Let ${g\in \partial (\phi \circ h)(0)}$; we wish to show that ${g}$ lies in the set ${ \phi' (h(0)) \cdot[ \partial h (0)]}$. By the definition of the subdifferential, we have

$\displaystyle \begin{aligned} (\phi \circ h) (x)\geq (\phi\circ h)(0) + \langle{ g},x\rangle, \forall x \in U \end{aligned} \ \ \ \ \ (3)$

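Since ${U}$ is convex and contains ${0}$, we have ${\gamma x\in U}$ for all ${x\in U}$ and ${\gamma\in[0,1]}$, so applying (3) at the point ${\gamma x}$ gives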

$\displaystyle \begin{aligned} (\phi \circ h) (\gamma x)\geq (\phi\circ h)(0) + \langle{ g} ,{\gamma x}\rangle, \forall x \in U, \gamma \in [0,1]. \end{aligned} \ \ \ \ \ (4)$

Dividing by ${\gamma\in(0,1]}$ and applying the mean value theorem to ${\phi}$ gives

$\displaystyle \begin{aligned} &\frac{(\phi \circ h) (\gamma x)- (\phi\circ h)(0)}{\gamma}\geq \langle{ g},{ x}\rangle \\ \implies & \phi'(s)\cdot \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle \end{aligned} \ \ \ \ \ (5)$

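for some ${s}$ between ${h(\gamma x)}$ and ${h(0)}$. Let ${f(\gamma )=h(\gamma x)}$. Since ${h}$ is convex, Lemma 1 in this post shows that ${l(\gamma) =\frac{h(\gamma x)-h(0)}{\gamma}}$ is nondecreasing in ${\gamma}$, so ${l(1)=h(x)-h(0)\geq \lim_{\gamma\downarrow 0}l(\gamma)}$. If, moreover, ${f}$ is right continuous at ${0}$, then ${s\rightarrow h(0)}$ as ${\gamma\downarrow 0}$, and because ${\phi'}$ is continuous, ${\phi'(s)\rightarrow \phi'(h(0))\geq 0}$. Letting ${\gamma\downarrow 0}$ in (5), we have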

$\displaystyle \begin{aligned} \phi'(h(0)) (h(x)-h(0))\geq \phi'(h(0))\cdot \lim_{\gamma \downarrow 0} \frac{h(\gamma x)-h(0)}{\gamma}\geq \langle{g},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (6)$

If ${\phi' (h(0))>0}$ , then dividing both sides of the above inequality by ${\phi'(h(0))}$ gives

$\displaystyle \begin{aligned} h(x)-h(0)\geq \langle{\frac{g}{\phi'(h(0))}},{x}\rangle,\forall x\in U. \end{aligned} \ \ \ \ \ (7)$

This shows that ${\frac{g}{\phi'(h(0))}\in \partial h(0)}$, so ${g = \phi'(h(0))\cdot\frac{g}{\phi'(h(0))}\in \phi' (h(0)) \cdot[ \partial h (0)]}$, and thus ${\partial (\phi \circ h) (0) \subset \phi' (h(0)) \cdot[ \partial h (0)]}$. Hence, when ${\phi'(h(0))>0}$, it only remains to verify that ${f(\gamma ) = h(\gamma x) }$ is right continuous at ${0}$.
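The following Python sketch illustrates the boundary case ${\phi'(h(0))>0}$, assuming ${\phi(t)=e^{2t}}$ and ${h(y)=y}$ on ${U=[0,1]}$, so that ${0}$ is a boundary point and ${\phi'(h(0))=2}$. On a grid of ${[0,1]}$, a candidate ${g}$ passes the subgradient test for ${\phi\circ h}$ at ${0}$ exactly when ${g/2}$ passes it for ${h}$, consistent with ${\partial(\phi\circ h)(0)=(-\infty,2]=2\cdot(-\infty,1]}$. As before, the grid test is only a sanity check and the helper names are hypothetical.

```python
import math


def phi(t):
    return math.exp(2.0 * t)  # convex, increasing, phi'(t) = 2 exp(2t)


def h(y):
    return y  # convex on U = [0, 1]; 0 is a boundary point of U


PHI_PRIME_AT_H0 = 2.0 * math.exp(2.0 * h(0.0))  # phi'(h(0)) = 2 > 0
ys = [k / 1000 for k in range(1001)]  # sample points of U = [0, 1]


def in_subdiff_at_0(f, g, tol=1e-12):
    """Sampled test of f(y) >= f(0) + g * y for all sampled y in U."""
    return all(f(y) >= f(0.0) + g * y - tol for y in ys)


candidates = [-3.0, -1.0, 0.0, 1.5, 2.0, 2.1, 3.0]
# Is g in the subdifferential of phi o h at 0?
lhs = [in_subdiff_at_0(lambda y: phi(h(y)), g) for g in candidates]
# Is g in phi'(h(0)) * (subdifferential of h at 0), i.e. is g / phi'(h(0)) in it?
rhs = [in_subdiff_at_0(h, g / PHI_PRIME_AT_H0) for g in candidates]
print(lhs)         # [True, True, True, True, True, False, False]
print(lhs == rhs)  # True
```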

If ${0\in \mbox{int}(U)}$, then ${h}$ is continuous at ${0}$, since a convex function is continuous on the interior of its domain (a standard result in convex analysis); hence ${f}$ is right continuous at ${0}$. If, moreover, ${\phi' (h(0))>0}$, then we are done by inequality (7). If ${\phi' (h(0))=0}$, inequality (6) gives

$\displaystyle \begin{aligned} 0 \geq \langle{g},{x}\rangle, \forall x\in U \end{aligned} \ \ \ \ \ (8)$

Since ${0 \in \mbox{int}(U)}$, we may apply (8) with ${x=\pm\epsilon e_i}$ for each standard basis vector ${e_i}$, ${i=1,\dots,n}$, and sufficiently small ${\epsilon>0}$. This gives ${\epsilon g_i\leq 0}$ and ${-\epsilon g_i\leq 0}$, so ${g_i=0}$ for every ${i}$, i.e., ${g =0}$. Moreover, ${\partial h(0)\neq\emptyset}$ for ${0\in \mbox{int}(U)}$ (again a standard convex analysis result), so ${\phi' (h(0)) \cdot[ \partial h (0)] = \{0\}}$ and ${g}$ indeed belongs to it.
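The monotonicity of the difference quotient ${l(\gamma)}$ for convex ${h}$, which was used to derive (6), can be observed numerically. This Python sketch assumes the convex function ${h(y)=\sqrt{1+y_1^2+y_2^2}}$ on ${\mathbb{R}^2}$ and the direction ${x=(1,-2)}$, both hypothetical choices, and checks that ${l}$ is nondecreasing on a grid of ${\gamma\in(0,1]}$ with ${l(1)=h(x)-h(0)}$.

```python
import math


def h(y):
    # A smooth convex function on R^2 (assumed example): sqrt(1 + |y|^2)
    return math.sqrt(1.0 + y[0] ** 2 + y[1] ** 2)


def l(gamma, x):
    """Difference quotient l(gamma) = (h(gamma x) - h(0)) / gamma."""
    return (h([gamma * x[0], gamma * x[1]]) - h([0.0, 0.0])) / gamma


x = [1.0, -2.0]
gammas = [k / 100 for k in range(1, 101)]  # 0.01, 0.02, ..., 1.00
values = [l(gam, x) for gam in gammas]
nondecreasing = all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
print(nondecreasing)                        # True
print(values[-1] == h(x) - h([0.0, 0.0]))  # l(1) = h(x) - h(0): True
```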

It remains to handle the case ${0\notin\mbox{int}(U)}$, where ${\phi'(h(0))>0}$ by assumption, and to argue why ${f(\gamma ) = h(\gamma x)}$ is right continuous at ${0}$. By Lemma 4 in this post, the limit ${\lim_{\gamma \downarrow 0 }f(\gamma) = f(0^+)}$ exists and ${f(0^+)\leq f(0)}$. If ${f(0^+) = f(0) = h(0)}$, then ${f}$ is right continuous at ${0}$ and our conclusion holds. Suppose, for contradiction, that ${f(0^+) <f(0) =h(0)}$. Then ${l(\gamma) = \frac{h(\gamma x)-h(0)}{\gamma} = \frac{f(\gamma)-f(0)}{\gamma}}$ tends to ${-\infty}$ as ${\gamma \downarrow 0}$, since its numerator tends to ${f(0^+)-f(0)<0}$. Recall from inequality (5) that

$\displaystyle \phi'(s)l(\gamma )\geq \langle{g},{x}\rangle$

where ${s}$ lies between ${h(0)}$ and ${f(\gamma )=h(\gamma x)}$. We claim that as ${\gamma\downarrow 0}$, ${\phi'(s)}$ converges to a positive number. If this claim holds, then letting ${\gamma\downarrow 0}$ in the above inequality gives

$\displaystyle -\infty \geq \langle{g},{x}\rangle$

which is impossible because ${\langle{g},{x}\rangle}$ is a finite number. Thus ${f}$ must be right continuous at ${0}$.
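To see what this argument rules out, consider the hypothetical convex function with ${h(0)=1}$ and ${h(y)=0}$ for ${y\in(0,1]}$ on ${U=[0,1]}$, which is not right continuous at ${0}$, together with ${\phi=\exp}$. In this Python sketch ${l(\gamma)=-1/\gamma\rightarrow-\infty}$, and every sampled candidate ${g}$ fails the subgradient test for ${\phi\circ h}$ at ${0}$, in line with the contradiction above (here ${\partial(\phi\circ h)(0)}$ is empty).

```python
import math


def h(y):
    # Convex on U = [0, 1] but not right continuous at 0 (hypothetical example).
    return 1.0 if y == 0.0 else 0.0


phi = math.exp  # phi' = exp > 0 everywhere

gammas = [10.0 ** (-k) for k in range(1, 13)]  # 1e-1, 1e-2, ..., 1e-12
l_values = [(h(gam) - h(0.0)) / gam for gam in gammas]  # equals -1 / gamma
# Strictly decreasing toward -inf as gamma shrinks.
print(all(a > b for a, b in zip(l_values, l_values[1:])))  # True


def passes(g_cand):
    """Sampled test of phi(h(y)) >= phi(h(0)) + g_cand * y at y in gammas."""
    return all(phi(h(y)) >= phi(h(0.0)) + g_cand * y for y in gammas)


print(any(passes(c) for c in (-1.0, -1e3, -1e6, -1e9)))  # False
```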

Finally, we prove the claim that ${\phi'(s)}$ converges to a positive number when ${f(0^+) <f(0)}$. By the mean value theorem, there is some ${s_0 \in [ f(0^+), f(0)]}$ such that

$\displaystyle \begin{aligned} \phi'(s_0)(f(0^+)-f(0)) & = \phi(f(0^+)) -\phi(f(0))\\ & = \lim_{ \gamma \downarrow 0 } \left(\phi(f(\gamma))-\phi(f(0))\right)\\ & = \lim_{\gamma \downarrow 0}\phi'(s)(f(\gamma)-f(0))\\ & = (f(0^+)-f(0))\lim_{\gamma\downarrow 0}\phi'(s). \end{aligned} \ \ \ \ \ (9)$

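Here the second equality uses the continuity of ${\phi}$, and the third applies the mean value theorem exactly as in (5); the limit of ${\phi'(s)}$ exists because ${\phi'(s)=\frac{\phi(f(\gamma))-\phi(f(0))}{f(\gamma)-f(0)}}$ and the denominator tends to ${f(0^+)-f(0)\neq 0}$. Cancelling the term ${f(0^+) -f(0)<0}$, we see that ${\phi'(s_0) = \lim_{\gamma\downarrow 0}\phi'(s)}$. We claim ${\phi'(s_0)>0}$. Since ${\phi}$ is nondecreasing, ${\phi'(s_0)\geq 0}$. If ${\phi'(s_0)=0}$, then ${\phi(f(0^+)) -\phi(f(0)) = \phi'(s_0)(f(0^+)-f(0)) =0}$, and since ${\phi}$ is nondecreasing, ${\phi}$ is constant on ${[f(0^+),f(0)]}$. This forces ${\phi'(f(0))=0}$, contradicting our assumption that ${\phi'(f(0))=\phi'(h(0))>0}$. This completes the proof. $\Box$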
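The limit in (9) can be illustrated numerically. This Python sketch assumes ${\phi=\exp}$ and a hypothetical ${f}$ with ${f(0)=1}$ and ${f(\gamma)=\gamma}$ for ${\gamma>0}$, so that ${f(0^+)=0<f(0)}$. The mean-value slopes ${\phi'(s)}$ approach ${\phi'(s_0)=e-1>0}$ with ${s_0=\ln(e-1)\in[0,1]}$.

```python
import math

phi = math.exp  # phi' = exp, so phi'(s) > 0 for every s


def f(gamma):
    # Hypothetical f with f(0) = 1 but f(0+) = 0: the jump ruled out above.
    return 1.0 if gamma == 0.0 else gamma


def mvt_slope(gamma):
    """phi'(s) for the mean-value point s between f(gamma) and f(0)."""
    return (phi(f(gamma)) - phi(f(0.0))) / (f(gamma) - f(0.0))


slopes = [mvt_slope(10.0 ** (-k)) for k in range(1, 9)]
limit = (phi(0.0) - phi(1.0)) / (0.0 - 1.0)  # phi'(s_0) = e - 1
s0 = math.log(limit)                          # the mean-value point s_0
print(abs(slopes[-1] - limit) < 1e-6)         # True
print(0.0 <= s0 <= 1.0 and limit > 0)         # True
```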