Closure property of expectation under convexity

Suppose we have a random vector {Z} and a convex set {C\subset {\mathbb R} ^n} such that

\displaystyle \mathop{\mathbb P}(Z\in C) =1.

If you are doing things with convexity, then you may wonder whether

\displaystyle \mathop{\mathbb E}(Z) \in C.

This is certainly true if {Z} takes only finitely many values in {C} or if {C} is closed. In the first case, you just apply the definition of convexity (repeatedly). In the second case, you may use the strong law of large numbers: the sample averages of i.i.d. copies of {Z} are finite convex combinations of points of {C}, hence lie in {C}, and they converge almost surely to {\mathop{\mathbb E}(Z)}, which therefore lies in the closed set {C}. But if you draw a picture and think for a while, you might wonder whether these conditions are needed at all: no matter what values {Z} takes, it cannot leave {C}, so the average should still belong to {C} as long as {C} is convex. In this post, we show that this is indeed the case.

Theorem 1 For any convex set {C\subset {\mathbb R}^n}, and for any random vector {Z} such that

\displaystyle \mathop{\mathbb P}(Z\in C)=1,

its expectation is still in {C}, i.e.,

\displaystyle \mathop{\mathbb E}(Z) \in C

as long as the mean exists.

Skip the following remark if you are not familiar with measure theory.

Remark 1 If you are a measure-theoretic person, you might wonder whether {C} should be Borel measurable. The answer is no: the set {C} need not be Borel measurable. To make the point clear, suppose there is an underlying probability space {(\Omega, \mathcal{F},\mathop{\mathbb P})} and {Z} is a random vector from this probability space to {(\mathbb{R}^n, \mathcal{B})}, where {\mathcal{B}} is the Borel sigma-algebra. Let {F=\{\omega \in \Omega \mid Z(\omega)\in C \}}. Then we can either add the condition that {F\in \mathcal{F}}, or interpret {\mathop{\mathbb P}(F)=1} as saying that {F\in\bar{\mathcal{F}}} and {\bar{\mathop{\mathbb P}}(F)=1}, where {(\Omega, \bar{\mathcal{F}},\bar{\mathop{\mathbb P}})} is the completion of {(\Omega, \mathcal{F},\mathop{\mathbb P})} with respect to {\mathop{\mathbb P}}; in that case we overload the notation {\mathop{\mathbb P}} to mean {\bar{\mathop{\mathbb P}}}.

To prepare for the proof, recall the separating hyperplane theorem for convex sets.

Theorem 2 (Separating Hyperplane theorem) Suppose {C} and {D} are convex sets in {{\mathbb R}^n} and {C\cap D = \emptyset}, then there exists a nonzero {a \in {\mathbb R}^{n}} such that

\displaystyle a^Tx \geq a^Ty

for all {x\in C,y\in D}.
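As a small numerical illustration (not part of the original argument), here is a sketch of the simplest case the proof below needs: one convex set is a single point {y} and the other is the box {[0,1]^n}. Assuming Euclidean projection onto the box, which is coordinate-wise clipping, the vector {a = y - \text{proj}(y)} is a valid normal, with {a^Ty > a^Tx} for every {x} in the box. The helper names are hypothetical.

```python
import numpy as np

def separating_normal_for_box(y, lo=0.0, hi=1.0):
    """Hypothetical helper: normal a separating a point y outside the box [lo, hi]^n.

    Uses a = y - proj(y); projecting onto a box is coordinate-wise clipping.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(y, lo, hi)      # Euclidean projection of y onto the box
    a = y - p                   # nonzero exactly when y is outside the box
    if not np.any(a):
        raise ValueError("y lies in the box; no separating hyperplane")
    return a

def max_linear_over_box(a, lo=0.0, hi=1.0):
    # max of a^T x over the box: each coordinate picks the endpoint favored by sign(a_i)
    return float(np.sum(np.where(a > 0, a * hi, a * lo)))

y = np.array([1.5, 0.3, -0.2])
a = separating_normal_for_box(y)
print(a, a @ y, max_linear_over_box(a))   # a @ y exceeds the max over the box
```

The point plays the role of the set with the larger values of {a^T(\cdot)}, matching the orientation used in the proof of Theorem 1.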

Also recall the following elementary facts about convexity.

  • Any convex set in {{\mathbb R}} is an interval.
  • Any affine subspace of dimension {n-m} in {{\mathbb R}^n} is of the form {\{x\in {\mathbb R}^{n}:Ax=b\}} for some {A\in {\mathbb R}^{m\times n}} of rank {m} and some {b \in {\mathbb R}^m}.
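The second fact can be made concrete for a finite set of points, used here as a stand-in for {C}. The following is a minimal sketch, assuming the points are given as rows of an array: the rows of {A} are an orthonormal basis of the directions orthogonal to all differences {p_j - p_0}, obtained from an SVD, and {b = Ap_0}. The function name and tolerance are hypothetical choices.

```python
import numpy as np

def affine_hull_equations(points, tol=1e-10):
    """Hypothetical helper: return (A, b) with aff(points) = {x : A x = b}.

    points has shape (N, n). A has n - d rows, where d is the dimension
    of the affine hull, so it matches the (n - m)-dimensional form above.
    """
    P = np.asarray(points, dtype=float)
    base = P[0]
    D = P[1:] - base                    # directions spanning the affine hull
    n = P.shape[1]
    if D.shape[0] == 0:                 # a single point: its hull is {base}
        A = np.eye(n)
    else:
        _, s, vt = np.linalg.svd(D)     # vt is (n, n); trailing rows span null(D)
        rank = int(np.sum(s > tol * max(1.0, s[0])))
        A = vt[rank:]                   # normals to every direction in D
    return A, A @ base

# Four points on the plane z = 1 in R^3: the hull has dimension 2, so m = 1.
pts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
A, b = affine_hull_equations(pts)
print(A.shape, A, b)
```

In the proof of Theorem 1, this is the reduction to the affine hull of {C} when {C} has empty interior.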

We are now ready to prove Theorem 1.

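Proof of Theorem 1: We may suppose that {C} has nonempty interior: if it does not, we can work in the affine subspace of smallest dimension containing {C}. By the second fact above, this subspace has the form {\{x: Ax=b\}}; since {AZ=b} almost surely, also {A\mathop{\mathbb E}(Z)=b}, so {\mathop{\mathbb E}(Z)} lies in this subspace as well. Suppose {\mathop{\mathbb E} (Z)} is not in {C}. Applying the separating hyperplane theorem to the convex sets {\{\mathop{\mathbb E}(Z)\}} and {C}, there exists a nonzero {a} such that

\displaystyle L= a^T\mathop{\mathbb E}(Z) \geq a^Tx, \forall x \in C.

Since {Z\in C} almost surely, we have

\displaystyle a^TZ \leq L

almost surely. Since {\mathop{\mathbb E}( a^T Z)= a^T\mathop{\mathbb E}( Z)=L}, the random variable {L-a^TZ} is nonnegative with zero mean, so {a^T Z= L} with probability {1}. The intersection of the hyperplane {H=\{x: a^Tx = L\}} with {C} is still convex, so {Z} almost surely takes values in a convex set contained in the {(n-1)}-dimensional affine space {H}. Note also that {\mathop{\mathbb E}(Z)\in H} but {\mathop{\mathbb E}(Z)\notin C\cap H}.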

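Repeating the above argument inside {H} (which we may identify with {{\mathbb R}^{n-1}} by an affine change of coordinates; affine maps commute with taking expectations), we can decrease the dimension until {n=1}. After a suitable translation and rotation, {Z} takes its values in an interval in {\mathbb{R}}, and it remains to argue that the mean of {Z} always lies in that interval.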

This is almost trivial. Let {I} be the interval and suppose first that it is bounded, with endpoints {c<d} (if it is a single point there is nothing to prove). If {I=[c,d]} is closed, then since taking expectation preserves order, i.e., {X\geq Y \implies \mathop{\mathbb E} X\geq \mathop{\mathbb E} Y}, we get {c\leq \mathop{\mathbb E} Z\leq d}, so the mean is in the interval. If {I=(c,d]}, order preservation still gives {c\leq \mathop{\mathbb E} Z\leq d}, so if the mean is not in {I} we must have {\mathop{\mathbb E} Z = c}. Then {Z-c} is a nonnegative random variable with zero mean, so {Z=c} almost surely, which contradicts {\mathop{\mathbb P}(Z\in I)=1}. The cases {I=[c,d)} and {I=(c,d)} are handled in the same way. If the interval is unbounded in one direction, the same argument applies at its finite endpoint, and if it is all of {\mathbb{R}}, then certainly {\mathop{\mathbb E} Z \in\mathbb{R}}. This completes the proof. \Box

Solution to the (General) Truncated Moment Problem

In this post we solve the truncated moment problem; in fact, the theorem we establish is more general than the original problem. The following theorem is a bit abstract, so you can skip ahead to Corollary 2 to see what the truncated moment problem is and why Theorem 1 generalizes it.

Theorem 1 Suppose {X} is a random element, i.e., a measurable map from a probability space {(A,\mathcal{A},\mathop{\mathbb P})} to a measurable space {(B,\mathcal{B})} in which every singleton {\{b\}}, {b\in B}, belongs to {\mathcal{B}}. For {i=1,\dots,m}, let {f_i: B\to {\mathbb R}} be measurable (with respect to {\mathcal{B}} and the Borel sigma-algebra on {\mathbb R}). Suppose the expectations

\displaystyle (\mathbb{E}f_i(X))_{i=1,\dots,m}

are all finite. Then there exists a random element {Y} of {B} that takes no more than {m+1} values in {B} and satisfies

\displaystyle (\mathbb{E}f_i(Y))_{i=1,\dots,m} = (\mathbb{E}f_i(X))_{i=1,\dots,m}.

(If you are not familiar with the terms Borel measurable, measurable space, and sigma-algebras {\mathcal{A}}, {\mathcal{B}}, just ignore them. I include them only to make the theorem rigorous.)

Let me parse the theorem for you. Essentially, it says that given {m} expectations, no matter where the randomness comes from (i.e., whatever {X} is), we can always find a random variable taking finitely many values (this is {Y} in the theorem) that achieves the same {m} expectations.

To have a concrete sense of what is going on, consider the following Corollary of Theorem 1. It is the original truncated moment problem.

Corollary 2 (Truncated Moment Problem) For any real valued random variable {X\in {\mathbb R}} with its first {m} moments all finite, i.e., for all {1\leq i\leq m}

\displaystyle \mathop{\mathbb E}|X|^i < \infty,

there exists a real valued discrete random variable {Y} which takes no more than {m+1} values in {{\mathbb R}} and whose first {m} moments are the same as those of {X}, i.e.,

\displaystyle (\mathbb{E}Y,\mathbb{E}(Y^2),\dots, \mathbb{E}(Y^m) )=(\mathbb{E}X,\mathbb{E}(X^2),\dots, \mathbb{E}(X^m)).

The original truncated moment problem asks: given the (uncentered) moments, can we always find a discrete random variable taking finitely many values that matches all of them? This is a simple consequence of Theorem 1 with {B={\mathbb R}} (with its Borel sigma-algebra) and {f_i(x) = x^{i}}, {i=1,\dots,m}.
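For a concrete instance of Corollary 2, take {X} uniform on {[0,1]} and {m=3}, so {\mathbb{E}X^i = 1/(i+1)}. The sketch below finds {Y} with a different, classical technique rather than the Carathéodory argument used in the proof: an {n}-point Gauss-Legendre rule integrates polynomials of degree up to {2n-1} exactly, so the two-point rule mapped from {[-1,1]} to {[0,1]} gives a {Y} with two values matching the first three moments, within the bound {m+1=4}. It relies on NumPy's `numpy.polynomial.legendre.leggauss`.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Two-point Gauss-Legendre rule on [-1, 1]; its weights sum to 2.
nodes, weights = leggauss(2)
y_vals = (nodes + 1.0) / 2.0       # support of Y, mapped to [0, 1]
y_probs = weights / 2.0            # probabilities of Y, summing to 1

m = 3
moments_Y = [float(np.sum(y_probs * y_vals**i)) for i in range(1, m + 1)]
moments_X = [1.0 / (i + 1) for i in range(1, m + 1)]   # E X^i for X ~ Uniform(0, 1)
print(y_vals, y_probs)
print(moments_Y, moments_X)
```

This shortcut is special to polynomial moments of a distribution with known orthogonal polynomials; Theorem 1 needs no such structure.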

There is also a multivariate version of the truncated moment problem, which can also be regarded as a special case of Theorem 1.

Corollary 3 (Truncated Moment Problem, Multivariate Version) Let {X=(X_1,\dots,X_n)\in \mathbb{R}^n} be a real random vector whose moments of order up to {k} are all finite, i.e.,

\displaystyle \mathop{\mathbb E}(\prod_{i=1}^n|X_{i}|^{\alpha_i}) <\infty

for every {\alpha=(\alpha_1,\dots,\alpha_n)} with {1\leq \sum \alpha_i\leq k}, where each {\alpha_i} is a nonnegative integer. The total number of moments in this case is {{n+k \choose k}-1}, since {{n+k \choose k}} counts all such {\alpha} with {\sum \alpha_i\leq k}, including {\alpha=0}. Then there is a real random vector {Y \in \mathbb{R}^n} that takes no more than {{n+k \choose k}} values, and

\displaystyle (\mathop{\mathbb E}(\prod_{i=1}^nX_{i}^{\alpha_i}))_{1\leq \sum \alpha_i\leq k} = (\mathop{\mathbb E}(\prod_{i=1}^nY_{i}^{\alpha_i})) _{1\leq \sum \alpha_i\leq k}.
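The number of moments in Corollary 3 can be checked by brute-force enumeration. Counting multi-indices {\alpha} of nonnegative integers with {1\leq \sum \alpha_i\leq k} gives {{n+k \choose k}-1}: {{n+k \choose k}} counts all {\alpha} with {\sum \alpha_i\leq k} (stars and bars), and the zero multi-index, whose "moment" is the constant {1}, is excluded. The helper below is a hypothetical illustration.

```python
from itertools import product
from math import comb

def moment_multi_indices(n, k):
    """Hypothetical helper: all alpha in {0,...,k}^n with 1 <= sum(alpha) <= k."""
    return [alpha for alpha in product(range(k + 1), repeat=n)
            if 1 <= sum(alpha) <= k]

for n, k in [(1, 4), (2, 3), (3, 2)]:
    m = len(moment_multi_indices(n, k))
    # m moments, so Theorem 1 gives Y with at most m + 1 values
    print(n, k, m, comb(n + k, k) - 1)
```

For {n=1} the count is {m=k}, which recovers Corollary 2.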

Though Theorem 1 is quite general and looks scary, it is actually a simple consequence of the following lemma together with Carathéodory's theorem on convex hulls.

Lemma 4 For any convex set {C \subset \mathbb{R}^k} and any random vector {Z} which has finite mean and takes values only in {C}, i.e.,

\displaystyle \mathop{\mathbb E}(Z) \in \mathbb{R}^k, \mathop{\mathbb P}(Z\in C) =1,

we have

\displaystyle \mathop{\mathbb E} (Z) \in C.

The lemma is easy if {C} is closed or if {Z} takes only finitely many values, but it remains true when {C} is only assumed to be convex. This is exactly Theorem 1 of the preceding post on the expectation of a random vector that lies in a convex set.

We are now ready to show Theorem 1.

Proof of Theorem 1: Consider the set

\displaystyle L = \{ (f_i(x))_{i=1,\dots,m}\mid x\in B \}.

The convex hull of this set {L} is

\displaystyle \text{conv}(L) = \{ \sum_{j=1}^l \alpha _j a_j\mid \alpha_j \geq 0 ,\sum_{j=1}^l \alpha_j =1, a_j\in L, l \in {\mathbb N}\}.

Now take the random vector {Z=(f_i(X))_{i=1,\dots,m}}, which has finite mean by assumption and takes values only in {L\subset \text{conv}(L)}. Applying Lemma 4 to the convex set {\text{conv}(L)}, we know that

\displaystyle \mathop{\mathbb E} Z \in \text{conv}(L).

Note that every element in {\text{conv}(L)} has a FINITE representation in terms of {a_j}s!

This means we can find {l\in {\mathbb N}}, {\alpha_j\geq 0, \sum_{j = 1}^l \alpha_j =1} and {a_j \in L, j=1,\dots,l} such that

\displaystyle \sum_{j=1}^l \alpha_ja_j = \mathop{\mathbb E} Z = (\mathop{\mathbb E} f_i(X))_{i=1,\dots,m}.

Since each {a_j = (f_i(x_j))_{i=1,\dots,m}} for some {x_j \in B} (we may assume the {a_j} are distinct, so the {x_j} are distinct too), we can simply take the distribution of {Y} to be

\displaystyle \mathop{\mathbb P}(Y = x_j) = \alpha_j, \quad j =1,\dots,l.

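Then {\mathop{\mathbb E} f_i(Y)=\sum_{j=1}^l \alpha_j f_i(x_j) = \mathop{\mathbb E} f_i(X)} for every {i}. Finally, by Carathéodory's theorem, every point of {\text{conv}(L)\subset {\mathbb R}^m} is a convex combination of at most {m+1} points of {L}, so the representation above can be chosen with {l\leq m+1}. \Box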
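Carathéodory's theorem is usually proved constructively, and that construction can be sketched in code. Assuming {X} already has finitely many atoms {x_j} (so Lemma 4 is not needed), write {a_j=(f_i(x_j))_i}. While more than {m+1} atoms carry weight, pick a nonzero {c} with {\sum_j c_j=0} and {\sum_j c_j a_j=0} (it exists by a dimension count), then move the weights along {-c} until one of them hits zero. Mass and mean, hence every {\mathbb{E}f_i}, are unchanged. This is the standard textbook reduction, not code from the original post; the function name and tolerance are hypothetical.

```python
import numpy as np

def caratheodory_reduce(points, weights, tol=1e-12):
    """Hypothetical helper: atoms points[j] in R^m with probabilities weights[j].

    Returns (indices, new_weights) supported on at most m + 1 atoms with the
    same total mass and the same weighted mean.
    """
    P = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    m = P.shape[1]
    idx = np.flatnonzero(w > tol)                     # current support
    while idx.size > m + 1:
        # A null vector c of M satisfies sum_j c_j = 0 and sum_j c_j a_j = 0.
        M = np.vstack([np.ones(idx.size), P[idx].T])  # shape (m + 1, |idx|)
        c = np.linalg.svd(M)[2][-1]                   # exists since |idx| > m + 1
        pos = np.flatnonzero(c > 0)                   # nonempty: c != 0 sums to 0
        ratios = w[idx[pos]] / c[pos]
        k = pos[np.argmin(ratios)]
        w[idx] -= ratios.min() * c                    # largest step keeping w >= 0
        w[idx[k]] = 0.0                               # this atom leaves the support
        idx = idx[w[idx] > tol]
    return idx, w[idx]

# X uniform on 50 atoms in [0, 1], f_i(x) = x^i for i = 1, 2, 3.
x = np.linspace(0.0, 1.0, 50)
px = np.full(x.size, 1.0 / x.size)
m = 3
F = np.column_stack([x**i for i in range(1, m + 1)])  # row j is a_j
idx, wy = caratheodory_reduce(F, px)
print(x[idx], wy)
print(px @ F, wy @ F[idx])                            # E f_i(X) versus E f_i(Y)
```

Each pass removes at least one atom, so the loop ends after at most {N-(m+1)} passes for {N} starting atoms; only rounding error separates the two moment vectors.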