In this post, we present an interlacing property of order statistics.
Suppose we have a distribution $F$ on the real line and $X_1, X_2, \ldots \overset{\text{iid}}{\sim} F$, where $\overset{\text{iid}}{\sim}$ means "independently and identically distributed as". We denote $X^{n}_{(i)}$ to be the $i$-th largest random variable of the $n$ many $X$s. So $X^{n}_{(1)}$ is the largest value and $X^{n}_{(n)}$ is the smallest of the $n$ many iid $X$s. Note here the superscript $n$ in $X^{n}_{(i)}$ does not mean taking the $n$-th power but records how many $X$s we are considering. The ordered variables are called order statistics in statistics. They are sufficient statistics for the distribution $F$ when the $X_i$s are independent and all follow $F$. We now present an interlacing property of order statistics.
Theorem 1. For any $F$, $n \geq 1$ and $1 \leq i \leq n$, we have

$$X^{n+1}_{(i)} \succeq X^{n}_{(i)} \succeq X^{n+1}_{(i+1)}, \tag{1}$$

$$\mathbb{E}\big[X^{n+1}_{(i)}\big] \geq \mathbb{E}\big[X^{n}_{(i)}\big] \geq \mathbb{E}\big[X^{n+1}_{(i+1)}\big], \tag{2}$$

where $\succeq$ in (1) denotes first-order stochastic dominance.

The right inequalities of (1) and (2) say that when the sample size increases from $n$ to $n+1$ and the rank index shifts from $i$ to $i+1$, the order statistic becomes smaller. The left inequalities of (1) and (2) show that the larger the sample size $n$, the larger the $i$-th order statistic. The first inequality chain is the stochastic dominance version of the second; since stochastic dominance implies an inequality in expectation, the first chain is a stronger property than the second.
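As a quick sanity check, inequality (2) can be verified numerically. Below is a small Monte Carlo sketch in Python, assuming Uniform(0,1) for $F$; the helper names are ours, not from any library:

```python
import random
from statistics import mean

def order_stat(sample, i):
    """Return the i-th largest value of the sample (i = 1 gives the maximum)."""
    return sorted(sample, reverse=True)[i - 1]

def expected_order_stat(n, i, rng, trials=100_000):
    """Monte Carlo estimate of E[X^n_(i)] for n iid Uniform(0,1) draws."""
    return mean(order_stat([rng.random() for _ in range(n)], i)
                for _ in range(trials))

rng = random.Random(42)
n, i = 3, 1
upper = expected_order_stat(n + 1, i, rng)      # E[X^{n+1}_(i)],   exact value 4/5
middle = expected_order_stat(n, i, rng)         # E[X^{n}_(i)],     exact value 3/4
lower = expected_order_stat(n + 1, i + 1, rng)  # E[X^{n+1}_(i+1)], exact value 3/5
assert upper >= middle >= lower                 # inequality (2)
```

The exact values in the comments come from the fact that the $i$-th largest of $n$ iid Uniform(0,1) draws has mean $(n-i+1)/(n+1)$, so the three estimates should land near $0.8$, $0.75$ and $0.6$.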
Let us now explain why we call Theorem 1 an interlacing property. Fix any $n \geq 1$ and take $i = 1$ in Theorem 1. The inequality (2) gives that

$$\mathbb{E}\big[X^{n+1}_{(1)}\big] \geq \mathbb{E}\big[X^{n}_{(1)}\big] \geq \mathbb{E}\big[X^{n+1}_{(2)}\big].$$

If $n \geq 2$, then we can take $i = 2$, and inequality (2) gives

$$\mathbb{E}\big[X^{n+1}_{(2)}\big] \geq \mathbb{E}\big[X^{n}_{(2)}\big] \geq \mathbb{E}\big[X^{n+1}_{(3)}\big].$$

Continuing the choices $i = 3, \ldots, n$ while fixing $n$, we see that

$$\mathbb{E}\big[X^{n+1}_{(1)}\big] \geq \mathbb{E}\big[X^{n}_{(1)}\big] \geq \mathbb{E}\big[X^{n+1}_{(2)}\big] \geq \mathbb{E}\big[X^{n}_{(2)}\big] \geq \cdots \geq \mathbb{E}\big[X^{n}_{(n)}\big] \geq \mathbb{E}\big[X^{n+1}_{(n+1)}\big].$$

Thus we see the quantities $\mathbb{E}\big[X^{n}_{(i)}\big]$ and $\mathbb{E}\big[X^{n+1}_{(i)}\big]$ are interlacing. The main idea of the proof is putting $X^{n}_{(i)}$, $X^{n+1}_{(i)}$ and $X^{n+1}_{(i+1)}$ in the same probability space and arguing that we actually have an almost sure inequality instead of just an inequality in expectation or stochastic dominance. This technique is known as coupling.
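For Uniform(0,1) the interlacing chain can even be checked in closed form: the $i$-th largest of $n$ iid Uniform(0,1) draws follows a Beta$(n-i+1,\, i)$ distribution with mean $(n-i+1)/(n+1)$. A short sketch with our own helper name:

```python
from fractions import Fraction

def expected_order_stat_uniform(n, i):
    """Exact E[X^n_(i)] for iid Uniform(0,1): the i-th largest of n draws
    follows Beta(n - i + 1, i), whose mean is (n - i + 1) / (n + 1)."""
    return Fraction(n - i + 1, n + 1)

n = 4
# Build the interlaced chain
# E[X^{n+1}_(1)], E[X^n_(1)], E[X^{n+1}_(2)], E[X^n_(2)], ..., E[X^{n+1}_(n+1)]
chain = []
for i in range(1, n + 1):
    chain.append(expected_order_stat_uniform(n + 1, i))
    chain.append(expected_order_stat_uniform(n, i))
chain.append(expected_order_stat_uniform(n + 1, n + 1))

assert chain == sorted(chain, reverse=True)  # the chain is non-increasing
```

For $n = 4$ the chain is $\tfrac{5}{6} \geq \tfrac{4}{5} \geq \tfrac{2}{3} \geq \tfrac{3}{5} \geq \tfrac{1}{2} \geq \tfrac{2}{5} \geq \tfrac{1}{3} \geq \tfrac{1}{5} \geq \tfrac{1}{6}$, so the two sequences of expectations really do alternate.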
Proof: Let us first prove the right inequality. Suppose we first generate $n$ many iid $X$s following $F$ and order them so that we have $X^{n}_{(1)} \geq X^{n}_{(2)} \geq \cdots \geq X^{n}_{(n)}$. Now we generate (independently) one more sample $X_{n+1}$, which follows $F$ as well and is independent of $X_1, \ldots, X_n$. We now consider all the $n+1$ many $X$s we generated; we find $X^{n+1}_{(i+1)}$ is at most $X^{n}_{(i)}$, since we have

$$X^{n+1}_{(i+1)} = X^{n}_{(i)} \quad \text{if } X_{n+1} \geq X^{n}_{(i)}$$

and

$$X^{n+1}_{(i+1)} = \max\big\{X^{n}_{(i+1)},\, X_{n+1}\big\} \leq X^{n}_{(i)} \quad \text{if } X_{n+1} < X^{n}_{(i)}.$$
Since we have an almost sure inequality, we surely have stochastic dominance and the inequality in expectation. Now consider the left inequality. We still follow the previous way of generating the $X$s. After we have $n+1$ many $X$s, the $i$-th largest value $X^{n}_{(i)}$ of the first $n$ samples is always smaller than (or equal to) the $i$-th largest value $X^{n+1}_{(i)}$ of all $n+1$ samples, since

$$X^{n+1}_{(i)} = X^{n}_{(i)} \quad \text{if } X_{n+1} < X^{n}_{(i)}$$

and

$$X^{n+1}_{(i)} = \min\big\{X^{n}_{(i-1)},\, X_{n+1}\big\} \geq X^{n}_{(i)} \quad \text{if } X_{n+1} \geq X^{n}_{(i)},$$

with the convention $X^{n}_{(0)} = +\infty$. Thus we have shown stochastic dominance for the random variables we are considering. The case analysis is a bit lengthy, but one may draw a picture of the data-generating process and compare the $X$s to verify each case. $\blacksquare$
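The coupling argument says the inequalities hold for every realization, not just on average, and that is easy to check by simulation: generate $n$ samples, append one more, and compare the two sorted lists draw by draw. A minimal sketch, assuming a standard normal $F$ (any distribution works):

```python
import random

def order_stats(sample):
    """Sorted in decreasing order: position i - 1 holds the i-th largest."""
    return sorted(sample, reverse=True)

rng = random.Random(0)
n = 5
for _ in range(10_000):
    xs = [rng.gauss(0, 1) for _ in range(n)]   # X_1, ..., X_n iid
    x_new = rng.gauss(0, 1)                    # one extra sample X_{n+1}
    small = order_stats(xs)                    # X^n_(1) >= ... >= X^n_(n)
    big = order_stats(xs + [x_new])            # X^{n+1}_(1) >= ... >= X^{n+1}_(n+1)
    for i in range(1, n + 1):
        assert big[i - 1] >= small[i - 1]      # left inequality:  X^{n+1}_(i) >= X^n_(i)
        assert small[i - 1] >= big[i]          # right inequality: X^n_(i) >= X^{n+1}_(i+1)
```

Every one of the $10{,}000$ draws satisfies both inequalities exactly, which is precisely the almost sure statement the coupling proof establishes.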
Interestingly, the above proof did not actually use the assumption that the $X_i$s are iid. This means the same conclusion holds for any joint distribution. We state this result explicitly below.
Theorem 2. Let $X_1, \ldots, X_n$ have joint distribution $\mu$. Denote $X^{n}_{(1)} \geq \cdots \geq X^{n}_{(n)}$ to be the order statistics. Now consider only the first $m$ many $X$s, i.e., $X_1, \ldots, X_m$. Denote their marginal distribution as $\mu_m$. We denote the order statistics of these $m$ many $X$s to be $X^{m}_{(1)} \geq \cdots \geq X^{m}_{(m)}$. Then for any $n > m \geq 1$ and $1 \leq i \leq m$, we have

$$X^{n}_{(i)} \succeq X^{m}_{(i)} \succeq X^{n}_{(i+n-m)}, \tag{8}$$

$$\mathbb{E}\big[X^{n}_{(i)}\big] \geq \mathbb{E}\big[X^{m}_{(i)}\big] \geq \mathbb{E}\big[X^{n}_{(i+n-m)}\big]. \tag{9}$$
Proof: The proof is a repetition of the previous argument. Let us first prove the right inequality of (8) and (9). Suppose we first generate the $m$ many $X$s following $\mu_m$ and order them so that we have $X^{m}_{(1)} \geq \cdots \geq X^{m}_{(m)}$. Now we generate the extra $n - m$ many $X$s following the conditional distribution defined by $\mu$ and the first $m$ many $X$s. Specifically, the conditional distribution is

$$\mu(dx_{m+1}, \ldots, dx_{n} \mid X_1 = x_1, \ldots, X_m = x_m).$$

We now consider all the $n$ many $X$s we generated. Note that the unconditional distribution of all the $X$s is just $\mu$. We find $X^{n}_{(i+n-m)}$ is at most $X^{m}_{(i)}$: at most $i - 1$ of the first $m$ samples and at most $n - m$ of the extra samples can be strictly larger than $X^{m}_{(i)}$, so at most $i - 1 + n - m$ of all $n$ samples exceed it, and hence

$$X^{n}_{(i+n-m)} \leq X^{m}_{(i)} \quad \text{almost surely}.$$

Since we have an almost sure inequality, we surely have stochastic dominance and the inequality in expectation. Now consider the left inequality of (8) and (9). We still follow the previous way of generating the $X$s. After we have all $n$ many $X$s, the $i$-th largest value $X^{m}_{(i)}$ of the first $m$ samples is always smaller than (or equal to) the $i$-th largest value $X^{n}_{(i)}$ of all $n$ samples, since the first $m$ samples already contain $i$ values that are at least $X^{m}_{(i)}$, and hence

$$X^{m}_{(i)} \leq X^{n}_{(i)} \quad \text{almost surely}.$$

Thus we have shown stochastic dominance for the random variables we are considering. More generally, the same argument together with the fact that $X^{n}_{(j+1)} \leq X^{n}_{(j)}$ for any $j$ shows $X^{m}_{(i)} \succeq X^{n}_{(j)}$ for every $j \geq i + n - m$. $\blacksquare$
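Theorem 2 can be stress-tested the same way with a dependent joint distribution, for example coordinates sharing a common random shift; under the coupling, the inequalities $X^{n}_{(i)} \geq X^{m}_{(i)} \geq X^{n}_{(i+n-m)}$ are really deterministic facts about sorting any $n$ real numbers. A sketch with a hypothetical helper name:

```python
import random

def kth_largest(values, k):
    """Return the k-th largest element of the list (k = 1 gives the maximum)."""
    return sorted(values, reverse=True)[k - 1]

rng = random.Random(1)
n, m = 6, 4
for _ in range(10_000):
    # A dependent joint distribution: a shared random shift plus per-coordinate noise.
    shift = rng.gauss(0, 1)
    xs = [shift + rng.gauss(0, 1) for _ in range(n)]
    first_m = xs[:m]
    for i in range(1, m + 1):
        # Left inequality of (8):  X^n_(i) >= X^m_(i)
        assert kth_largest(xs, i) >= kth_largest(first_m, i)
        # Right inequality of (8): X^m_(i) >= X^n_(i + n - m)
        assert kth_largest(first_m, i) >= kth_largest(xs, i + n - m)
```

The assertions never fire because they hold pointwise: the $i$-th largest of a list can only grow when more numbers are appended, and at most $i - 1 + n - m$ of the full list can exceed the $i$-th largest of the first $m$ entries.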
I found this interlacing result while taking a take-home exam on game theory. It turns out the result was not useful for the exam, though. I would also like to thank Yueyang Liu for catching a mistake in a previous version of this post.