In this post, we present an interlacing property of order statistics.

Suppose we have a distribution $F$ on the real line and $X_1, X_2, \ldots \overset{\text{iid}}{\sim} F$, where $\overset{\text{iid}}{\sim}$ means "independently and identically distributed as". We denote by $X^{(n)}_k$ the $k$-th largest random variable among the $n$ many $X_i$'s. So $X^{(n)}_1$ is the largest value and $X^{(n)}_n$ is the smallest of the $n$ many iid $X_i$'s. Note here the superscript $(n)$ in $X^{(n)}_k$ does not mean taking $X_k$ to the $n$-th power but indicates how many $X_i$'s we are considering.

The ordered variables $X^{(n)}_1 \geq X^{(n)}_2 \geq \cdots \geq X^{(n)}_n$ are called order statistics in Statistics. They are sufficient statistics for the distribution $F$ when the $X_i$'s are independent and all follow $F$. We now present an interlacing property of order statistics.
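Concretely, the order statistics are just the sorted sample values. A minimal sketch in NumPy (the helper name `order_statistics` is ours, not a library function; the distribution is standard normal for concreteness):

```python
import numpy as np

rng = np.random.default_rng(0)

def order_statistics(x):
    """Order statistics in descending order: result[k-1] is the
    k-th largest value of x (result[0] is the max, result[-1] the min)."""
    return np.sort(x)[::-1]

# X_1, ..., X_n iid from F (here F = standard normal, an arbitrary choice)
n = 5
x = rng.standard_normal(n)
stats = order_statistics(x)
assert stats[0] == x.max() and stats[-1] == x.min()
```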

**Theorem 1** For any $n \geq 1$, $m \geq 1$, and $1 \leq k \leq n$, we have

$$X^{(n+m)}_{k+m} \preceq X^{(n)}_k \preceq X^{(n+m)}_k, \tag{1}$$

where $\preceq$ denotes first-order stochastic dominance, and

$$\mathbb{E}\left[X^{(n+m)}_{k+m}\right] \leq \mathbb{E}\left[X^{(n)}_k\right] \leq \mathbb{E}\left[X^{(n+m)}_k\right]. \tag{2}$$

The meaning of the right inequalities of (1) and (2) is that the larger the sample size, the larger the $k$-th largest value. The left inequalities of (1) and (2) show that the larger the sample size, the smaller the value at a fixed position counted from the bottom: both $X^{(n+m)}_{k+m}$ and $X^{(n)}_k$ are the $(n-k+1)$-th smallest values of their respective samples.

The first inequality chain is the stochastic dominance version of the second inequality chain. Since stochastic dominance implies the corresponding inequality in expectation, the first chain is a stronger property than the second.

Let us now explain why we call Theorem 1 an interlacing property. Consider the case $m = 1$ and take $n = 1$, $k = 1$ in Theorem 1. The inequality (2) gives that

$$\mathbb{E}\left[X^{(2)}_2\right] \leq \mathbb{E}\left[X^{(1)}_1\right] \leq \mathbb{E}\left[X^{(2)}_1\right].$$

If $n = 2$, then we can take $m = 1$ and $k = 1$ or $k = 2$. Then inequality (2) gives

$$\mathbb{E}\left[X^{(3)}_2\right] \leq \mathbb{E}\left[X^{(2)}_1\right] \leq \mathbb{E}\left[X^{(3)}_1\right] \qquad \text{and} \qquad \mathbb{E}\left[X^{(3)}_3\right] \leq \mathbb{E}\left[X^{(2)}_2\right] \leq \mathbb{E}\left[X^{(3)}_2\right].$$

Continuing these choices of $k = 1, \ldots, n$ while fixing $m = 1$, we see that

$$\mathbb{E}\left[X^{(n+1)}_1\right] \geq \mathbb{E}\left[X^{(n)}_1\right] \geq \mathbb{E}\left[X^{(n+1)}_2\right] \geq \mathbb{E}\left[X^{(n)}_2\right] \geq \cdots \geq \mathbb{E}\left[X^{(n)}_n\right] \geq \mathbb{E}\left[X^{(n+1)}_{n+1}\right].$$

Thus we see the quantities $\mathbb{E}[X^{(n)}_k]$ and $\mathbb{E}[X^{(n+1)}_k]$ are interlacing. The main idea of the proof is putting $X^{(n)}_k$, $X^{(n+m)}_k$, and $X^{(n+m)}_{k+m}$ in the same probability space and arguing that we actually have an almost sure inequality instead of just an inequality in expectation or stochastic dominance. This technique is known as **coupling**.
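Before turning to the proof, we can verify the interlacing chain exactly in one concrete case. For $F = \mathrm{Uniform}(0,1)$, the $k$-th largest of $n$ samples follows a $\mathrm{Beta}(n-k+1,\, k)$ distribution with mean $(n-k+1)/(n+1)$, so the chain can be checked with exact rational arithmetic. A minimal sketch (the helper names are ours):

```python
from fractions import Fraction

def mean_kth_largest_uniform(n, k):
    # For X_i iid Uniform(0, 1), the k-th largest of n samples follows
    # Beta(n - k + 1, k), whose mean is (n - k + 1) / (n + 1).
    return Fraction(n - k + 1, n + 1)

def interlacing_chain(n):
    """The chain E[X^(n+1)_1] >= E[X^(n)_1] >= E[X^(n+1)_2] >= E[X^(n)_2]
    >= ... >= E[X^(n)_n] >= E[X^(n+1)_{n+1}], listed left to right."""
    chain = []
    for k in range(1, n + 1):
        chain.append(mean_kth_largest_uniform(n + 1, k))
        chain.append(mean_kth_largest_uniform(n, k))
    chain.append(mean_kth_largest_uniform(n + 1, n + 1))
    return chain

# The chain is non-increasing for every n, i.e., the two sequences interlace.
for n in range(1, 10):
    chain = interlacing_chain(n)
    assert all(a >= b for a, b in zip(chain, chain[1:]))
```

For instance, `interlacing_chain(1)` is the chain $2/3 \geq 1/2 \geq 1/3$, matching $\mathbb{E}[X^{(2)}_1] \geq \mathbb{E}[X^{(1)}_1] \geq \mathbb{E}[X^{(2)}_2]$.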

*Proof:* Let us first prove the right inequality. Suppose we first generate $n$ many iid $X_i$'s following $F$ and order them so that we have $X^{(n)}_1 \geq \cdots \geq X^{(n)}_n$. Now we generate (independently) $m$ many more $X_{n+1}, \ldots, X_{n+m}$ following $F$ as well. We now consider all the $n+m$ many $X_i$'s we generated, and we find $X^{(n)}_k$ is at most $X^{(n+m)}_k$ since we have

$$\#\left\{1 \leq i \leq n : X_i \geq X^{(n)}_k\right\} \geq k$$

and

$$\#\left\{1 \leq i \leq n+m : X_i \geq X^{(n)}_k\right\} \geq \#\left\{1 \leq i \leq n : X_i \geq X^{(n)}_k\right\} \geq k,$$

i.e., at least $k$ of all $n+m$ values are at least $X^{(n)}_k$, so $X^{(n+m)}_k \geq X^{(n)}_k$ almost surely. Since we have an almost sure inequality, we surely have stochastic dominance and the inequality in expectation. Now consider the left inequality. We still follow the previous way of generating the $X_i$'s. After we have all $n+m$ many $X_i$'s, the $(k+m)$-th largest value $X^{(n+m)}_{k+m}$ is always at most the $k$-th largest value $X^{(n)}_k$ of the first $n$, since

$$\#\left\{1 \leq i \leq n : X_i \leq X^{(n)}_k\right\} \geq n-k+1$$

and

$$\#\left\{1 \leq i \leq n+m : X_i \leq X^{(n)}_k\right\} \geq n-k+1 = (n+m)-(k+m)+1,$$

i.e., at least $(n+m)-(k+m)+1$ of all $n+m$ values are at most $X^{(n)}_k$, so $X^{(n+m)}_{k+m} \leq X^{(n)}_k$ almost surely. Thus we have shown stochastic dominance for the random variables we are considering. One may draw a picture of the data-generating process and compare the ordered values to check each step. $\Box$
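The coupling argument is easy to check numerically: on every single coupled draw, not merely on average, both almost sure inequalities hold. A minimal sketch, taking $F$ to be the standard normal for concreteness:

```python
import numpy as np

rng = np.random.default_rng(1)

def kth_largest(x, k):
    # k-th largest entry of x (1-indexed k)
    return np.sort(x)[::-1][k - 1]

n, m = 5, 3
for _ in range(1000):
    # One coupled sample: the first n values, then m extra values.
    x = rng.standard_normal(n + m)
    for k in range(1, n + 1):
        # Right inequality: X^(n)_k <= X^(n+m)_k, on every draw.
        assert kth_largest(x[:n], k) <= kth_largest(x, k)
        # Left inequality: X^(n+m)_{k+m} <= X^(n)_k, on every draw.
        assert kth_largest(x, k + m) <= kth_largest(x[:n], k)
```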

Interestingly, the above proof actually did not use the assumption that the $X_i$'s are iid: the counting argument holds for every realization, however the $X_i$'s are generated. This means we have the same conclusion for any joint distribution. We state this result explicitly below.

**Theorem 2** Let $X_1, \ldots, X_{n+m}$ have joint distribution $P$. Denote $X^{(n+m)}_1 \geq X^{(n+m)}_2 \geq \cdots \geq X^{(n+m)}_{n+m}$ to be the order statistics. Now consider only the first $n$ many $X_i$'s, i.e., $X_1, \ldots, X_n$. Denote their marginal distribution as $P_n$. We denote the order statistics of these $n$ many $X_i$'s by $X^{(n)}_1 \geq X^{(n)}_2 \geq \cdots \geq X^{(n)}_n$. Then for any $n \geq 1$, $m \geq 1$, and $1 \leq k \leq n$, we have

$$X^{(n+m)}_{k+m} \preceq X^{(n)}_k \preceq X^{(n+m)}_k \tag{8}$$

and

$$\mathbb{E}\left[X^{(n+m)}_{k+m}\right] \leq \mathbb{E}\left[X^{(n)}_k\right] \leq \mathbb{E}\left[X^{(n+m)}_k\right]. \tag{9}$$

*Proof:* The proof is a repetition of the previous argument. Let us first prove the right inequality of (8) and (9). Suppose we first generate $n$ many $X_i$'s following $P_n$ and order them so that we have $X^{(n)}_1 \geq \cdots \geq X^{(n)}_n$. Now we generate the extra $m$ many $X_{n+1}, \ldots, X_{n+m}$ following the conditional distribution defined by $P$ and the first $n$ many $X_i$'s. Specifically, the conditional distribution is

$$P\left(X_{n+1}, \ldots, X_{n+m} \,\middle|\, X_1, \ldots, X_n\right).$$

We now consider all the $n+m$ many $X_i$'s we generated. Note that the unconditional distribution of all the $X_i$'s is just $P$. We find $X^{(n)}_k$ is at most $X^{(n+m)}_k$ since we have

$$\#\left\{1 \leq i \leq n : X_i \geq X^{(n)}_k\right\} \geq k$$

and

$$\#\left\{1 \leq i \leq n+m : X_i \geq X^{(n)}_k\right\} \geq \#\left\{1 \leq i \leq n : X_i \geq X^{(n)}_k\right\} \geq k.$$

Since we have an almost sure inequality, we surely have stochastic dominance and the inequality in expectation. Now consider the left inequality of (8) and (9). We still follow the previous way of generating the $X_i$'s. After we have all $n+m$ many $X_i$'s, the $(k+m)$-th largest value is always at most the $k$-th largest value of the first $n$ since

$$\#\left\{1 \leq i \leq n : X_i \leq X^{(n)}_k\right\} \geq n-k+1$$

and

$$\#\left\{1 \leq i \leq n+m : X_i \leq X^{(n)}_k\right\} \geq n-k+1 = (n+m)-(k+m)+1.$$

Thus we have shown stochastic dominance for the random variables we are considering; these counting facts hold for any realization of $X_1, \ldots, X_{n+m}$, so independence plays no role. $\Box$
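Since the counting argument is pointwise, the same per-draw check passes for dependent samples too. A minimal sketch with a correlated Gaussian vector standing in for the joint distribution $P$ (the particular covariance is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def kth_largest(x, k):
    # k-th largest entry of x (1-indexed k)
    return np.sort(x)[::-1][k - 1]

n, m = 4, 2
# A non-iid joint distribution P: a strongly correlated Gaussian vector
# (unit variances, pairwise correlation 0.9).
cov = 0.9 * np.ones((n + m, n + m)) + 0.1 * np.eye(n + m)
for _ in range(500):
    x = rng.multivariate_normal(np.zeros(n + m), cov)
    for k in range(1, n + 1):
        assert kth_largest(x[:n], k) <= kth_largest(x, k)      # right inequality
        assert kth_largest(x, k + m) <= kth_largest(x[:n], k)  # left inequality
```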

I found this interlacing result while taking a take-home exam on game theory. It turns out this result was not useful for the exam, though. I would also like to thank Yueyang Liu for catching a mistake in a previous version of this post.