This note addresses the steps of the symmetrization technique in empirical process theory. I will walk through each step and provide explanations. If, like me, you have some confusion or questions about this technique, then this note might be helpful to you! I first encountered the argument perhaps 5 or 6 years ago, and I thought I knew what was going on. It was only recently, when I tried to use it in my own research, that I realized I had missed something (especially the step below).
Setup. Consider a finite function class $\mathcal{F} \subseteq \{f : \mathcal{X} \to \mathbb{R}\}$ of size $N = |\mathcal{F}|$, where $\mathcal{X}$ is a set. We are given a sequence of independent random variables $X_1, \dots, X_n$, $1 \le i \le n$, which take values in $\mathcal{X}$. Let $X = (X_1, \dots, X_n)$. A typical quantity appearing in empirical process theory is
$$\mathbb{E}_X\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big) \right|\right].$$
And we aim to upper bound it using the so-called Rademacher complexity (to be defined later).
The symmetrization argument. The symmetrization argument uses (1) an independent copy of $X$, denoted as $X' = (X_1', \dots, X_n')$, and (2) independent Rademacher random variables $\varepsilon_1, \dots, \varepsilon_n$, with $\mathbb{P}(\varepsilon_i = +1) = \mathbb{P}(\varepsilon_i = -1) = 1/2$. Denote $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$. Let us display the symmetrization argument:
$$\begin{aligned}
\mathbb{E}_X\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big) \right|\right]
&\overset{(i)}{\le} \mathbb{E}_{X, X'}\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \big( f(X_i) - f(X_i') \big) \right|\right]\\
&\overset{(ii)}{=} \mathbb{E}_{X, X', \varepsilon}\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \big( f(X_i) - f(X_i') \big) \right|\right]\\
&\overset{(iii)}{\le} \mathbb{E}_{X, \varepsilon}\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \right|\right] + \mathbb{E}_{X', \varepsilon}\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i') \right|\right]\\
&\overset{(iv)}{=} 2\, \mathbb{E}_{X, \varepsilon}\left[\sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \right|\right].
\end{aligned}$$
The quantity $\mathbb{E}_{X, \varepsilon}\big[\sup_{f \in \mathcal{F}} \big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \big|\big]$ in the last line is called the Rademacher complexity of $\mathcal{F}$ w.r.t. $X$. In steps $(i)$ and $(iv)$, we use the fact that $X$ and $X'$ have the same distribution. In step $(ii)$, we use the fact that $\varepsilon$ and $(X, X')$ are independent, so that we could take the expectation w.r.t. $\varepsilon$ first (or consider $\mathbb{E}_{X, X', \varepsilon}$ as $\mathbb{E}_{\varepsilon} \mathbb{E}_{X, X'}$). In step $(iii)$, we use the triangle inequality. Next, we explain the steps $(i)$ and $(ii)$.
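To make the final bound concrete, here is a small Monte Carlo sketch of my own (not part of the argument above), under assumed toy choices: $\mathcal{X} = [0,1]$, $X_i \sim \mathrm{Unif}[0,1]$, and the finite class of threshold functions $f_t(x) = \mathbf{1}\{x \le t\}$ with $t \in \{0.2, 0.4, 0.6, 0.8\}$, so that $\mathbb{E}[f_t(X_i)] = t$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 5000
ts = np.array([0.2, 0.4, 0.6, 0.8])  # finite class F = {f_t : f_t(x) = 1{x <= t}}

lhs = np.empty(trials)  # sup_f |(1/n) sum_i (f(X_i) - E f(X_i))| per trial
rad = np.empty(trials)  # sup_f |(1/n) sum_i eps_i f(X_i)| per trial
for k in range(trials):
    X = rng.random(n)                      # X_i ~ Unif[0,1], so E f_t(X_i) = t
    eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
    F = (X[:, None] <= ts).astype(float)   # shape (n, |F|): column t is (f_t(X_i))_i
    lhs[k] = np.abs(F.mean(axis=0) - ts).max()
    rad[k] = np.abs((eps[:, None] * F).mean(axis=0)).max()

print(f"E sup |P_n f - P f|          ~ {lhs.mean():.3f}")
print(f"2 x Rademacher complexity    ~ {2 * rad.mean():.3f}")
```

On this toy class the empirical average of the left-hand side should land below twice the estimated Rademacher complexity, typically with visible slack (the factor $2$ and the dropped centering are not tight here).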
Step $(i)$. Here in step $(i)$, we use the fact that for each $f \in \mathcal{F}$, we have
$$\frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big) = \mathbb{E}_{X'}\left[\frac{1}{n} \sum_{i=1}^n \big( f(X_i) - f(X_i') \big) \,\middle|\, X\right],$$
since $X'$ is an independent copy of $X$ and hence $\mathbb{E}[f(X_i)] = \mathbb{E}_{X'}[f(X_i')]$. Taking the absolute value and applying Jensen's inequality to the expectation over $X'$ gives
$$\left|\frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big)\right| \le \mathbb{E}_{X'}\left[\sup_{g \in \mathcal{F}} \left|\frac{1}{n} \sum_{i=1}^n \big( g(X_i) - g(X_i') \big)\right| \,\middle|\, X\right].$$
The right-hand side no longer depends on $f$. After taking a supremum over the left-hand side of the above over $f \in \mathcal{F}$ first and then taking an expectation over $X$, we conclude step $(i)$.
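Step $(i)$ can also be checked numerically. The sketch below is my own illustration under assumed toy choices (threshold functions $f_t(x) = \mathbf{1}\{x \le t\}$ with $X_i \sim \mathrm{Unif}[0,1]$): replacing the true mean $\mathbb{E}[f(X_i)]$ by an independent empirical copy can only increase the expected supremum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 5000
ts = np.array([0.2, 0.4, 0.6, 0.8])  # f_t(x) = 1{x <= t}; E f_t(X_i) = t for Unif[0,1]

lhs = np.empty(trials)  # sup_f |(1/n) sum_i (f(X_i) - E f(X_i))|
rhs = np.empty(trials)  # sup_f |(1/n) sum_i (f(X_i) - f(X_i'))|
for k in range(trials):
    X, Xp = rng.random(n), rng.random(n)      # X and an independent copy X'
    F = (X[:, None] <= ts).astype(float)      # (f_t(X_i))_{i,t}
    Fp = (Xp[:, None] <= ts).astype(float)    # (f_t(X_i'))_{i,t}
    lhs[k] = np.abs(F.mean(axis=0) - ts).max()
    rhs[k] = np.abs(F.mean(axis=0) - Fp.mean(axis=0)).max()

print(f"E sup |P_n f - P f|    ~ {lhs.mean():.3f}")
print(f"E sup |P_n f - P'_n f| ~ {rhs.mean():.3f}")
```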
Step $(ii)$. Step $(ii)$ looks puzzling (to me) at first. A naive reasoning for $(ii)$ is that $\varepsilon_i \big( f(X_i) - f(X_i') \big)$ has the same distribution as $f(X_i) - f(X_i')$. I was not convinced by this argument, since we are taking a supremum over $f \in \mathcal{F}$.
Denote “equal in distribution” by $\overset{d}{=}$. The thing we are (or I am) missing is that there are many $f$'s in $\mathcal{F}$. Hence, knowing that for each $f \in \mathcal{F}$,
$$\big( f(X_i) - f(X_i') \big)_{1 \le i \le n} \overset{d}{=} \big( \varepsilon_i \big( f(X_i) - f(X_i') \big) \big)_{1 \le i \le n}$$
is not enough. Let us first assume $\varepsilon$ is fixed, say $\varepsilon = e \in \{-1, +1\}^n$. What we really need is to ensure that for any fixed $e \in \{-1, +1\}^n$, the joint distributions of the following two arrays are the same:
$$\big( f(X_i) - f(X_i') \big)_{1 \le i \le n,\ f \in \mathcal{F}} \overset{d}{=} \big( e_i \big( f(X_i) - f(X_i') \big) \big)_{1 \le i \le n,\ f \in \mathcal{F}}. \tag{3}$$
To argue (3), first, for each $i \in \{1, \dots, n\}$, we have
$$\big( f(X_i) - f(X_i') \big)_{f \in \mathcal{F}} \overset{d}{=} \big( e_i \big( f(X_i) - f(X_i') \big) \big)_{f \in \mathcal{F}},$$
by noting that
$$(X_i, X_i') \overset{d}{=} (X_i', X_i),$$
so that the case $e_i = -1$ amounts to swapping $X_i$ and $X_i'$ (and the case $e_i = +1$ is trivial). Hence, because of the independence of the pairs $(X_1, X_1'), \dots, (X_n, X_n')$, we know that
$$\big( f(X_i) - f(X_i') \big)_{1 \le i \le n,\ f \in \mathcal{F}} \overset{d}{=} \big( e_i \big( f(X_i) - f(X_i') \big) \big)_{1 \le i \le n,\ f \in \mathcal{F}}.$$
This establishes the equality (3) for each fixed $e \in \{-1, +1\}^n$. If $\varepsilon \sim \mathrm{Unif}(\{-1, +1\}^n)$ is random and independent of $(X, X')$, then we see that (3) continues to hold (with $e$ replaced by $\varepsilon$) by using the law of total probability: for any measurable set $B$,
$$\begin{aligned}
\mathbb{P}\Big( \big( \varepsilon_i \big( f(X_i) - f(X_i') \big) \big)_{i, f} \in B \Big)
&\overset{(a)}{=} \sum_{e \in \{-1,+1\}^n} \mathbb{P}\Big( \big( \varepsilon_i \big( f(X_i) - f(X_i') \big) \big)_{i, f} \in B \,\Big|\, \varepsilon = e \Big)\, \mathbb{P}(\varepsilon = e)\\
&\overset{(b)}{=} \sum_{e \in \{-1,+1\}^n} \mathbb{P}\Big( \big( e_i \big( f(X_i) - f(X_i') \big) \big)_{i, f} \in B \,\Big|\, \varepsilon = e \Big)\, \mathbb{P}(\varepsilon = e)\\
&\overset{(c)}{=} \sum_{e \in \{-1,+1\}^n} \mathbb{P}\Big( \big( e_i \big( f(X_i) - f(X_i') \big) \big)_{i, f} \in B \Big)\, \mathbb{P}(\varepsilon = e)\\
&\overset{(d)}{=} \sum_{e \in \{-1,+1\}^n} \mathbb{P}\Big( \big( f(X_i) - f(X_i') \big)_{i, f} \in B \Big)\, \mathbb{P}(\varepsilon = e)\\
&= \mathbb{P}\Big( \big( f(X_i) - f(X_i') \big)_{i, f} \in B \Big).
\end{aligned}$$
Here in $(a)$, we use the law of total probability by splitting the event according to the value $\varepsilon$ takes. In $(b)$, we replace the value of $\varepsilon$ by $e$, since this is the condition in the conditional probability. In $(c)$, we use the fact that $\varepsilon$ is independent of $(X, X')$. In $(d)$, we use (3) for fixed $e$. Thus, the equality (3) holds for the case where $\varepsilon$ is random. This means that for any measurable function $\Phi$,
$$\mathbb{E}\left[\Phi\Big( \big( \varepsilon_i \big( f(X_i) - f(X_i') \big) \big)_{1 \le i \le n,\ f \in \mathcal{F}} \Big)\right] = \mathbb{E}\left[\Phi\Big( \big( f(X_i) - f(X_i') \big)_{1 \le i \le n,\ f \in \mathcal{F}} \Big)\right].$$
Step $(ii)$ in the symmetrization argument now follows by taking an appropriate $\Phi$, namely $\Phi\big( (a_{i,f})_{1 \le i \le n,\ f \in \mathcal{F}} \big) = \sup_{f \in \mathcal{F}} \big| \frac{1}{n} \sum_{i=1}^n a_{i,f} \big|$.
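The distributional identity can be seen in a quick simulation (my own sketch, under the assumed toy class of threshold functions $f_t(x) = \mathbf{1}\{x \le t\}$ with $X_i \sim \mathrm{Unif}[0,1]$): applying the sup statistic $\Phi$ to $\big(f(X_i) - f(X_i')\big)_{i,f}$ and to $\big(\varepsilon_i (f(X_i) - f(X_i'))\big)_{i,f}$ should produce the same distribution, hence matching moments.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 20000
ts = np.array([0.2, 0.4, 0.6, 0.8])  # f_t(x) = 1{x <= t}

plain = np.empty(trials)   # Phi applied to (f(X_i) - f(X_i'))_{i,f}
signed = np.empty(trials)  # Phi applied to (eps_i (f(X_i) - f(X_i')))_{i,f}
for k in range(trials):
    X, Xp = rng.random(n), rng.random(n)
    eps = rng.choice([-1.0, 1.0], size=n)
    # D[i, t] = f_t(X_i) - f_t(X_i'), values in {-1, 0, 1}
    D = (X[:, None] <= ts).astype(float) - (Xp[:, None] <= ts).astype(float)
    plain[k] = np.abs(D.mean(axis=0)).max()              # Phi = sup_f |(1/n) sum_i a_{i,f}|
    signed[k] = np.abs((eps[:, None] * D).mean(axis=0)).max()

print(f"mean of Phi without signs: {plain.mean():.4f}")
print(f"mean of Phi with signs:    {signed.mean():.4f}")
```

The two sample means (and standard deviations) agree up to Monte Carlo error, as the equality of joint distributions predicts.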
Remark. To deal with the case where $\mathcal{F}$ is infinite, we may set
$$\mathbb{E}\left[\sup_{f \in \mathcal{F}} \left|\frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big)\right|\right] := \sup_{\substack{\mathcal{F}_0 \subseteq \mathcal{F}\\ |\mathcal{F}_0| < \infty}} \mathbb{E}\left[\sup_{f \in \mathcal{F}_0} \left|\frac{1}{n} \sum_{i=1}^n \big( f(X_i) - \mathbb{E}[f(X_i)] \big)\right|\right]$$
(and similarly for the other expected suprema). Then our previous argument continues to work.