This note walks through the symmetrization technique in empirical process theory, step by step, with explanations. If, like me, you have some confusion or questions about this technique, this might be helpful to you! I first encountered the argument perhaps 5 or 6 years ago and thought I knew what was going on; only recently, when I tried to use it in my own research, did I realize I had missed something (especially the step discussed below).

**Setup.** Consider a finite function class $\mathcal{F} = \{f_1, \dots, f_N\}$ of size $N$, where each $f_j : \mathcal{X} \to \mathbb{R}$ and $\mathcal{X}$ is a set. We are given a sequence of independent random variables $X_1, \dots, X_n$ which take values in $\mathcal{X}$. Let $X = (X_1, \dots, X_n)$. A typical quantity appearing in empirical process theory is

$$\mathbb{E}\Big[\sup_{f \in \mathcal{F}} \Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - \mathbb{E} f(X_i)\big)\Big|\Big]. \tag{1}$$

We aim to upper bound it using the so-called Rademacher complexity (to be defined later).
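To make this quantity concrete, here is a minimal sketch (my own toy example, not from the standard treatment; the helper name `sup_deviation` and the choice of $X_i$ i.i.d. uniform on $\{0,1\}$ are illustrative assumptions): for a small finite class, the expectation can be computed exactly by enumerating all $2^n$ outcomes.

```python
from itertools import product

def sup_deviation(fs, n):
    """Exact E[ sup_{f in fs} | (1/n) sum_i f(X_i) - E f(X_i) | ]
    for X_1, ..., X_n i.i.d. uniform on {0, 1}, by enumerating all 2^n outcomes."""
    means = [(f(0) + f(1)) / 2 for f in fs]       # E f(X) under X ~ Uniform{0, 1}
    total = 0.0
    for xs in product([0, 1], repeat=n):          # each outcome has probability 2^-n
        emp = [sum(f(x) for x in xs) / n for f in fs]
        total += max(abs(e - m) for e, m in zip(emp, means)) / 2 ** n
    return total

# Toy class {f1(x) = x, f2(x) = 1 - x}; both have mean 1/2 under Uniform{0, 1}.
print(sup_deviation([lambda x: x, lambda x: 1 - x], n=2))  # 0.25
```

For this class, $f_2 - \mathbb{E}f_2 = -(f_1 - \mathbb{E}f_1)$, so the supremum is just the absolute deviation of the empirical mean of the $X_i$ from $1/2$.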

**The symmetrization argument.** The symmetrization argument uses an independent copy of $X = (X_1, \dots, X_n)$, denoted $X' = (X_1', \dots, X_n')$, together with independent Rademacher random variables $\varepsilon_1, \dots, \varepsilon_n$, with $\mathbb{P}(\varepsilon_i = 1) = \mathbb{P}(\varepsilon_i = -1) = 1/2$. Denote $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$. Let us display the symmetrization argument:

$$\begin{aligned}
\mathbb{E}\Big[\sup_{f \in \mathcal{F}}\Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - \mathbb{E} f(X_i)\big)\Big|\Big]
&\overset{(i)}{=} \mathbb{E}_{X}\Big[\sup_{f \in \mathcal{F}}\Big|\mathbb{E}_{X'}\Big[\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - f(X_i')\big)\Big]\Big|\Big] \\
&\overset{(ii)}{\le} \mathbb{E}_{X, X'}\Big[\sup_{f \in \mathcal{F}}\Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - f(X_i')\big)\Big|\Big] \\
&\overset{(iii)}{=} \mathbb{E}_{X, X', \varepsilon}\Big[\sup_{f \in \mathcal{F}}\Big|\frac{1}{n}\sum_{i=1}^n \varepsilon_i \big(f(X_i) - f(X_i')\big)\Big|\Big] \\
&\overset{(iv)}{\le} 2\, \mathbb{E}_{X, \varepsilon}\Big[\sup_{f \in \mathcal{F}}\Big|\frac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\Big|\Big].
\end{aligned}$$

The last quantity (without the factor $2$) is called the Rademacher complexity of $\mathcal{F}$ w.r.t. $X_1, \dots, X_n$. In steps $(i)$ and $(iv)$, we use the fact that $X_i$ and $X_i'$ have the same distribution. In step $(i)$, we also use the fact that $X$ and $X'$ are independent, so that we can take the expectation w.r.t. $X'$ first (i.e., view $\mathbb{E} f(X_i)$ as $\mathbb{E}_{X'}[f(X_i')]$). In step $(iv)$, we use the triangle inequality. Next, we explain the steps $(ii)$ and $(iii)$.
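As a quick numerical sanity check of the end-to-end bound (my own toy example, not part of the argument above; the names `sup_dev` and `rademacher` and the choice of $X_i$ i.i.d. uniform on $\{0,1\}$ are illustrative assumptions), both ends of the chain can be computed exactly by enumerating $X$ and $\varepsilon$ over a small discrete space.

```python
from itertools import product

fs = [lambda x: x, lambda x: 1 - x]      # toy class; X_i i.i.d. uniform on {0, 1}
n = 2
means = [(f(0) + f(1)) / 2 for f in fs]  # E f(X) = 1/2 for both functions

# Left end of the chain: E sup_f | (1/n) sum_i f(X_i) - E f(X_i) |.
sup_dev = sum(
    max(abs(sum(f(x) for x in xs) / n - m) for f, m in zip(fs, means))
    for xs in product([0, 1], repeat=n)
) / 2 ** n

# Right end: the Rademacher complexity E sup_f | (1/n) sum_i eps_i f(X_i) |.
rademacher = sum(
    max(abs(sum(e * f(x) for e, x in zip(eps, xs))) / n for f in fs)
    for xs in product([0, 1], repeat=n)
    for eps in product([-1, 1], repeat=n)
) / 4 ** n

print(sup_dev, 2 * rademacher)  # 0.25 1.0 -- the factor-2 bound holds
```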

**Step $(ii)$.** Here in step $(ii)$, we use the fact that for each $f \in \mathcal{F}$ (with $X$ held fixed), we have

$$\Big|\mathbb{E}_{X'}\Big[\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - f(X_i')\big)\Big]\Big| \le \mathbb{E}_{X'}\Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - f(X_i')\big)\Big|,$$

by Jensen's inequality. Bounding the integrand on the right-hand side by the supremum over the class gives

$$\Big|\mathbb{E}_{X'}\Big[\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - f(X_i')\big)\Big]\Big| \le \mathbb{E}_{X'}\Big[\sup_{g \in \mathcal{F}}\Big|\frac{1}{n}\sum_{i=1}^n \big(g(X_i) - g(X_i')\big)\Big|\Big].$$

After taking a supremum over $f \in \mathcal{F}$ on the left-hand side of the above (the right-hand side no longer depends on $f$) and then taking an expectation over $X$, we conclude the step $(ii)$.
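The heart of the step just explained is the elementary fact that a supremum of (absolute) expectations is at most the expectation of the supremum. A minimal numerical sketch of that fact (my own toy example; the two functions and the uniform $\{0,1\}$ distribution are illustrative assumptions):

```python
# Two toy per-function quantities h_f(x'), with X' uniform on {0, 1}.
hs = [lambda x: x - 0.5, lambda x: float(x)]

# sup_f | E_{X'} h_f(X') |   versus   E_{X'} sup_f | h_f(X') |.
lhs = max(abs((h(0) + h(1)) / 2) for h in hs)
rhs = sum(max(abs(h(x)) for h in hs) for x in [0, 1]) / 2

print(lhs, rhs)  # 0.5 0.75 -- sup of the expectations <= expectation of the sup
```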

**Step $(iii)$.** Step $(iii)$ looks puzzling (to me) at first. A naive justification of $(iii)$ is that $\varepsilon_i\,(f(X_i) - f(X_i'))$ has the same distribution as $f(X_i) - f(X_i')$. I was not convinced by this argument, since we are taking a supremum over $f \in \mathcal{F}$.

Denote “equal in distribution” by $\overset{d}{=}$. The thing we are (or I am) missing is that there are many $f$'s in $\mathcal{F}$. Hence, knowing that for each $f \in \mathcal{F}$,

$$\big(f(X_i) - f(X_i')\big)_{i=1}^n \overset{d}{=} \big(\varepsilon_i\,(f(X_i) - f(X_i'))\big)_{i=1}^n \tag{2}$$

is *not* enough. Let us first assume $\varepsilon$ is fixed, say $\varepsilon = e$ for a deterministic $e = (e_1, \dots, e_n) \in \{\pm 1\}^n$. What we really need is to ensure that, for any such fixed $e$, the joint distributions of the following two arrays are the same:

$$\big(f_j(X_i) - f_j(X_i')\big)_{i \le n,\, j \le N} \overset{d}{=} \big(e_i\,(f_j(X_i) - f_j(X_i'))\big)_{i \le n,\, j \le N}. \tag{3}$$
To argue (3), first, for each $i$, we have

$$\big(f_j(X_i) - f_j(X_i')\big)_{j \le N} \overset{d}{=} \big(e_i\,(f_j(X_i) - f_j(X_i'))\big)_{j \le N},$$

by noting that if $e_i = 1$ the two sides are identical, while if $e_i = -1$ the right-hand side is obtained from the left-hand side by swapping the roles of $X_i$ and $X_i'$, and the pair $(X_i, X_i')$ has the same distribution as $(X_i', X_i)$. Hence, because of the independence of the pairs $(X_1, X_1'), \dots, (X_n, X_n')$ across $i$, we know that

$$\big(f_j(X_i) - f_j(X_i')\big)_{i \le n,\, j \le N} \overset{d}{=} \big(e_i\,(f_j(X_i) - f_j(X_i'))\big)_{i \le n,\, j \le N}.$$

This establishes the equality (3) for each fixed $e \in \{\pm 1\}^n$. If instead $\varepsilon$ is random, with $\varepsilon_1, \dots, \varepsilon_n$ i.i.d. uniform on $\{\pm 1\}$ and independent of $(X, X')$, then we see that (3) continues to hold (with $e$ replaced by $\varepsilon$) by using the law of total probability: for any measurable set $A$,

$$\begin{aligned}
\mathbb{P}\Big(\big(\varepsilon_i\,(f_j(X_i) - f_j(X_i'))\big)_{i,j} \in A\Big)
&\overset{(a)}{=} \sum_{e \in \{\pm 1\}^n} \mathbb{P}\Big(\big(\varepsilon_i\,(f_j(X_i) - f_j(X_i'))\big)_{i,j} \in A,\ \varepsilon = e\Big) \\
&\overset{(b)}{=} \sum_{e \in \{\pm 1\}^n} \mathbb{P}\Big(\big(e_i\,(f_j(X_i) - f_j(X_i'))\big)_{i,j} \in A,\ \varepsilon = e\Big) \\
&\overset{(c)}{=} \sum_{e \in \{\pm 1\}^n} \mathbb{P}\Big(\big(e_i\,(f_j(X_i) - f_j(X_i'))\big)_{i,j} \in A\Big)\, \mathbb{P}(\varepsilon = e) \\
&\overset{(d)}{=} \sum_{e \in \{\pm 1\}^n} \mathbb{P}\Big(\big(f_j(X_i) - f_j(X_i')\big)_{i,j} \in A\Big)\, \mathbb{P}(\varepsilon = e) \\
&= \mathbb{P}\Big(\big(f_j(X_i) - f_j(X_i')\big)_{i,j} \in A\Big).
\end{aligned}$$
Here in $(a)$, we use the law of total probability, splitting the event according to the value $\varepsilon$ takes. In $(b)$, we replace the value of $\varepsilon$ by $e$, since this is exactly the condition imposed by the event $\{\varepsilon = e\}$. In $(c)$, we use the fact that $\varepsilon$ is independent of $(X, X')$. In $(d)$, we use (3) for fixed $e$. Thus, the equality (3) holds in the case where $\varepsilon$ is random. This means that for any measurable function $g : \mathbb{R}^{n \times N} \to \mathbb{R}$,

$$\mathbb{E}\Big[g\Big(\big(\varepsilon_i\,(f_j(X_i) - f_j(X_i'))\big)_{i \le n,\, j \le N}\Big)\Big] = \mathbb{E}\Big[g\Big(\big(f_j(X_i) - f_j(X_i')\big)_{i \le n,\, j \le N}\Big)\Big].$$

Step $(iii)$ in the symmetrization argument now follows by taking the appropriate $g$, namely

$$g\big((a_{ij})_{i \le n,\, j \le N}\big) = \max_{j \le N}\Big|\frac{1}{n}\sum_{i=1}^n a_{ij}\Big|.$$
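The equality (3) with random signs can even be verified exactly, not just by simulation, in a small discrete example: enumerate all outcomes with exact rational probabilities and compare the two joint distributions. The class, the distribution of the $X_i$, and the function names below are my own illustrative choices.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Toy setting: n = 2, X_i i.i.d. uniform on {0, 1, 2}, class {f1(x)=x, f2(x)=x^2}.
fs = [lambda x: x, lambda x: x ** 2]
vals = [0, 1, 2]
n = 2

def unsigned():
    """Exact joint distribution of the array (f_j(X_i) - f_j(X_i'))_{i,j}."""
    d = Counter()
    for xs in product(vals, repeat=n):
        for xps in product(vals, repeat=n):
            key = tuple(f(x) - f(xp) for x, xp in zip(xs, xps) for f in fs)
            d[key] += Fraction(1, len(vals) ** (2 * n))
    return d

def signed():
    """Exact joint distribution of (eps_i * (f_j(X_i) - f_j(X_i')))_{i,j},
    with eps uniform on {-1, 1}^n and independent of (X, X')."""
    d = Counter()
    for xs in product(vals, repeat=n):
        for xps in product(vals, repeat=n):
            for eps in product([-1, 1], repeat=n):
                key = tuple(e * (f(x) - f(xp))
                            for x, xp, e in zip(xs, xps, eps) for f in fs)
                d[key] += Fraction(1, len(vals) ** (2 * n) * 2 ** n)
    return d

print(signed() == unsigned())  # True: the two joint distributions coincide exactly
```

The reason it holds exactly is the swap argument above: for each fixed sign pattern, swapping $X_i$ and $X_i'$ wherever the sign is $-1$ is a probability-preserving bijection of the outcomes.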

**Remark.** To deal with the case that $\mathcal{F}$ is infinite (say countable, $\mathcal{F} = \{f_1, f_2, \dots\}$), we may set

$$\mathcal{F}_N = \{f_1, \dots, f_N\},$$

and define

$$\sup_{f \in \mathcal{F}} \Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - \mathbb{E} f(X_i)\big)\Big| = \lim_{N \to \infty}\, \sup_{f \in \mathcal{F}_N} \Big|\frac{1}{n}\sum_{i=1}^n \big(f(X_i) - \mathbb{E} f(X_i)\big)\Big|,$$

where the limit exists by monotonicity in $N$. Then our previous argument continues to work: the bound holds for each finite $\mathcal{F}_N$, and the monotone convergence theorem carries it over to the limit.
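A tiny illustration of this monotone-limit device (entirely my own toy numbers; `a(k)` is a hypothetical stand-in for the quantity attached to the $k$-th function): the suprema over the nested finite subclasses increase with $N$ and converge to the supremum over the whole countable class.

```python
def a(k):
    """Stand-in for the quantity attached to the k-th function (toy values)."""
    return 1 - 1 / k

# Suprema over nested finite subclasses F_N = {f_1, ..., f_N} increase with N
# and converge monotonically to the supremum over the whole countable class.
partial_sups = [max(a(k) for k in range(1, N + 1)) for N in (1, 10, 100, 1000)]
print(partial_sups)  # [0.0, 0.9, 0.99, 0.999] -- increasing toward the full sup 1
```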