Random matrices and their rank

In this post, we study a few probability-one results concerning the rank of random matrices: with probability one,

  1. a random (not necessarily symmetric) matrix has full rank;
  2. a random symmetric matrix has full rank;
  3. {\frac{n(n+1)}{2}} random rank-one positive semidefinite matrices (belonging to {\mathbb{R}^{n\times n}}) are linearly independent.

At least the first two look quite natural and appear obviously correct at first glance. However, proving them rigorously does require some effort. This post shows that one can use the concept of independence to prove the above assertions. Indeed, all the proofs follow an inductive argument in which independence makes life much easier:

  1. Draw the first sample; it satisfies the property we want to show.
  2. Suppose the first {k} samples satisfy the property we want to show, and draw the next sample. Since it is independent of the previous {k} samples, the first {k} samples can be considered as fixed. Then, with some effort, we argue that the first {k+1} samples also satisfy the property.

Before going into each example, we further note that the probability {1} results listed above still hold for any probability measure that is mutually absolutely continuous with the measure introduced below (so independence of the entries is not actually needed).

1. Random Matrices are full rank

Suppose {A\in \mathbb{R}^{n\times m}} with {A=[a_{ij}]}, where each entry of {A} is drawn independently from the standard Gaussian {N(0,1)}. We prove that {A} has rank {\min\{m,n\}} with probability {1}.

To start, we assume without loss of generality that {m\leq n}; otherwise, we simply apply the following argument to {A^\top}.

The trick is to use the independence structure and consider the columns of {A} as being drawn one by one from left to right. Let us write the {i}-th column of {A} as {a_{\cdot i}}.

For the first column, we know {a_{\cdot 1}} is linearly independent (i.e., nonzero) because it equals {0} with probability {0}. For the second column, because it is drawn independently of the first column, we can consider the first column as fixed, and the probability that {a_{\cdot 2}} falls into the span of a fixed column is {0} because {a_{\cdot 2}} has a continuous density in {\mathbb{R}^n}. Thus the first two columns are linearly independent.

For general {k\leq m}, because {m\leq n}, the first {k-1} columns span a proper subspace of {\mathbb{R}^n}, and so {a_{\cdot k}} falls into that subspace with probability {0} (a proper linear subspace has Lebesgue measure {0} in {\mathbb{R}^n}, and the Gaussian measure is absolutely continuous with respect to the Lebesgue measure and vice versa). Thus the first {k} columns are linearly independent. Proceeding with the argument until {k=m} completes the proof.
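As a quick numerical sanity check of this claim (a minimal NumPy sketch, not part of the proof; the dimensions and trial count are arbitrary choices):

    # Draw Gaussian matrices and confirm that numpy reports full rank every time.
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, trials = 8, 5, 1000
    print(all(
        np.linalg.matrix_rank(rng.standard_normal((n, m))) == min(n, m)
        for _ in range(trials)
    ))  # expected: True; a failure would be a numerical fluke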

2. Symmetric Matrices are full rank

Suppose {A\in \mathbb{S}^n\subset \mathbb{R}^{n\times n}} with {A=[a_{ij}]}, where each entry of the upper half of {A}, {a_{ij},\, j\geq i}, is drawn independently from the standard Gaussian {N(0,1)} (the lower half is determined by symmetry). We prove that {A} has rank {n} with probability {1}.

Let us write the {i}-th row of {A} as {a_{i\cdot}\in \mathbb{R}^{1\times n}}. For the first {k} entries of the {i}-th row, we write {a_{i,1:k}\in \mathbb{R}^{1\times k}}. Similarly, {a_{1:k,i}} denotes the first {k} entries of the {i}-th column, and the top left {k\times k} submatrix of {A} is denoted by {a_{1:k,1:k}\in \mathbb{R}^{k\times k}}. Similar notation applies to other submatrices.

The idea is to draw the columns of {A} from left to right sequentially and use the independence structure. Starting from the first column, with probability one, {a_{\cdot 1}} is linearly independent (meaning {a_{\cdot 1}\not =0}).

Now that {a_{\cdot 1}} has been drawn, and since the second column except for its first entry is independent of {a_{\cdot 1}}, the second column {a_{\cdot 2}} lies in the span of {a_{\cdot 1}} only if {a_{\cdot 2}} is the {\frac{a_{12}}{a_{11}}} multiple of {a_{\cdot 1}} (note {a_{11}} is {0} with probability {0}). This still happens with probability {0}, since each of the remaining entries of {a_{\cdot 2}} has a continuous probability density and is drawn independently of {a_{\cdot 1}}.

Now assume that we have drawn the first {k} columns of {A}, that the first {k} columns are linearly independent with probability {1}, and that the top left {k\times k} submatrix of {A} is invertible (the base case is what we just proved). We want to show that after we draw the {(k+1)}-th column, the first {k+1} columns are still linearly independent with probability {1}. Since the first {k} columns are drawn, the first {k} entries of the {(k+1)}-th column {a_{\cdot (k+1)}} are fixed by symmetry, and we wish to show that the remaining {n-k} entries, which are drawn independently of all previous {k} columns, ensure linear independence of the first {k+1} columns.

Suppose instead that {a_{\cdot (k+1)}} is linearly dependent on the first {k} columns of {A}. Since the top left {k\times k} submatrix of {A} is invertible, there is only one way to write {a_{\cdot (k+1)}} as a linear combination of the previous columns. More precisely, we have

\displaystyle a_{\cdot (k+1)} = a_{1:n,1:k}a_{1:k, 1:k}^{-1}a_{k+1,1:k}^\top.

This means {a_{\cdot (k+1)}} has to be a fixed vector, which happens with probability {0} because the last {n-k} entries of {a_{\cdot (k+1)}} are drawn independently of {a_{\cdot i}}, {i=1,\dots ,k}. Note this implies that the first {k+1} columns of {A} are linearly independent and that the top left {(k+1) \times (k+1)} submatrix of {A} is invertible, because we can repeat the argument simply by regarding {A} as {(k+1) \times (k+1)} (instead of {n\times n}). Thus the induction is complete and we have proved that {A} indeed has rank {n} with probability {1}.
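As before, here is a minimal NumPy sketch (not part of the proof; parameters chosen arbitrarily) that draws symmetric matrices whose upper halves are i.i.d. {N(0,1)} and checks that they have rank {n}:

    # Build a symmetric matrix whose upper triangle (incl. diagonal) is i.i.d. N(0,1).
    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 8, 1000

    def random_symmetric(n):
        U = np.triu(rng.standard_normal((n, n)))  # upper triangle including diagonal
        return U + np.triu(U, k=1).T              # mirror the strict upper part down

    print(all(np.linalg.matrix_rank(random_symmetric(n)) == n
              for _ in range(trials)))            # expected: True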

3. Rank {1} positive semidefinite matrices are linearly independent

This time, we consider {\mathbf{a}_i}, {i=1,\dots, \frac{n(n+1)}{2}}, drawn independently from the standard normal distribution in {\mathbb{R}^n}. Denote the set of symmetric matrices in {\mathbb{R}^{n\times n}} by {\mathbb{S}^n}. We show that with probability {1} the matrices

\displaystyle A_i = \mathbf{a}_i\mathbf{a}_i^\top, \quad i =1,\dots , \frac{n(n+1)}{2}

are linearly independent in {\mathbb{S}^n}.

Denote the {i}-th standard basis vector in {\mathbb{R}^n} by {e_i}. First, let us consider the following basis in {\mathbb{S}^n} (which can be verified easily; see the numerical sketch after the display),

\displaystyle E_{ii} = e_ie_i^\top, \quad i=1,\dots,n,\quad\text{and}\quad E_{ij} = e_ie_i^\top +e_{j}e_j^\top + e_ie_j^\top +e_je_i^\top, \quad 1\leq i<j\leq n.
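Here is a small NumPy sketch (an illustration only, with {n=5} chosen arbitrarily) confirming that these {\frac{n(n+1)}{2}} matrices are linearly independent, hence a basis of {\mathbb{S}^n}:

    # Vectorize each basis matrix and check that the stacked rows have full rank.
    import numpy as np

    n = 5
    I = np.eye(n)
    basis = [np.outer(I[i], I[i]) for i in range(n)]              # E_{ii} = e_i e_i^T
    basis += [np.outer(I[i] + I[j], I[i] + I[j])                  # E_{ij} as defined above
              for i in range(n) for j in range(i + 1, n)]
    M = np.stack([B.ravel() for B in basis])
    print(np.linalg.matrix_rank(M) == n * (n + 1) // 2)           # expected: True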

We order the basis according to the lexicographical order on {(i,j)}. Suppose {\mathbf{a}} follows the standard normal distribution in {\mathbb{R}^n}; we prove that for any fixed linear subspace {V\subset \mathbb{S}^{n}} with dimension less than {\frac{n(n+1)}{2}}, we have

\displaystyle \mathop{\mathbb P}(\mathbf{a}\mathbf{a} ^\top \in V ) = 0.

We start by applying a fixed invertible linear map {\mathcal{F}:\mathbb{S}^n \rightarrow \mathbb{S}^n} to {V} so that the basis of {\mathcal{F}(V)} consists of the first {\dim(V)} basis vectors of {\{E_{ij}\}_{i\leq j}} (in lexicographical order), and the basis of the orthogonal complement of {V} (defined via the trace inner product), {V^\perp\subset \mathbb{S}^n}, is mapped to the remaining basis vectors of {\{E_{ij}\}_{i\leq j}} under {\mathcal{F}}. We then only need to prove

\displaystyle \mathop{\mathbb P}(\mathcal{F}(\mathbf{a}\mathbf{a}^\top) \in\mathcal{F}(V)) =0.

We prove this by contradiction. Suppose {\mathop{\mathbb P}(\mathcal{F}(\mathbf{a}\mathbf{a}^\top) \in\mathcal{F}(V)) >0}. This implies that there exist infinitely many {\mathbf{a}\in \mathbb{R}^n} such that {\mathcal{F}(\mathbf{a}\mathbf{a}^\top) \in\mathcal{F}(V)}; moreover, each component of {\mathbf{a}} takes infinitely many values. We show that such a situation cannot occur. Denote the {\dim (V)}-th smallest element of {\{(i,j)\}_{i\leq j}} (in lexicographical order) by {(i_0,j_0)}. Since {V} has dimension less than {\frac{n(n+1)}{2}}, {\mathcal{F}(V^\perp)} contains a nonzero element and so

\displaystyle \mathcal{F}(\mathbf{a}\mathbf{a}^\top) \in \mathcal{F}(V) \iff [\mathcal{F}(\mathbf{a}\mathbf{a}^\top)]_{ij}=0,\quad \forall (i,j) > (i_0,j_0) \,\text{in lexicographical order.}

Now fix an {(i_1,j_1)>(i_0,j_0)}. The condition { [\mathcal{F}(\mathbf{a}\mathbf{a}^\top)]_{i_1j_1}=0} means that there is some {F\in \mathbb{S}^n} (depending only on {\mathcal{F}} and {(i_1,j_1)}) such that

\displaystyle \mathbf{tr}(F\mathbf{a}\mathbf{a}^\top) = \sum_{1\leq i,j\leq n} F_{ij} a_ia_j =0,

where {a_i} is the {i}-th component of {\mathbf{a}}. Now if we fix {a_i}, {i=2,\dots,n}, and only vary {a_1}, then we have

\displaystyle \sum_{1\leq i,j\leq n} F_{ij} a_ia_j =F_{11}a_1^2 +2 (\sum_{1<j\leq n}F_{1j}a_j)a_1 + \sum_{i\not =1,j\not =1}F_{ij}a_ia_j=0,

which holds for infinitely many {a_1 \in \mathbb{R}} only if

\displaystyle F_{11}=0, \quad \sum_{1<j\leq n}F_{1j}a_j=0,\quad\text{and}\quad \sum_{i\not =1,j\not =1}F_{ij}a_ia_j =0.

Thus we can repeat the argument as before (now varying {a_2}, then {a_3}, and so on) and conclude that {\mathbf{tr}(F\mathbf{a}\mathbf{a}^\top) = \sum_{1\leq i,j\leq n} F_{ij} a_ia_j =0} holds for infinitely many {\mathbf{a}}, with each component taking infinitely many values, only if {F=0}. This contradicts the fact that {\mathcal{F}} is invertible, and hence we must have

\displaystyle \mathop{\mathbb P}(\mathcal{F}(\mathbf{a}\mathbf{a}^\top) \in\mathcal{F}(V)) =0.

Now proving that the {A_i=\mathbf{a}_i \mathbf{a}_i^\top} are linearly independent (and hence span {\mathbb{S}^n}) with probability {1} is easy. For each {k}, the {A_i} with {i<k} that have already been drawn can be considered fixed, because {A_k} is drawn independently of them. But the probability that {A_k} stays in the span of {A_i}, {i<k}, is {0} by the previous result, and hence {A_k} is linearly independent of all the {A_i}, {i<k}. This argument continues to hold for {k= \frac{n(n+1)}{2}} because the space spanned by {A_1,\dots, A_{\frac{n(n+1)}{2}-1}} has dimension at most {\frac{n(n+1)}{2}-1}, and so the previous result applies. This completes the proof.
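This statement, too, can be checked numerically. Below is a minimal NumPy sketch (with {n=6} chosen arbitrarily) that draws {\frac{n(n+1)}{2}} standard normal vectors, forms the rank-one matrices {A_i}, and verifies their linear independence:

    # Flatten each A_i = a_i a_i^T into a row and check that the rows are independent.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    k = n * (n + 1) // 2
    M = np.stack([np.outer(a, a).ravel()
                  for a in (rng.standard_normal(n) for _ in range(k))])
    print(np.linalg.matrix_rank(M) == k)  # expected: True with probability 1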
