# The natural sufficient statistic is not necessarily minimal for an exponential family

This post gives a simple example of an exponential family whose natural parameter space is a single point and whose natural sufficient statistic is not minimal.

We first define an exponential family in terms of its natural parameters.

Definition 1 (Natural exponential family)
A family of probability densities (or probability mass functions) ${f(\mathbf{x}|\mathbf{\eta} )}$ with parameter (index) ${\mathbf{\eta}\in {\mathbb R}^k}$ is said to be a natural exponential family if ${f}$ can be written as $\displaystyle f(\mathbf{x}|\mathbf{\eta}) = h(\mathbf{x}) \exp \left\{ \sum_{i=1}^k \eta_i T_i(\mathbf{x}) - \mathcal{A}(\mathbf {\eta})\right\}. \ \ \ \ \ (1)$

Here for ${i = 1,\ldots,k}$, we call

• ${T_i}$: natural sufficient statistic
• ${\eta_i}$: natural parameter
• ${H := \{ \mathbf{\eta} = (\eta_1,\ldots,\eta_k): \int h(\mathbf{x}) \exp \left\{ \sum_{i=1}^k \eta_i T_i(\mathbf{x})\right\} \, d \mathbf{x} < \infty \} }$: natural parameter space.

Many well-known families of distributions are exponential families, e.g., the normal, binomial, Poisson, beta, and gamma distributions. Exponential families are central to the modeling and analysis of data.

Next, we define the concept of sufficiency and minimal sufficiency.

Definition 2 (Sufficiency and minimal sufficiency)
For a family of probability densities ${f_\eta}$, ${\eta \in \Theta \subset {\mathbb R}^k}$, and a random variable ${X\sim f_\eta}$, a statistic ${T(X)}$ is sufficient if the conditional distribution ${P(X|T(X))}$ does not depend on ${\eta}$. A sufficient statistic ${T(X)}$ is minimal if for any other sufficient statistic ${S(X)}$, there is a function ${g}$ such that ${T(X)=g(S(X))}$.

Intuitively, a sufficient statistic captures all the information in the data about the underlying parameter ${\eta}$. Indeed, suppose someone hands you a sufficient statistic ${T(X)}$. Because ${P(X|T(X))}$ does not depend on ${\eta}$, you already know this conditional distribution. If you now generate new data ${X'}$ according to ${P(X|T(X))}$, then the unconditional distribution of ${X'}$ is simply ${f_\eta}$! So even though you don’t know the underlying distribution ${f_\eta}$, you can generate a sample from it so long as ${T(X)}$ is available.
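As a small sanity check of this intuition, here is a sketch in Python. It assumes a concrete model that does not appear in the definitions above: three i.i.d. Bernoulli(p) observations with ${T(X)=X_1+X_2+X_3}$. The function name `conditional_given_sum` is made up for illustration. The code computes ${P(X=x \mid T(X)=t)}$ exactly for two different values of p and compares the results.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(p, n=3):
    """Return {t: {x: P(X = x | sum(X) = t)}} for n i.i.d. Bernoulli(p) draws."""
    # Joint probability of every binary vector x of length n.
    joint = {}
    for x in product((0, 1), repeat=n):
        k = sum(x)
        joint[x] = p**k * (1 - p) ** (n - k)

    # Condition on the value t of the statistic T(x) = sum(x).
    cond = {}
    for t in range(n + 1):
        total = sum(pr for x, pr in joint.items() if sum(x) == t)
        cond[t] = {x: pr / total for x, pr in joint.items() if sum(x) == t}
    return cond


# Two different parameter values (exact rational arithmetic, no rounding).
cond_a = conditional_given_sum(Fraction(1, 3))
cond_b = conditional_given_sum(Fraction(4, 5))

# The conditional distributions coincide, so they carry no information about p.
print(cond_a == cond_b)
print(cond_a[1])
```

In this toy model the conditional distribution given ${T(X)=t}$ is uniform over the arrangements with t ones, whatever p is, which is exactly what sufficiency of the sum requires.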

The minimality of a sufficient statistic means that the data are reduced as far as possible: a minimal sufficient statistic can be computed from any other sufficient statistic, so the others can only contain more information than needed. Note that this definition of minimality says nothing about the dimension of a minimal sufficient statistic. Indeed, if ${T(X)}$ is minimal, then ${(T(X),1,2)}$ is also minimal.

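Using the factorization theorem, it is easily verified that the natural sufficient statistic is indeed sufficient: the density (1) is the product of ${h(\mathbf{x})}$, which does not involve ${\mathbf{\eta}}$, and a factor that depends on ${\mathbf{x}}$ only through ${T(\mathbf{x})}$. A natural question arises at this point: is the natural sufficient statistic always minimal? The following example shows that we need to impose some further conditions on ${\eta}$ or ${T(x)}$.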

Example 1
Consider the density family $\displaystyle f(x,y\mid \eta) = \frac{1}{(1+x^2)(1+y^2)}\exp\left (\eta \left (x^2-y^2\right)-A(\eta)\right).$

where ${x,y\in {\mathbb R}}$.
The natural parameter space is the set of ${\eta}$ for which the integral $\displaystyle \int_{{\mathbb R}^2} \frac{ \exp\left(\eta \left(x^2-y^2\right)\right)}{(1+x^2)(1+y^2)} dxdy = \int_{\mathbb R}\frac{\exp (\eta x^2)}{1+x^2}dx \int_{\mathbb R} \frac{\exp (-\eta y^2)}{1+y^2}dy$

is finite (the equality follows from Tonelli’s theorem, since the integrand is nonnegative). Hence ${\eta}$ must satisfy ${\eta \leq 0 }$ for the integral ${\int_{\mathbb R}\frac{\exp (\eta x^2)}{1+x^2}dx}$ to be finite and ${\eta\geq 0}$ for the integral ${\int_{\mathbb R} \frac{\exp (-\eta y^2)}{1+y^2}dy}$ to be finite. Therefore ${\eta=0}$, and the natural parameter space is the single point ${\{0\}\subset {\mathbb R}}$. At ${\eta=0}$ each of the two integrals equals ${\pi}$, so ${A(0)=2\log \pi}$.
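The convergence claim can be checked numerically. The following sketch approximates the truncated integral ${\int_{-R}^{R}\frac{\exp (\eta x^2)}{1+x^2}dx}$ with a plain midpoint rule for growing ${R}$. The helper name `truncated_integral`, the step count, and the chosen values of ${\eta}$ and ${R}$ are assumptions made for illustration only, and a numerical experiment is of course not a proof.

```python
import math


def truncated_integral(eta, R, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(eta*x^2)/(1+x^2) over [-R, R]."""
    h = 2.0 * R / steps
    total = 0.0
    for i in range(steps):
        x = -R + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.exp(eta * x * x) / (1.0 + x * x)
    return total * h


# For eta <= 0 the values settle down as R grows; for eta > 0 they blow up.
for eta in (-0.5, 0.0, 0.5):
    values = [truncated_integral(eta, R) for R in (2, 4, 8)]
    print(eta, values)
```

For ${\eta=0}$ the truncated integral equals ${2\arctan R}$, which tends to ${\pi}$. The integral in ${y}$ is the same function with ${\eta}$ replaced by ${-\eta}$, so only ${\eta=0}$ keeps both factors finite.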

The natural sufficient statistic ${X^2-Y^2}$ is indeed sufficient. But since the family consists of a single distribution, any constant statistic is also sufficient, and it is minimal: a constant is a (constant) function of any other sufficient statistic. If ${X^2-Y^2}$ were minimal, it would have to be a function of a constant statistic and hence constant itself, which it is not. So the natural sufficient statistic is not necessarily minimal.
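To see concretely that ${X^2-Y^2}$ is not constant, one can simulate from the only member of the family. At ${\eta=0}$ the density is proportional to ${\frac{1}{(1+x^2)(1+y^2)}}$, so ${X}$ and ${Y}$ are independent standard Cauchy variables. The sketch below samples them by inverting the Cauchy distribution function; the helper name `sample_pair`, the seed, and the sample size are illustrative choices.

```python
import math
import random


def sample_pair(rng):
    """Draw (X, Y) at eta = 0: two independent standard Cauchy variables."""
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U uniform on [0, 1).
    x = math.tan(math.pi * (rng.random() - 0.5))
    y = math.tan(math.pi * (rng.random() - 0.5))
    return x, y


rng = random.Random(0)  # fixed seed so the run is reproducible
t_values = [x * x - y * y for x, y in (sample_pair(rng) for _ in range(5))]

# The statistic takes different values on different draws, so it is not constant.
print(t_values)
print(len(set(t_values)) > 1)
```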

It can be shown that as long as the natural parameter space contains an open set, the natural sufficient statistic is indeed minimal. See Theorem 4.5 b) of this note.