Generating functions and power law fat tails

Cumulant generating functions and distribution of empirical averages

Introduction

Common wisdom tells us that "the sum of a large number of uncorrelated variables is Gaussian distributed". We investigate here conditions for this Central Limit Theorem (CLT) to hold, and discuss its interpretation in terms of the scaling of the fluctuations of such sums.

Distribution with power-law tails

A law that may be used as a benchmark for distributions whose moments are not all defined is the (normalized) power law:
$\pi_\alpha(\xi) =\alpha/\xi^{\alpha+1} \quad\quad\text{for}\quad \xi\in[1,+\infty[$
For α > 2, one checks that the mean μ and the variance σ2 are well defined:
$\mu=\langle\xi\rangle=\int d\xi\,\xi\pi_\alpha(\xi) = \frac{\alpha}{\alpha-1} \quad\text{and}\quad\sigma^2=\langle\xi^2\rangle_c= \int d\xi\,\xi^2\pi_\alpha(\xi)-\mu^2=\frac{\alpha}{(\alpha-2)(\alpha-1)^2} \;.$
One checks however that for 1 < α < 2, the mean μ is still defined but the variance σ2 is infinite.
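These formulas are easy to check numerically. The CDF of πα is F(ξ) = 1 − ξ−α, so one can sample it by inverse-transform sampling and compare the empirical moments to the predictions; a minimal sketch in Python (assuming NumPy is available):

```python
import numpy as np

def sample_power_law(alpha, size, rng):
    """Inverse-transform sampling of pi_alpha(xi) = alpha / xi^(alpha+1) on [1, inf):
    the CDF is F(xi) = 1 - xi^(-alpha), hence xi = (1 - u)^(-1/alpha) for u uniform."""
    return (1.0 - rng.random(size)) ** (-1.0 / alpha)

rng = np.random.default_rng(0)
alpha = 5.0                                        # alpha > 2: mean and variance exist
xi = sample_power_law(alpha, 10**6, rng)

mu_th = alpha / (alpha - 1)                        # exact mean, here 1.25
var_th = alpha / ((alpha - 2) * (alpha - 1) ** 2)  # exact variance, here 5/48
print(xi.mean(), mu_th)                            # empirical vs exact mean
print(xi.var(), var_th)                            # empirical vs exact variance
```

For α close to 2 the empirical variance converges very slowly (the fourth moment, which controls its sampling error, only exists for α > 4), which is why a larger α is used here.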

Cumulant generating function

For any real random variable ξ of distribution π(ξ) one defines the cumulant generating function
$\psi_\xi(s)=\log\big\langle e^{-s\xi}\big\rangle\:.$
If existing, its Taylor expansion around 0 defines the cumulants
$\langle\xi^k\rangle_c$
as follows:
$\psi_\xi(s)=\sum_{k\geq 1} \frac{(-1)^k}{k!}s^k \langle\xi^k\rangle_c \quad\text{or equivalently}\quad \langle\xi^k\rangle_c = (-1)^k\frac{\partial^k}{\partial s^k}\psi_\xi(s)\Big|_{s=0}\:.$

Even if not all the cumulants exist, the function ψξ(s) may still be well defined around 0 (though possibly non-analytic) if π(ξ) decreases fast enough at large ξ. The first and second cumulants are easy to express in terms of the moments:
$\langle \xi\rangle_c=\langle \xi\rangle \quad\text{and}\quad \langle \xi^2\rangle_c=\langle \xi^2\rangle-\langle \xi\rangle^2\:.$
It is possible to determine the higher-order cumulants in terms of the moments, but there is no direct interpretation of the result.
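For instance, the third cumulant reads ⟨ξ3⟩c = ⟨ξ3⟩ − 3⟨ξ2⟩⟨ξ⟩ + 2⟨ξ⟩3 (a standard moment–cumulant relation, not derived here). A quick numerical check in Python with the exponential distribution, whose k-th cumulant is known to be (k − 1)!:

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.exponential(1.0, 10**6)            # Exp(1): the k-th cumulant is (k - 1)!

m1, m2, m3 = xi.mean(), (xi**2).mean(), (xi**3).mean()
c3_hat = m3 - 3.0 * m2 * m1 + 2.0 * m1**3   # third cumulant from raw moments
print(c3_hat)                               # close to (3 - 1)! = 2
```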

There are two simple situations where the cumulant generating function is easy to compute:
• If the distribution is a Dirac delta of mean μ:
$\pi(\xi)=\delta(\xi-\mu) \quad\Longleftrightarrow\quad \psi_\xi(s)=-s\mu$
• If the distribution is a normalized Gaussian of mean μ and variance σ2:
$\pi(\xi)=\frac{e^{-\frac 12 (\xi-\mu)^2/\sigma^2}}{\sqrt{2\pi\sigma^2}} \quad\Longleftrightarrow\quad \psi_\xi(s)=-s\mu+\frac 12 \sigma^2 s^2$
and this is a characterization of the Gaussian distributions (all cumulants of order >2 are zero).

The cumulant generating function satisfies an important linearity property: for two independent random variables ξ1 and ξ2, one has
$\Big\langle e^{-s(\lambda_1\xi_1+\lambda_2\xi_2)}\Big\rangle = \Big\langle e^{-s\lambda_1\xi_1}\Big\rangle\Big\langle e^{-s\lambda_2\xi_2}\Big\rangle$
which implies the linearity property of the cumulant generating function
$\psi_{\lambda_1\xi_1+\lambda_2\xi_2}(s)=\psi_{\xi_1}(\lambda_1s)+\psi_{\xi_2}(\lambda_2s)\:.$
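This linearity property can be checked numerically by comparing empirical estimates of both sides; a minimal sketch in Python with Gaussian variables, for which ψ is also known in closed form (the means, variances and weights below are arbitrary choices):

```python
import numpy as np

def psi_hat(samples, s):
    """Empirical cumulant generating function psi(s) = log< exp(-s xi) >."""
    return np.log(np.mean(np.exp(-s * samples)))

rng = np.random.default_rng(2)
n = 10**6
xi1 = rng.normal(0.5, 1.0, n)       # Gaussian, mu1 = 0.5, sigma1 = 1
xi2 = rng.normal(-1.0, 2.0, n)      # Gaussian, mu2 = -1,  sigma2 = 2
lam1, lam2, s = 0.3, 0.7, 0.4

lhs = psi_hat(lam1 * xi1 + lam2 * xi2, s)              # psi of the combination
rhs = psi_hat(xi1, lam1 * s) + psi_hat(xi2, lam2 * s)  # sum of rescaled psi's
# closed form for Gaussians: psi(s) = -s mu + sigma^2 s^2 / 2
exact = -s * (lam1 * 0.5 + lam2 * (-1.0)) \
        + 0.5 * s**2 * ((lam1 * 1.0) ** 2 + (lam2 * 2.0) ** 2)
print(lhs, rhs, exact)
```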

Distribution of the empirical average

One defines the empirical average XN of N independent instances of a random variable ξ drawn from the same distribution π(ξ) as
$X_N=\frac 1N \sum_{i=1}^N \xi_i$
A simple case of the Central Limit Theorem determines the fluctuations of the empirical average XN.

Case 1: when the second moment exists

We assume that the first and second cumulants (or equivalently, moments) of ξ exist. This corresponds to the case α > 2 for the power-law distribution πα(ξ) defined above. The cumulant generating function of ξ is then defined at least up to order 2 in s:
$\psi_\xi(s)=-s\mu+\frac 12 \sigma^2 s^2 + o(s^2) \quad\text{and thus by the linearity property}\quad \psi_{X_N}(s) = N \psi_\xi(s/N) = -s\mu + \frac 12 \sigma^2 \frac{s^2}N + o(s^2/N)$
This shows that, in the infinite-size limit,
$\psi_{X_N}(s) = -s\mu$
and the probability distribution function of XN is a delta function around μ. The fluctuations around this average are determined from the next order of the expansion. The quadratic form of
$\psi_{X_N}(s)$
shows that, to the next order, XN has a Gaussian distribution of mean μ and variance σ2/N. This is the Central Limit Theorem.
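A numerical illustration of this σ2/N scaling, again with the power law πα for some α > 2 (sampled by inverse transform, as a stand-in for any distribution with finite variance):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, N, reps = 5.0, 100, 50_000
# reps independent empirical averages X_N of N power-law variables (alpha > 2)
xi = (1.0 - rng.random((reps, N))) ** (-1.0 / alpha)
X_N = xi.mean(axis=1)

mu = alpha / (alpha - 1)                           # exact mean, 1.25
sigma2 = alpha / ((alpha - 2) * (alpha - 1) ** 2)  # exact variance, 5/48
print(X_N.mean(), mu)                              # X_N concentrates around mu
print(X_N.std() * np.sqrt(N), np.sqrt(sigma2))     # fluctuations scale as sigma/sqrt(N)
```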

Case 2: when the second moment does not exist but the first does

In that situation the function ψξ(s) cannot be expanded as before. We focus on the case where the distribution is the power law πα(ξ) with 1 < α < 2. Standard asymptotic analysis then shows that
$\psi_\xi(s)=-s\mu+ \kappa_\alpha\frac {s^\alpha}{\Gamma(1+\alpha)} + O(s^2)$
where the generalised cumulant of order α is
$\kappa_\alpha = -\Gamma(1-\alpha)\,\Gamma(1+\alpha)$
and Γ(z) is the Euler Gamma function. For another distribution π(ξ) with a different form but the same power-law tail, the expansion is the same with a different value of κα. In any case, by the linearity property, one has
$\psi_{X_N}(s) = N \psi_\xi(s/N) = -s\mu+ \kappa_\alpha\frac {s^\alpha}{N^{\alpha-1}\Gamma(1+\alpha)} + O(s^2/N)$

Interpretation in terms of the scaling of fluctuations

In the Gaussian case one sees that, in the large N limit, the empirical average XN takes the form
$X_N = \mu + N^{-\frac 12} Y$
where μ is constant and Y has fluctuations of order 1. Indeed the distribution of
$Y=(X_N-\mu) N^{\frac 12}$
is Gaussian of zero mean and variance σ2, a distribution independent of N.

We would now like to determine the scaling of the fluctuations of XN around μ in the case 1 < α < 2, which are not of order N−1/2. To do so, one remarks that
$X_N = \mu + N^{-\gamma} Y \quad\Longrightarrow\quad \big\langle e^{-s X_N}\big\rangle = e^{-s\mu} \big\langle e^{-sN^{-\gamma}Y}\big\rangle \quad\Longrightarrow\quad \psi_{X_N}(s) = -s\mu + F(sN^{-\gamma})$
In other words the dependence of the non-linear terms in
$\psi_{X_N}(s)$
on s and N occurs only through a function of sN−γ. In the case 1 < α < 2, by writing
$\psi_{X_N}(s)=-s\mu+ \frac{\kappa_\alpha}{\Gamma(1+\alpha)} \Big(\frac{s}{N^{1-1/\alpha}}\Big)^{\alpha} + O(s^2/N)$
one finds that γ = 1−1/α and thus the scaling of XN is
$X_N = \mu + N^{-(1-\frac 1\alpha)} Y$
The distribution of Y is not trivial: it belongs to the class of Lévy alpha-stable distributions. Thanks to the last program of Class Session 04: Errors and fluctuations, one can evaluate it numerically as follows:
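The program itself is not reproduced here; a minimal sketch in its spirit (in Python, using inverse-transform sampling of πα, which is an assumption about the original code) samples the rescaled variable Y = (XN − μ)N^(1−1/α) for increasing N and checks that its distribution no longer depends on N:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 1.5                       # 1 < alpha < 2: the variance of xi is infinite
mu = alpha / (alpha - 1)          # the mean still exists, here 3

def rescaled_fluctuations(N, reps, chunk=1000):
    """Sample Y = (X_N - mu) * N^(1 - 1/alpha) for reps independent averages X_N,
    generating the power-law variables in chunks to limit memory use."""
    X = np.empty(reps)
    for i in range(0, reps, chunk):
        r = min(chunk, reps - i)
        xi = (1.0 - rng.random((r, N))) ** (-1.0 / alpha)  # inverse-transform sampling
        X[i:i + r] = xi.mean(axis=1)
    return (X - mu) * N ** (1.0 - 1.0 / alpha)

# the quartiles of Y should be (nearly) independent of N: Levy alpha-stable limit
quartiles = {}
for N in (10**3, 10**4):
    quartiles[N] = np.percentile(rescaled_fluctuations(N, 10_000), [25, 50, 75])
    print(N, quartiles[N])
```

A histogram of Y for the two values of N (or a comparison with scipy.stats.levy_stable) makes the collapse onto a single N-independent distribution visually evident.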