04_Integration_Sampling

=Integration and sampling: from the Gaussian integral to Walker's method=

In this lecture, we relate the concepts of sampling and of integration, a connection that has not been made explicit so far. We then treat general methods for directly sampling non-uniform distributions. A number of very detailed texts have been written on this subject.

Going up and down in the Gaussian integral
As an illustration of how sampling relates to integration, we consider the Gaussian integral: math \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) = 1 . math

As we learned many years ago, we square this integral, writing it down once in //x// and once in //y//:

math \begin{align} \left[\int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) \right]^2 & = \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) \int_{-\infty}^{\infty} \frac{dy}{\sqrt{2 \pi}} \exp(-y^2/2)\\ \ldots & = \int_{-\infty}^{\infty} \frac{dx dy}{2 \pi} \exp[-(x^2+y^2)/2], \end{align} . math and then switch to polar coordinates math dx dy = r dr d\phi , math to reach the following expression for the integral math \begin{align} \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi} \int_{0}^{\infty} r dr \exp(-r^2/2), \end{align} . math We may then perform the substitutions math r^2/2=\Upsilon (r dr = d\Upsilon)\quad \text{and}\quad \exp(-\Upsilon) = \Psi , math to reach math \begin{align} \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi}\ \int_{0}^{\infty} d\Upsilon \exp(-\Upsilon)\\ \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi}\ \int_{0}^{1} d\Psi = 1. \end{align} . math

In this last equation, we can compute the integrals (thus recovering the value of the Gaussian integral), but we may in particular sample them: $$\phi$$ is a uniform random variable between 0 and $$2\pi$$, and $$\Psi$$ is a uniform random variable between 0 and 1. In the program below, we actually sample the random variables $$\phi$$ and $$\Psi$$, and transform these samples in several steps back to //x// and //y//. This procedure carries the name of "sample transformation".

Python program for the Gaussian sample transformation
Note that the substitutions are applied to the samples phi and Psi. This is also known as the Box-Muller transform.

code format="python"
from random import uniform
import math
import pylab

def gauss(sigma):
    phi = uniform(0.0, 2.0 * math.pi)
    Psi = uniform(0.0, 1.0)
    Upsilon = -math.log(Psi)
    r = sigma * math.sqrt(2.0 * Upsilon)
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    return x, y

listx = []
listy = []
for k in range(10000):
    x, y = gauss(1.0)
    listx.append(x)
    listy.append(y)
data = listx + listy
pylab.figure(1)
pylab.hist(data, bins=25, normed=True)
pylab.title('Histogram Gaussian')
pylab.savefig('gaussians_histogram.png')
pylab.figure(2)
pylab.title('Correlation x,y')
pylab.axis('equal')
pylab.plot(listx, listy, 'ro')
pylab.savefig('gaussians_correlations.png')
pylab.show()
code


Other one-dimensional integrals
We now consider two other examples of one-dimensional sample transformations. In the first example, we produce samples //x// with 0 < //x// < 1, distributed according to math \pi(x) \propto x^\sigma \quad (\sigma > -1) . math We write the integrals as math \begin{align} \int_0^1 d\Upsilon &= \text{const} \int_0^1 dx\, x^\sigma \\ d\Upsilon &= \text{const}\, dx\, x^\sigma \\ \Upsilon &= \frac{\text{const}}{\sigma+1}\, x^{\sigma+1} = x^{\sigma+1} \qquad (\text{const} = \sigma + 1) \\ \Upsilon^{1/(\sigma+1)} &= x . \end{align} math Since $$\Upsilon$$ is a uniform random variable between 0 and 1, it follows that math x = [\text{ran}(0,1)]^{1/(\sigma+1)} math is distributed as math \pi(x) \propto x^\sigma math.
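This transformation is easily checked numerically. The following sketch (the choice $$\sigma = 2$$, the seed, and the variable names are mine, not part of the lecture programs) compares the sample mean with the exact mean of the normalized distribution, which is $$(\sigma+1)/(\sigma+2)$$:

```python
import random

# Check of the sample transformation x = ran(0,1)**(1/(sigma+1)).
# The exact mean of pi(x) = (sigma + 1) * x**sigma on (0,1) is (sigma+1)/(sigma+2).
sigma = 2.0
random.seed(12345)
n = 100000
samples = [random.uniform(0.0, 1.0) ** (1.0 / (sigma + 1.0)) for _ in range(n)]
mean = sum(samples) / n
print(mean)   # should be close to (sigma+1)/(sigma+2) = 3/4
```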

Second example: exponential random numbers (already contained in the Gaussian case, where $$\Upsilon = -\log \Psi$$): math \pi(x) = \exp(-x) \quad \text{for } 0 < x < \infty , math sampled as math x = -\log\, \text{ran}(0,1) . math
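A minimal check of this transformation (the use of 1.0 - random.random() to keep the argument of the logarithm strictly positive is my own precaution): the mean of $$\pi(x) = \exp(-x)$$ equals 1.

```python
import random, math

# Exponential samples via the transformation x = -log(ran(0,1)).
# 1.0 - random.random() lies in (0, 1], so the logarithm is always defined.
random.seed(42)
n = 100000
samples = [-math.log(1.0 - random.random()) for _ in range(n)]
mean = sum(samples) / n
print(mean)   # should be close to 1, the mean of pi(x) = exp(-x)
```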

Multi-dimensional Gaussians and the Maxwell distribution (theoretical treatment)
The program below samples points uniformly on the surface of the unit sphere: a three-dimensional Gaussian vector has an isotropic distribution, so normalizing it to unit length yields a uniformly distributed direction.

code format="python"
from mpl_toolkits.mplot3d import Axes3D
import pylab, random, math

ax = pylab.gca(projection='3d')
ax.set_aspect('equal')
x, y, z = [], [], []
for i in range(2000):
    a, b, c = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    length = math.sqrt(a ** 2 + b ** 2 + c ** 2)
    x.append(a / length)
    y.append(b / length)
    z.append(c / length)
ax.plot(x, y, z, '.')
pylab.savefig('surface_sphere.png')
pylab.show()
code
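The modulus of a three-dimensional Gaussian vector is distributed according to the Maxwell distribution. As a sketch (the unit variance, the seed, and the tolerance are my assumptions), the mean speed should equal $$\sqrt{8/\pi} \simeq 1.596$$:

```python
import random, math

# Speeds |v| of three-dimensional unit-variance Gaussian vectors
# follow the Maxwell distribution, with mean speed sqrt(8 / pi).
random.seed(0)
n = 100000
speeds = []
for _ in range(n):
    a = random.gauss(0.0, 1.0)
    b = random.gauss(0.0, 1.0)
    c = random.gauss(0.0, 1.0)
    speeds.append(math.sqrt(a * a + b * b + c * c))
mean_speed = sum(speeds) / n
print(mean_speed, math.sqrt(8.0 / math.pi))
```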

=Discrete distributions=

Tower sampling (Python program and output)
Tower sampling draws an index from a discrete distribution by stacking the weights into a "tower" of cumulative sums and locating a uniform random number within the tower by bisection, at a cost of //O//(log //N//) per sample.

code format="python"
import random
import pylab, bisect

def cumulative(l):
    acc, L = 0, [0]
    for x in l:
        acc += x
        L.append(acc)
    return L

class rand_tower:
    def __init__(self, l):
        self.L = cumulative(l)

    def call(self):
        u = random.uniform(0.0, self.L[-1])
        return bisect.bisect_right(self.L, u) - 1

def test(n, r):
    l = [1.0 / float(i + 1) ** 2 for i in range(n)]
    g = rand_tower(l)
    samples = [g.call() for i in range(r)]
    pylab.hist(samples, bins=n, range=(-0.5, n - 0.5), normed=True)
    y_vec = [l[k] / sum(l) for k in range(n)]
    pylab.plot(range(n), y_vec, 'ro')
    pylab.title('Tower sampling')
    pylab.savefig('Tower_sampling.png')
    pylab.show()

test(10, 1000000)
code

The output of the above program ('Tower_sampling.png') shows the histogram of samples agreeing with the normalized weights (red dots).

Walker's method of alias
Walker's method is an ingenious algorithm for sampling an arbitrary discrete distribution of //N// elements in constant time per sample (after an initialization step of complexity //O//(//N//)). It precomputes a table of //N// buckets, each holding a threshold, an original element and an "alias" element; a sample is drawn by picking a bucket uniformly and comparing a second uniform random number to the threshold.

Walker's method (Naive Python code)
code format="python"
from random import uniform, randint
import pylab

N = 5
x = [[1.0 / (k + 1), k] for k in range(N)]
x_mean = sum(y[0] for y in x) / N
for k in range(N):
    x[k][0] /= x_mean
for k in range(N):
    print(k, x[k][0] / N)
x_plus = []
x_minus = []
for z in x:
    if z[0] > 1.0:
        x_plus.append(z)
    else:
        x_minus.append(z)
table = []
for k in range(N - 1):
    e_plus = x_plus.pop()
    e_minus = x_minus.pop()
    table.append((e_minus[0], e_minus[1], e_plus[1]))
    e_plus[0] = e_plus[0] - (1.0 - e_minus[0])
    if e_plus[0] < 1.0:
        x_minus.append(e_plus)
    else:
        x_plus.append(e_plus)
if len(x_plus) > 0:
    table.append((x_plus[0][0], x_plus[0][1], x_plus[0][1]))
if len(x_minus) > 0:
    table.append((x_minus[0][0], x_minus[0][1], x_minus[0][1]))
print(table)

samples = []
for k in range(1000000):
    Upsilon = uniform(0.0, 1.0)
    i = randint(0, N - 1)
    if Upsilon < table[i][0]:
        samples.append(table[i][1])
    else:
        samples.append(table[i][2])
pylab.figure()
pylab.hist(samples, bins=N, range=(-0.5, N - 0.5), normed=True)
pylab.title("Walker's method N = " + str(N))
pylab.show()
code
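The table construction can also be verified without plotting. The following compact restatement (the helper names walker_table and walker_sample, the seed, and the sample size are mine, not part of the lecture code) compares the empirical frequencies with the exact probabilities for the same weights 1/(k + 1):

```python
import random

def walker_table(pi):
    # Build the alias table [(threshold, original, alias)] from weights pi.
    # Like the naive lecture code, this assumes both lists stay non-empty.
    N = len(pi)
    mean = sum(pi) / N
    x = [[p / mean, k] for k, p in enumerate(pi)]   # weights normalized to mean 1
    x_plus = [z for z in x if z[0] > 1.0]
    x_minus = [z for z in x if z[0] <= 1.0]
    table = []
    for _ in range(N - 1):
        e_plus = x_plus.pop()
        e_minus = x_minus.pop()
        table.append((e_minus[0], e_minus[1], e_plus[1]))
        e_plus[0] -= 1.0 - e_minus[0]               # the tall element fills the bucket
        (x_minus if e_plus[0] < 1.0 else x_plus).append(e_plus)
    last = (x_plus + x_minus)[0]                    # exactly one element remains
    table.append((last[0], last[1], last[1]))
    return table

def walker_sample(table):
    # Constant-time sampling: pick a bucket, then original or alias.
    i = random.randint(0, len(table) - 1)
    return table[i][1] if random.uniform(0.0, 1.0) < table[i][0] else table[i][2]

random.seed(1)
weights = [1.0 / (k + 1) for k in range(5)]
table = walker_table(weights)
n = 200000
counts = [0] * 5
for _ in range(n):
    counts[walker_sample(table)] += 1
freqs = [c / float(n) for c in counts]
exact = [w / sum(weights) for w in weights]
print(freqs)
print(exact)
```

The empirical frequencies agree with the exact probabilities to within statistical error, confirming that each element k is drawn with probability proportional to its weight.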

=References=