Lecture I
An Introduction to Markov Chain Monte Carlo
Abstract
Introduction, the basics of Monte Carlo Integration, and
the elements of statistical physics (part 1).
Introduction
add the intro from linux
Integration by Simulation
The Basic Idea
Let x ∈ R^{p} and let g(x) > 0 with ∫ g(x) dx = 1.
The integral of any real valued integrable function f
can be written as:
\[
I = \int f(x)\,dx = \int \frac{f(x)}{g(x)}\, g(x)\,dx .
\]
The function g is a probability density. Thus, the last
integral can be read as an expectation with respect to the
pdf g, i.e.,
\[
I = E_{g}\left[ \frac{f(X)}{g(X)} \right].
\]
The classical Strong Law of Large Numbers (SLLN) then assures that
for n sufficiently large,
\[
I \approx \frac{1}{n} \sum_{t=0}^{n-1} \frac{f(X_{t})}{g(X_{t})} = I_{n}
\]
with probability close to one, when X_{0}, X_{1}, … are iid as
X with pdf g. This is just the standard ``sample mean goes to the
expected value'' kind of statement. Moreover, by the Central Limit
Theorem (CLT),
\[
Z_{n} = \frac{n^{1/2}(I_{n} - I)}{\sigma}
\]
goes (in Law) to the standard Gaussian distribution, provided that
σ (the standard deviation of the r.v. Y = f(X)/g(X)) is finite and not zero. The size of σ controls
the accuracy of the approximation. Notice that I_{n}, give or
take 2σ/n^{1/2},
is a random interval with about 95% chance of covering the
value of I. The smaller the value of σ, the more accurate
I_{n} is as an estimator of I.
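The estimator I_{n} and its approximate 95% interval can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original lecture: the test integrand f(x) = exp(-x^2) on the real line (whose integral is the square root of pi), the standard normal choice for g, and all function names are assumptions made for the example. The unknown standard deviation is replaced by the sample standard deviation of the ratios f(X_t)/g(X_t).

```python
import math
import random

def mc_integrate(f, g_pdf, g_sample, n, rng):
    """Estimate the integral of f by averaging f(X)/g(X) over n draws X ~ g."""
    ys = []
    for _ in range(n):
        x = g_sample(rng)
        ys.append(f(x) / g_pdf(x))
    i_n = sum(ys) / n
    # Sample standard deviation of Y = f(X)/g(X), used as a plug-in for sigma.
    s = math.sqrt(sum((y - i_n) ** 2 for y in ys) / (n - 1))
    half_width = 1.96 * s / math.sqrt(n)  # CLT-based 95% half width
    return i_n, half_width

def f(x):
    # Hypothetical test integrand; its integral over the real line is sqrt(pi).
    return math.exp(-x * x)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_sample(rng):
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)
estimate, half_width = mc_integrate(f, normal_pdf, normal_sample, 20000, rng)
print(f"I_n = {estimate:.4f} +/- {half_width:.4f} (exact: {math.sqrt(math.pi):.4f})")
```

Because the half width shrinks like n^{-1/2}, halving it requires about four times as many draws.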
Best g
The best g is then the one that makes σ as small as possible.
We have,
\[
\sigma^{2} = E_{g}\left[ \left( \frac{f(X)}{g(X)} \right)^{2} \right] - I^{2}
           = \int \frac{f^{2}(x)}{g(x)}\,dx - I^{2}.
\]
The integral of f, I, is independent of g, so the best g is
obtained by solving the variational problem,
\[
\min_{g} \int \frac{f^{2}(x)}{g(x)}\,dx
\]
over all positive pdfs g, i.e. functions such that g(x) > 0 and
∫ g = 1. Using a Lagrange multiplier λ for the
normalization constraint, we see that we need to find g and
λ solving,
\[
\min_{g} \int \left[ \frac{f^{2}(x)}{g(x)} + \lambda g(x) \right] dx.
\]
The Euler-Lagrange equation for the Lagrangian
\[
L = \frac{f^{2}}{g} + \lambda g
\]
is just the derivative of L w.r.t. g set equal to 0. This gives,
\[
-\frac{f^{2}(x)}{g^{2}(x)} + \lambda = 0,
\]
from where we deduce that the best g is,
\[
g^{*}(x) = \frac{|f(x)|}{c},
\]
where c is a normalization constant (the square root of λ
actually). This is clearly a global minimum for σ (at least
when f is positive) since,
\[
(\sigma^{*})^{2} = c \int |f(x)|\,dx - I^{2}
\]
and this is zero, due to the fact that c = I for positive functions f.
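A short way to check that this choice is a global minimizer, independently of the Euler-Lagrange argument, is the Cauchy-Schwarz inequality. The following is a sketch added for illustration; here c denotes the integral of |f|, and g is any pdf with g(x) > 0:
\[
c^{2} = \left( \int |f(x)|\,dx \right)^{2}
      = \left( \int \frac{|f(x)|}{\sqrt{g(x)}}\,\sqrt{g(x)}\,dx \right)^{2}
      \le \int \frac{f^{2}(x)}{g(x)}\,dx \int g(x)\,dx
      = \int \frac{f^{2}(x)}{g(x)}\,dx .
\]
Equality holds exactly when g is proportional to |f|, that is, when g = |f|/c. Hence the variance of f(X)/g(X) is at least c^{2} - I^{2} for every g, and this lower bound is zero when f is positive.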
Example
Take,
\[
I = \int_{-1}^{1} x^{2}\,dx = \frac{2}{3} = 0.666\ldots
\]
Best g(x) = 1.5 x^{2} for x in the interval [-1,1]. So for this
g, I_{n} = I for all n and the Monte Carlo algorithm is really
not useful: we need to know I to implement the algorithm, but the
purpose of the algorithm is to compute I itself! This is the
problem with the optimal g when f is everywhere positive.
Nevertheless, by knowing that the optimal g must follow the shape of
f(x) we can tune g to gain efficiency. For example, g(x) = |x|, which goes down and up with x^{2}, will be better than
the uniform g(x) = 1/2 in the interval [-1,1]. In fact, the variance of f(X)/g(X) is 16/45 for the uniform and 1/18 for |x|, so to obtain
a given accuracy we need to generate more than 6 times as many iterations (the ratio of the variances is 6.4)
with the uniform than with |x|.
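The comparison between the uniform density and the density |x| on [-1,1] can be checked numerically. The sketch below is an illustration under stated assumptions rather than the implementation used for the lecture: the function names, the seed, and the sample size are made up for the example, and draws from |x| are produced by attaching a random sign to the square root of a uniform variate. The exact variances of f(X)/g(X) are 16/45 for the uniform and 1/18 for |x|, so the estimated ratio should be close to 6.4.

```python
import math
import random

def ratio_stats(sample, pdf, n, rng):
    """Return (I_n, sample variance of Y = f(X)/g(X)) for f(x) = x^2."""
    ys = []
    for _ in range(n):
        x = sample(rng)
        ys.append(x * x / pdf(x))
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var

def uniform_sample(rng):
    return rng.uniform(-1.0, 1.0)

def uniform_pdf(x):
    return 0.5

def absx_sample(rng):
    # sqrt(U) has density 2r on (0, 1]; a random sign gives density |x| on [-1, 1].
    r = math.sqrt(1.0 - rng.random())  # 1 - U lies in (0, 1], so r > 0
    return r if rng.random() < 0.5 else -r

def absx_pdf(x):
    return abs(x)

n = 50000
rng = random.Random(2)
mean_u, var_u = ratio_stats(uniform_sample, uniform_pdf, n, rng)
mean_a, var_a = ratio_stats(absx_sample, absx_pdf, n, rng)
print(f"uniform: I_n = {mean_u:.4f}, variance = {var_u:.4f}")
print(f"|x|:     I_n = {mean_a:.4f}, variance = {var_a:.4f}")
print(f"variance ratio = {var_u / var_a:.2f}")
```

Since the number of draws needed for a given accuracy is proportional to the variance, the variance ratio is also the ratio of the required sample sizes.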
In the implementation of the previous examples we have used the
following fundamental property of random variables for generating
samples from g:
Theorem 1
Let U_{1}, U_{2}, …, U_{n} be iid uniform on [0,1] and let
F be a distribution function. Then,
\[
F^{-1}(U_{1}), F^{-1}(U_{2}), \ldots, F^{-1}(U_{n})
\]
are iid with cdf F.
Proof
\[
\begin{aligned}
P[F^{-1}(U_{1}) \le y_{1}, F^{-1}(U_{2}) \le y_{2}, \ldots, F^{-1}(U_{n}) \le y_{n}]
  &= P[U_{1} \le F(y_{1}), \ldots, U_{n} \le F(y_{n})] \\
  &= F(y_{1}) F(y_{2}) \cdots F(y_{n}).
\end{aligned}
\]
The first equality follows from the fact that F^{-1} is always
nondecreasing and the last equality is just the assumed hypothesis
of iid uniform U_{j}.
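As a concrete instance of the theorem, the sketch below draws exponential variates by applying the inverse cdf to uniforms. The exponential distribution, the rate value, and the function names are assumptions chosen for illustration; the theorem itself applies to any distribution function F. For F(y) = 1 - exp(-rate y) the inverse is -log(1 - u)/rate.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(y) = 1 - exp(-rate * y), defined for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

def sample_by_inversion(inverse_cdf, n, rng):
    """Apply the inverse cdf to n iid uniforms on [0, 1)."""
    return [inverse_cdf(rng.random()) for _ in range(n)]

rng = random.Random(1)
rate = 2.0
draws = sample_by_inversion(lambda u: exponential_inverse_cdf(u, rate), 50000, rng)

sample_mean = sum(draws) / len(draws)
# Empirical cdf at y = 0.5, to compare with F(0.5) = 1 - exp(-1).
empirical_cdf = sum(1 for y in draws if y <= 0.5) / len(draws)
print(f"mean = {sample_mean:.4f} (theory {1.0 / rate:.4f})")
print(f"F_n(0.5) = {empirical_cdf:.4f} (theory {1.0 - math.exp(-1.0):.4f})")
```

Both the sample mean and the empirical cdf should agree with their theoretical values up to Monte Carlo error.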
The Elements of Statistical Physics
The first Markov Chain Monte Carlo algorithm (the Metropolis
algorithm) appeared in the statistical physics literature. The aim
was to simulate the evolution of a solid in a heat bath towards
thermal equilibrium.
In this section we introduce the main ideas and notation from
statistical physics that will be used in the rest of the course.
Statistical Mechanics has been a continuous source of innovative
ideas in mathematics and statistics, but it is often not included as
a required course for graduate students in mathematical statistics.
This one-hour introduction will help to fill the gap.
From Newton to Hamilton
The formulation of classical mechanics evolved from the original
laws of Newton, the most famous (and computationally most useful) being
the second law:
\[
F = m a,
\]
i.e. force equals mass times acceleration. The acceleration is the
second derivative with respect to time of the position q, denoted
by,
\[
a = \frac{d^{2} q}{dt^{2}} = \ddot{q}.
\]
For a single particle, q denotes its position vector (e.g. (x,y,z)
in Cartesian coordinates) and for a system of particles q denotes
the long vector with all the position coordinates for all the
particles. General field forces are arbitrary vector functions of
position and velocity, but all four fundamental forces of nature
(gravitational, electromagnetic, weak and strong) conserve energy and
are therefore conservative, coming from gradients w.r.t. q of
potential functions V. Thus, we are only interested in
field forces such that,
\[
F = -\nabla_{q} V
\]
for some scalar potential function,
\[
V = V(q).
\]
Newton's laws tell us that the evolution of a system of particles
is governed by a second order system of differential equations:
\[
m \frac{d^{2} q}{dt^{2}} = -\nabla_{q} V(q).
\]
For a system of N particles without constraints, there are
3N second order differential equations to be solved. The
theory of differential equations (developed in great part to
understand mechanics) shows that under mild regularity conditions
on the functions F and q(t), the system has a unique
solution for a given set of initial conditions (for example
for initial values of positions and velocities for each particle).
A second order system can always be reduced to a first order system
by doubling the number of equations. By introducing the momentum
p,
\[
p = m \frac{dq}{dt},
\]
or equivalently,
\[
\frac{dq}{dt} = \frac{p}{m},
\]
we can now replace the original system of 3N second order differential
equations by an equivalent first order system of 6N equations in the
variables p and q. Just augment the previous equations with the original
ones (but now written with only first derivatives in terms of q and p),
\[
\frac{dp}{dt} = -\nabla_{q} V(q).
\]
Hamiltonian Formulation
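By introducing the function H(q,p) representing the total energy of
the system, i.e. the sum of kinetic and potential energies,
\[
\begin{aligned}
\text{Energy} &= \text{Kinetic} + \text{Potential} \\
H(q,p) &= \frac{|p|^{2}}{2m} + V(q),
\end{aligned}
\]
we obtain Hamilton's equations by replacing the right hand side of the
system above with the derivatives of H:
\[
\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial q}.
\]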
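To see Hamilton's equations at work, the sketch below integrates them numerically for a one-dimensional harmonic oscillator with H(q,p) = p^2/(2m) + k q^2/2. Everything specific here is an assumption made for illustration and does not come from the lecture: the oscillator itself, the spring constant `k`, the function names, and the leapfrog scheme used as the integrator. For this system dq/dt = p/m and dp/dt = -k q, and with m = k = 1, q(0) = 1, p(0) = 0 the exact solution is q(t) = cos(t).

```python
import math

def hamiltonian(q, p, mass, k):
    """Total energy: kinetic p^2/(2m) plus potential k q^2 / 2."""
    return p * p / (2.0 * mass) + 0.5 * k * q * q

def leapfrog(q, p, mass, k, dt, n_steps):
    """Integrate dq/dt = dH/dp = p/m and dp/dt = -dH/dq = -k q."""
    for _ in range(n_steps):
        p -= 0.5 * dt * k * q   # half step in momentum
        q += dt * p / mass      # full step in position
        p -= 0.5 * dt * k * q   # second half step in momentum
    return q, p

mass, k = 1.0, 1.0
q0, p0 = 1.0, 0.0
dt, n_steps = 0.01, 10000       # integrate up to t = 100
q1, p1 = leapfrog(q0, p0, mass, k, dt, n_steps)
e0 = hamiltonian(q0, p0, mass, k)
e1 = hamiltonian(q1, p1, mass, k)
print(f"q(100) = {q1:.4f}, exact cos(100) = {math.cos(100.0):.4f}")
print(f"energy drift = {abs(e1 - e0):.2e}")
```

The energy stays nearly constant along the computed trajectory, as expected for a conservative system; the small residual drift is an artifact of the finite step size.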
The Hamiltonian formulation of classical mechanics has proven
extremely useful for the development of modern physics, but it contains
exactly as much information as the original laws of Newton. As before, given
the initial conditions of position q(0) and momentum p(0) at time
t = 0, Hamilton's equations predict the past and future of the system
with complete certainty. The problem is that for macroscopic systems
the number of particles N is of the order of Avogadro's number,
so we need to provide on the order of 6 × 10^{24} initial conditions
and to solve the same number of first order differential equations to
be able to ``see'' the truth implied by the equations of classical
mechanics. The tremendous complexity of this task is the
origin of statistical physics.
File translated from TeX by TtH, version 2.32.
On 5 Jul 1999, 22:59.