\documentclass{article}
\usepackage{epsfig}
\newenvironment{code}{\begin{verse}
\tt $\{$ \\
}{$\}$ \\
\end{verse}}
\def\latex{\LaTeX}
\def\N{{\rm I\!N}} %natuerliche Zahlen
\newcommand{\baseref}{{\it http://omega.stat.psu.edu:8008/summer99/lecture4}}
\newcommand{\href}[2]{{#2 {\center ({\it \baseref/#1})}}}
\newtheorem{mytheorem}{Theorem}
\newtheorem{lemma}{Lemma}
\title{Lecture IV}
\begin{document}
\begin{abstract}
The Rejection and Acceptance Complement Methods.
\end{abstract}
\section*{ Exact Sampling Methods}
The Metropolis algorithm and its variants are great, for they can be used to
generate ergodic Markov chains with, in principle, any pre-specified
stationary distribution. Just choose an arbitrary starting point and
eventually the chain will begin sampling from its stationary distribution.
The problem is, of course, that (usually) we can't be 100\% sure about when
this will happen. Exact sampling methods, when available, are therefore more
reliable and often more efficient than MCMC methods. For this reason, MCMC
sampling is almost always used in combination with exact methods. This is
especially true for the Gibbs sampler, where we need to generate from the
full conditionals. Which exact method to use depends on the particular
problem. There are literally hundreds of methods available. The current
bible is \href{http://cgm.cs.mcgill.ca/~luc/}{Devroye's} book, which is
unfortunately not available online. Besides the only truly universal
exact generator (the inverse cdf method introduced in
\href{../lecture1/l1.html}{lecture I}), the most flexible exact algorithm is
the rejection method.
\section*{ The Rejection Method }
The idea, first implemented (I believe) by von Neumann, is very simple. To
sample from $f(x)$, find another, simpler density $g(x)$ from which you
know how to sample and such that $c g(x) \ge f(x)$ for some $c \ge 1$ (we
say that $c g$ envelopes $f$), as in the following picture:
\begin{center}
\epsfysize=2.0in
\epsfbox{env90.ps}
\end{center}
Then generate uniformly under the graph of
the envelope and accept only samples that fall under the graph of $f$.
In other words, reject a sample if it falls between the envelope
and the graph of $f$.
Copying from the bible:
\begin{mytheorem}
Let $f$ and $g$ be densities on $R^{p}$ with $f(x) \le c g(x)$ for
all $x\in R^{p}$ and some $c \ge 1$. Then the following algorithm
\subsubsection*{The Rejection Method}
\begin{code}
REPEAT \\
\begin{code}
$x \leftarrow$ sample from $g$ \\
$u \leftarrow$ unif(0,1) \\
$y \leftarrow c \frac{g(x)}{f(x)}$ \\
\end{code}
UNTIL $uy \le 1$ \\
RETURN x
\end{code}
will produce a sample $X=x$ with density $f(x)$.
\end{mytheorem}
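The algorithm above can be sketched in Python (a minimal sketch; the function names are illustrative assumptions, and the acceptance test is written as $u\,c\,g(x) \le f(x)$, which is equivalent to $uy \le 1$ but avoids dividing by $f(x) = 0$):

```python
import random

def rejection_sample(f, g_sample, g_density, c):
    """One draw from density f, given an envelope c * g(x) >= f(x)."""
    while True:
        x = g_sample()        # x <- sample from g
        u = random.random()   # u <- unif(0,1)
        # accept iff u * c * g(x) / f(x) <= 1; written multiplicatively
        # to avoid dividing by f(x) = 0
        if u * c * g_density(x) <= f(x):
            return x
```

For instance, with the toy target $f(x) = 2x$ on $[0,1]$, $g$ uniform, and $c = 2$, about half of the generated pairs are accepted.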
{\bf Proof:} Consider the following two lemmas
\begin{lemma}
Let $X$ be a random $p$-vector with density $f(x)$. Let $U$ be a
uniform(0,1) r.v. independent of $X$, and let $c > 0$. Then,
\begin{description}
\item[I.] $(X,c U f(X))$ is uniform on,
$ A = \{(x,u) : x\in R^{p}, 0\le u \le c f(x) \} $
\item[II.] If $(X,U)$ is uniform on $A$, then $X$ has density $f(x)$.
\end{description}
{\bf Proof:}
\begin{description}
\item[I.] Let $B\subset A$, then
\begin{eqnarray*}
P[(X,c U f(X))\in B ] & = & \int P[(x,c U f(x))\in B | X=x ] f(x) dx \\
& = & \int \left\{ \int_{B_{x} = \{u:(x,u)\in B\}}
\frac{1}{c f(x)} du \right\} f(x) dx \\
& = & \frac{1}{c}\int_{B} du dx
\end{eqnarray*}
where the second equality follows from Tonelli's theorem, the independence
of $X$ and $U$ and the fact that $cf(x)U$ is uniform$(0,cf(x))$.
But,
\[ |A| = \int_{A} du dx = \int_{R^{p}}(\int_{0}^{cf(x)}du)dx = c \]
thus, for any $B \subset A$ measurable we have,
\[ P\left( (X,cUf(X)) \in B \right) = \frac{|B|}{|A|} \]
which means that $(X,cUf(X))$ is uniform on $A$.
\item[II.] We just need to show that, $\forall B$ measurable,
$P[X\in B] = \int_{B} f(x) dx$. But,
\begin{eqnarray*}
P[X\in B] &=& P[ X\in B, 0\le U \le c f(X) ] \\
&=& P[(X,U)\in B_{1}=\{(x,u):x\in B, 0\le u \le cf(x)\}] \\
&=& \frac{\int\int_{B_{1}} du dx}{\int\int_{A} du dx} \\
&=& \frac{1}{c}\int_{B} c f(x) dx = \int_{B} f(x) dx.
\end{eqnarray*}
\end{description}
\end{lemma}
\begin{lemma}
Let $X_{1},X_{2},\ldots$ be iid vectors in $R^{p}$. Let $A\subset R^{p}$ s.t.
$P[X_{1}\in A] = \alpha > 0$. Let $Y$ be the first $X_{i}$ with $X_{i}\in A$. Then,
$\forall B\subset R^{p}$ measurable,
\[ P[Y\in B] = \frac{P[X_{1}\in A\bigcap B]}{\alpha}. \]
Moreover, if $X_{1}$ is uniform on $A_{0} \supset A$ then $Y$ is uniform on
$A$.
\end{lemma}
{\bf Proof:}
\begin{eqnarray*}
P[Y\in B] &=& \sum_{i=1}^{\infty} P[X_{1}\not\in A,\ldots, X_{i-1}\not\in A,
X_{i}\in B\bigcap A ] \\
&=& \sum_{i=1}^{\infty} (1-\alpha)^{i-1} P[X_{1}\in B\bigcap A] \\
&=& \frac{P[X_{1} \in B\bigcap A]}{1-(1-\alpha)} =
\frac{P[X_{1}\in B\bigcap A]}{\alpha}
\end{eqnarray*}
Also, if $X_{1}$ is uniform on $A_{0} \supset A$ then for all measurable $B$,
\begin{eqnarray*}
P[Y\in B] &=& \frac{P[X_{1}\in A\bigcap B]}{\alpha} =
\frac{|A\bigcap B| / |A_{0}|}{|A| / |A_{0}|} \\
&=& \frac{|A\bigcap B|}{|A|}
\end{eqnarray*}
therefore, $Y$ is uniform on $A$.
We are now ready for the Theorem.
{\bf Proof:} (that the Rejection Method is valid). In the first line
the algorithm generates (by Lemma 1, I) $(X,cUg(X))$ uniform on
$C = \{(x,u): x\in R^{p}, 0\le u \le c g(x) \}$. At
exit time, however, (by Lemma 2) $(X,cUg(X))$ is uniform on
$A = \{(x,u): x\in R^{p}, 0\le u \le f(x) \}$.
Notice that, by the assumption that $f(x)\le cg(x)$
(i.e. that $cg$ envelopes $f$), $A \subset C$.
Thus, (by Lemma 1, II) $X$ has density $f(x)$.
\subsubsection*{Best $g$ is $f$ itself}
Let $N$ be the number of pairs $(x,u)$ generated by the rejection
algorithm to exit. We have,
\begin{eqnarray*}
P[ N \ge i ] &=& P[\mbox{ reject at least $(i-1)$ pairs} ] \\
&=& P[ UY > 1 ]^{i-1} = (1-\alpha)^{i-1}
\end{eqnarray*}
where,
\begin{eqnarray*}
\alpha &=& P[ U c \frac{g(X)}{f(X)} \le 1 ] =
\int P[Ucg(x)\le f(x)] g(x) dx\\
&=& \int \frac{f(x)}{c g(x)} g(x) dx = \frac{1}{c}.
\end{eqnarray*}
The expected number of pairs generated by the algorithm is therefore,
\[ E[N] = \sum_{i=1}^{\infty} P[N\ge i] = \sum_{i=1}^{\infty} (1-\alpha)^{i-1}
= \frac{1}{1-(1-\alpha)} = \frac{1}{\alpha} = c. \]
Thus, $c$ is the expected number of pairs generated per accepted sample and
therefore it should be kept as small as possible, i.e. as close as possible
to its minimum value of 1. Since $f(x) \le c g(x)$, in order for $c$ to be
small $g$ must be close to $f$.
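The identity $E[N] = c$ is easy to check empirically. A sketch, assuming the toy target $f(x) = 2x$ on $[0,1]$ with $g$ uniform and $c = 2$ (this example is an illustration, not part of the lecture):

```python
import random

def trials_until_accept():
    """Count the (x, u) pairs generated before one acceptance, for the
    toy target f(x) = 2x on [0, 1] with g uniform and c = 2."""
    n = 0
    while True:
        n += 1
        x = random.random()        # x ~ g = uniform(0,1)
        u = random.random()        # u ~ unif(0,1)
        if u * 2.0 <= 2.0 * x:     # u * c * g(x) <= f(x)
            return n
```

Averaged over many runs, the count should be close to $c = 2$.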
{\bf Example:} To sample from the standard Gaussian we can use the
rejection method with $g$ as the Laplace distribution. Notice that,
\[ f(x) = (2\pi)^{-1/2} \exp(-x^{2}/2) \]
and in order to get an upper bound for $f$ we need a lower bound
for the energy, i.e. for $x^{2}/2$. But that follows easily from,
\[ \frac{1}{2}(|x| - 1)^{2} = \frac{x^{2}}{2} + \frac{1}{2} - |x| \ge 0 \]
from which we get,
\[ \exp(-x^{2}/2) \le \exp(\frac{1}{2} - |x|) =
(2\pi)^{1/2} \left(\frac{2e}{\pi}\right)^{1/2}
\left(\frac{1}{2}\exp(-|x|)\right) \]
The last term in parentheses above is the density of the Laplace
distribution, which the above inequality thus shows to envelope the $N(0,1)$
with,
\[ c = \left(\frac{2e}{\pi}\right)^{1/2} \approx 1.3155 \]
{\bf Javascript Demo:} Implementation of the above in Javascript:
\href{kyang-1.html}{Solution by Ke Yang}.
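A Python sketch of this Gaussian-from-Laplace rejection sampler (the linked Javascript demo is the original; function names here are illustrative, and the acceptance test uses the simplification $f(x)/(c\,g(x)) = \exp(-(|x|-1)^{2}/2)$, which follows from the inequality above):

```python
import math
import random

def laplace_sample():
    # Laplace density (1/2) exp(-|x|): Exp(1) magnitude with a random sign
    e = -math.log(1.0 - random.random())
    return e if random.random() < 0.5 else -e

C = math.sqrt(2.0 * math.e / math.pi)   # c, approximately 1.3155

def normal_sample():
    # accept iff u <= f(x) / (C g(x)) = exp(-(|x| - 1)^2 / 2)
    while True:
        x = laplace_sample()
        if random.random() <= math.exp(-0.5 * (abs(x) - 1.0) ** 2):
            return x
```

On average only $C \approx 1.3155$ Laplace draws are needed per Gaussian sample.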
There are several variations of the rejection method in the bible.
One fairly general method that never rejects any samples is:
\subsection*{The Acceptance Complement Method}
Suppose that we don't know how to envelope $f$ but,
\[ f(x) = f_{1}(x) + f_{2}(x) \]
with, $f_{1}(x) \ge 0, f_{2}(x) \ge 0$ and $f_{1}(x) \le g(x)$ where
$g$ is a density. Furthermore suppose that we know how to sample from
$g$ and from $f_{2}$ (properly normalized), then,
\subsubsection*{The Acceptance Complement Method}
\begin{code}
$x \leftarrow $ sample from g \\
$u \leftarrow $ unif(0,1) \\
IF $u > f_{1}(x) / g(x)$ THEN $x \leftarrow $ sample from $f_{2}/\int f_{2}$ \\
RETURN x \\
\end{code}
\begin{mytheorem}
The value $X=x$ returned by the algorithm above is a sample from $f$.
\end{mytheorem}
{\bf Proof:} Let $\alpha = \int f_{2}$ and suppose that $Y$ has density $g$. We
have,
\begin{eqnarray*}
P[X\in B] &=& P[Y\in B, U\le \frac{f_{1}(Y)}{g(Y)}] +
P[U > \frac{f_{1}(Y)}{g(Y)}] \int_{B} f_{2}(x) dx / \alpha \\
&=& \int_{B} \frac{f_{1}(x)}{g(x)} g(x) dx + \left( 1 -
\int \frac{f_{1}(y)}{g(y)} g(y) dy\right)\int_{B}f_{2}/\alpha \\
&=& \int_{B} f_{1} + \int_{B} f_{2} = \int_{B} f
\end{eqnarray*}
{\bf Example:} $f_{1}(x) = \min\{f(x),g(x)\}$ and $f_{2}(x) = (f(x)-g(x))_{+}$.
Then, clearly $f_{1}(x) \le g(x)$ and $f(x) = f_{1}(x) + f_{2}(x)$. So if
we know how to sample from $g$ and $f_{2}$, we are done.
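A Python sketch of the Acceptance Complement Method (the function names are assumptions; the split used below is the $\min$/positive-part split of this example):

```python
import random

def acceptance_complement(g_sample, g_density, f1, f2_sample):
    """One draw from f = f1 + f2, assuming f1 <= g with g a density and
    f2_sample drawing from f2 normalized to integrate to one."""
    x = g_sample()          # x <- sample from g
    u = random.random()     # u <- unif(0,1)
    if u > f1(x) / g_density(x):
        x = f2_sample()     # x <- sample from f2 / int f2
    return x
```

For the triangular density $f(x) = 2x$ on $[0,1]$ with $g$ uniform, $f_{1}(x) = \min(2x, 1)$ and $f_{2}$ normalizes to the density $8x-4$ on $[\frac{1}{2},1]$, which can be sampled by inversion as $\frac{1}{2} + \frac{1}{2}\sqrt{U}$.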
Another useful general split is available for {\it almost flat densities} on
$[-1,1]$, i.e. densities $f(x)$ such that,
\[ \sup f(x) - \inf f(x) \le \frac{1}{2} \]
then we may take,
\begin{eqnarray*}
g(x) &=& 1/2 \mbox{ for } |x| \le 1 \\
f_{1}(x) &=& f(x) - (M - \frac{1}{2}), \mbox{ with } M = \sup f(x) \\
f_{2}(x) &=& M - \frac{1}{2} \mbox{ for } |x| \le 1
\end{eqnarray*}
which works since $g$ and $f_{2}$ are proportional to densities uniform on
$[-1,1]$ (so easy to sample from). That $f_{1}$ is bounded above by $g$
follows simply from $f(x) \le M$, while $f_{1} \ge 0$ follows from the
almost flat condition, which gives $f(x) \ge \inf f \ge M - \frac{1}{2}$.
Finally, $f_{2} = M - \frac{1}{2} \ge 0$ follows from the fact that, for a
density on $[-1,1]$,
\[ 0 \le \inf f(x) \le \frac{1}{2} \le \sup f(x) \le 1 \]
Hence, the method can be used to generate from many densities symmetric
about 0 with a single mode at 0, for which $M=f(0)$ and $\inf f = f(1)$.
For example, we can generate from the truncated Cauchy for $|x|\le 1$.
We have,
\[ f(x) = \frac{2}{\pi (1+x^{2})}\ \mbox{\ for\ } |x| \le 1 \]
which is almost flat since $2/\pi - 1/\pi = 1/\pi \approx 0.318 < 0.5$. Using
the property that $1/X$ is Cauchy when $X$ is Cauchy we can generate a
complete Cauchy by just using the Acceptance Complement Method to generate
from the truncated Cauchy and then with probability $1/2$ return $1/X$
instead of $X$.
{\bf Example:} Javascript implementation of the Cauchy with the above method.
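A Python sketch of the same construction (the Javascript demo is the original; function names are illustrative):

```python
import math
import random

M = 2.0 / math.pi   # sup of the truncated Cauchy density

def truncated_cauchy():
    # acceptance complement with the almost-flat split:
    # g = 1/2, f1 = f - (M - 1/2), f2 = M - 1/2, all on [-1, 1]
    x = 2.0 * random.random() - 1.0       # x ~ g, uniform on [-1, 1]
    u = random.random()
    f1 = 2.0 / (math.pi * (1.0 + x * x)) - (M - 0.5)
    if u > f1 / 0.5:                      # u > f1(x) / g(x)
        x = 2.0 * random.random() - 1.0   # f2 is constant: uniform again
    return x

def cauchy():
    # a full Cauchy: return 1/X with probability 1/2, since 1/X is
    # again Cauchy and |1/X| >= 1 when |X| <= 1
    x = truncated_cauchy()
    if random.random() < 0.5 and x != 0.0:   # guard the measure-zero x = 0
        return 1.0 / x
    return x
```

Note that no sample is ever rejected: the `if` branch merely swaps in a draw from the (uniform) complement piece $f_{2}$.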
\subsection*{A Trivial Perfect MCMC Method}
If there is available a good envelope for $f$ then the rejection method is
preferable to asymptotic methods based on Markov chains. The problem is that
often good envelopes are not easy to find. A bad envelope can be easier to
find and still be useful when combined with a Markov chain method. For
example, it may be possible to show that $f(x) \le 10000\, g(x)$, which means
that we can use the rejection method but expect to generate, on average,
$10000$ $(X,U)$ pairs before accepting one vector $X$. In this case we can
still use the rejection method to generate a single observation of $X$, which
is guaranteed to have the correct density $f$. Then, use the observation
generated by the rejection method as the initial point for a Markov chain
method with stationary distribution $f$, and harvest the complete path of
the chain, which is now sampling from its asymptotic distribution.
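A minimal sketch of this hybrid scheme in Python, assuming a one-dimensional target and a random-walk Metropolis kernel with uniform proposals (the Metropolis step and all names are illustrative, not from the lecture):

```python
import random

def exact_start_metropolis(f, g_sample, g_density, c, step, n_steps):
    """Draw one exact sample from f by rejection (slow when c is large),
    then run a random-walk Metropolis chain from it; the chain starts in
    stationarity, so every state of the path has density f."""
    while True:                                        # rejection step
        x = g_sample()
        if random.random() * c * g_density(x) <= f(x):
            break
    path = [x]
    for _ in range(n_steps):                           # Metropolis steps
        y = x + step * (2.0 * random.random() - 1.0)   # symmetric proposal
        if f(y) > 0 and random.random() <= min(1.0, f(y) / f(x)):
            x = y
        path.append(x)
    return path
```

The whole harvested path can be used, since no burn-in is needed when the initial point is already an exact draw from $f$.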
\end{document}