We present some additional results from Measure Theory (monotone class theorem), Probability (martingales and CLT), Functional Analysis (maximal functions) and Topology (weak compactness).

Appendix

Monotone Class Theorem

A collection of subsets ${\cal P}\sbe{\cal P}(\O)$ is called a $\pi$-system, if for all $A,B\in{\cal P}$: $A\cap B\in{\cal P}$. A collection of subsets ${\cal L}\sbe{\cal P}(\O)$ is called a $\l$-system if
  1. $\O\in{\cal L}$.
  2. $A,B\in{\cal L}$ and $B\sbe A$ imply: $A\sm B\in{\cal L}$.
  3. $A_n\in{\cal L}$ and $A_n\uar$ imply: $\bigcup A_n\in{\cal L}$.
Dynkin's $\pi$-$\l$-Theorem asserts that if a $\l$-system ${\cal L}$ contains a $\pi$-system ${\cal P}$, then $\s({\cal P})\sbe{\cal L}$. A typical application is
If ${\cal P}_n$ are independent $\pi$-systems, then the $\s$-algebras they generate are independent as well.
For $A_j\in{\cal P}_j$, $j=2,\ldots,n$ put $$ {\cal L}\colon=\{A\in\s({\cal P}_1):\, \P(A\cap A_2\cap\ldots\cap A_n) =\P(A)\P(A_2)\cdots\P(A_n)\}~. $$ Then ${\cal L}$ is a $\l$-system and by Dynkin's $\pi$-$\l$-Theorem: ${\cal L}=\s({\cal P}_1)$. Iterating the argument proves the assertion.
${\cal L}\sbe{\cal P}(\O)$ is a $\l$-system if and only if the following holds:
  1. $\O\in{\cal L}$.
  2. $A\in{\cal L}$ implies: $A^c\in{\cal L}$.
  3. If $A_n\in{\cal L}$ and $A_n$ are pairwise disjoint, then: $\bigcup A_n\in{\cal L}$.
Suppose $(\O,\F)$ is a measurable space, $\cal P\sbe\F$ a $\pi$-system and $E$ a vector subspace of bounded measurable functions on $\O$ such that:
  1. For all $A\in\cal P$ we have $I_A\in E$.
  2. If $X_n$ is any increasing sequence of uniformly bounded measurable and non negative functions in $E$, then $\lim_n X_n\in E$.
Then $E$ contains all $\s({\cal P})$-measurable and bounded functions.
$\proof$ Put ${\cal L}\colon=\{A:\,I_A\in E\}$; this is a $\l$-system and by Dynkin's $\pi$-$\l$-Theorem $E$ contains all simple functions, i.e. finite linear combinations of indicators $I_A$, $A\in\s({\cal P})$. By a result from measure theory every non-negative, bounded and $\s({\cal P})$-measurable function is the uniform limit of an increasing sequence of simple functions; since $E$ is a vector space, it therefore contains all bounded $\s({\cal P})$-measurable functions. $\eofproof$

Martingales and Submartingales

Let $(\O,\F,\P)$ be a probability space and $\F_n\sbe\F$ a so-called filtration, i.e. an increasing sequence of $\s$-algebras. A sequence of random variables $X_n$ is called a martingale (submartingale) if $X_n$ is $\F_n$-measurable and integrable and $$ \E(X_{n+1}|\F_n)=X_n \quad(\E(X_{n+1}|\F_n)\geq X_n)~. $$ Given a filtration $\F_n$ and an integrable random variable $X$, the sequence $X_n\colon=\E(X|\F_n)$ is the standard example of a martingale. We will see that if $X$ is measurable with respect to the $\s$-algebra generated by $\bigcup\F_n$, then $X_n$ converges $\P$-a.s. (cf. theorem) and in $L_1(\P)$ (cf. proposition) to $X$.
Suppose $X$ is integrable and $\vp:\R\rar\R$ is convex such that $\vp(X)$ is integrable. Since $\vp$ is convex there is an $\F^\prime$-measurable random variable $Z$ (in case $\vp$ is differentiable $Z=\vp^\prime(\E(X|\F^\prime))$) such that $$ \vp(X)-\vp(\E(X|\F^\prime)) \geq Z(X-\E(X|\F^\prime))~. $$ For any $A\in\F^\prime$ put $A_n\colon=A\cap[|Z|\leq n]$, then $\E(\vp(X);A_n)\geq\E(\vp(\E(X|\F^\prime));A_n)$ and letting $n\to\infty$ we conclude that $\P$-a.s.: $$ \vp(\E(X|\F^\prime)) \leq\E(\vp(X)|\F^\prime)~. $$ This is the conditional version of Jensen's inequality. Now if $(M_n,\F_n)$ is a martingale and $\vp$ is convex such that $\vp(M_n)$ is integrable, then $(\vp(M_n),\F_n)$ is a submartingale: $$ \E(\vp(M_{n+1})|\F_n) \geq\vp(\E(M_{n+1}|\F_n)) =\vp(M_n)~. $$

Upcrossing

Suppose $(X_n,\F_n)$ is a submartingale and $a < b$ are real numbers. Define the so-called stopping times \begin{eqnarray*} S_1&\colon=&\inf\{n\geq0:\,X_n\leq a\}\\ T_k&\colon=&\inf\{n > S_k:\,X_n\geq b\}\\ S_{k+1}&\colon=&\inf\{n > T_k:\,X_n\leq a\}~. \end{eqnarray*} So $S_1$ is the first time the sequence $X_0,X_1,\ldots$ falls below $a$, $T_k$ is the first time after $S_k$ the sequence exceeds $b$ and $S_{k+1}$ is the first time after $T_k$ the sequence $X_0,X_1,\ldots$ falls below $a$ again. Finally let us put for $n\geq0$: $$ U_n^{a,b}(X)\colon=\max\{k:\,T_k\leq n\}, $$ i.e. this is the number of times up to time $n$ the sequence $X_0,X_1,\ldots$ has crossed from some value $\leq a$ to some value $\geq b$ - this is called the number of upcrossings.
For a submartingale $X_n$ we have $$ \E U_n^{a,b}(X)\leq\frac{\E(X_n-a)^+}{b-a}~. $$
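The number of upcrossings is easy to compute along a single path; the following Python sketch (numpy assumed available, all parameters merely illustrative) counts $U_n^{a,b}$ for simulated paths of the simple random walk - a martingale, hence a submartingale - and compares the Monte Carlo estimate of $\E U_n^{a,b}$ with the bound of the Upcrossing Lemma.

```python
import numpy as np

def upcrossings(x, a, b):
    """Number of completed upcrossings of [a,b] by the finite path x:
    passages from some value <= a to a later value >= b."""
    count, below = 0, False
    for v in x:
        if not below and v <= a:
            below = True
        elif below and v >= b:
            count, below = count + 1, False
    return count

# Monte Carlo check of  E U_n^{a,b} <= E(X_n - a)^+ / (b - a)
rng = np.random.default_rng(0)
n, paths, a, b = 200, 5000, -1.0, 1.0
X = np.cumsum(rng.choice([-1, 1], size=(paths, n)), axis=1)
EU = np.mean([upcrossings(x, a, b) for x in X])
bound = np.mean(np.maximum(X[:, -1] - a, 0.0)) / (b - a)
print(EU, bound)   # the estimated E U_n stays below the bound
```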
Suppose $X_n$ is a submartingale such that $\sup_n\E X_n^+ < \infty$. Then there exists an integrable random variable $X$, such that $X_n$ converges $\P$-a.s. to $X$.
$\proof$ Since $$ [X_n\mbox{ does not converge}] =\bigcup_{r_1 < r_2\in\Q} [\liminf X_n < r_1 < r_2 <\limsup X_n], $$ it suffices to prove that for all $r_1 < r_2\in\Q$: $$ \P(\liminf X_n < r_1 < r_2 < \limsup X_n)=0. $$ If $\P(\liminf X_n < r_1 < r_2 < \limsup X_n) > 0$, then the sequence $X_0,X_1,\ldots$ upcrosses the interval $[r_1,r_2]$ infinitely often on a set of strictly positive measure, hence $\E U=\infty$, where $U\colon=\sup_n U_n$. On the other hand $U$ is the limit of the increasing sequence $U_n$ and by the Upcrossing Lemma $$ (r_2-r_1)\E U_n\leq\E(X_n-r_1)^+ \leq\sup_m\E X_m^++|r_1|, $$ so $U$ is integrable by monotone convergence. Consequently $X_n$ must converge $\P$-a.s. to some random variable $X$. Finally, since $\E X_n\geq\E X_0$ for a submartingale, $$ \E|X_n| =\E(2X_n^+-X_n) \leq2\sup_m\E X_m^+-\E X_0, $$ and thus by Fatou's lemma $X$ must be integrable. $\eofproof$
Suppose $X_n$ is a sequence of random variables on $(\O,\F,\P)$. 1. $X_n$ converges $\P$-a.s. if and only if (suggested solution) $$ \forall\e > 0:\quad \lim_n\P\Big(\sup_{k\geq n}|X_k-X_n| > \e\Big)=0~. $$ 2. If $X_n$ is a sequence on an arbitrary measure space $(\O,\F,\mu)$, then the condition $$ \forall\e > 0:\quad \lim_n\mu\Big(\sup_{k\geq n}|X_k-X_n| > \e\Big)=0 $$ implies convergence $\mu$-a.e.
In order to verify a.s. convergence it is therefore natural to study the maximal function $X^*\colon=\sup_n|X_n|$.

Doob's Inequalities

For any sequence of random variables $X_n$ let us put $$ X_n^*\colon=\sup_{k\leq n}X_k~. $$
For a submartingale $X_n$ we have: $$ \forall\e > 0:\quad \e\,\P(X_n^* > \e) \leq\E(X_n;X_n^* > \e) \leq\E X_n^+~. $$
$\proof$ Define $T\colon=\inf\{k:X_k > \e\}$, then $[T=k]=[X_{k-1}^*\leq\e,X_k > \e]\in\F_k$ and \begin{eqnarray*} \e\P(X_n^* > \e) &=&\e\P(T\leq n) =\sum_{k=1}^n\e\P(T=k)\\ &\leq&\sum_{k=1}^n\E(X_k;T=k) \leq\sum_{k=1}^n\E(X_n;T=k) =\E(X_n;T\leq n)~. \end{eqnarray*} $\eofproof$
Suppose $\vp:\R\rar\R$ is convex. If $(X_n,\F_n)$ is a martingale such that $\vp(X_n)$ is integrable, then $(\vp(X_n),\F_n)$ is a submartingale.
For a martingale $(X_n,\F_n)$ we have: $$ \forall \e > 0:\qquad\e\,\P(|X_n|^* > \e)\leq\E(|X_n|;|X_n|^* > \e)~. $$
For a martingale $X_n$, $p > 1$ and $\tfrac1p+\tfrac1q=1$ we have: $$ \Big(\E\sup_{k\leq n}|X_k|^p\Big)^{1/p} \leq q(\E|X_n|^p)^{1/p}~. $$
$\proof$ $Y_n\colon=|X_n|$ is a non-negative submartingale and by Doob's inequality, Fubini and Hölder's inequality: \begin{eqnarray*} \E Y_n^{*p} &=&\int_0^\infty pr^{p-1}\P(Y_n^* > r)\,dr \leq p\int_0^\infty r^{p-2}\int_{[Y_n^* > r]}Y_n\,d\P dr\\ &=&p\E\int_0^{Y_n^*}Y_n\,r^{p-2}\,dr =\frac p{p-1}\E(Y_n^{*(p-1)}Y_n) \leq\frac p{p-1}(\E Y_n^{*(p-1)q})^{1/q}(\E Y_n^p)^{1/p} \end{eqnarray*} and since $q=p/(p-1)$, the conclusion follows. $\eofproof$
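A quick Monte Carlo sanity check of Doob's $L_p$ inequality for the simple random walk (a sketch assuming numpy; the horizon, number of paths and the exponent $p$ are arbitrary):

```python
import numpy as np

# Check (E sup_{k<=n} |X_k|^p)^{1/p} <= q (E |X_n|^p)^{1/p} for the simple
# random walk martingale X_n = e_1 + ... + e_n with i.i.d. fair signs e_j.
rng = np.random.default_rng(1)
n, paths, p = 500, 10000, 2
q = p / (p - 1)
X = np.cumsum(rng.choice([-1, 1], size=(paths, n)), axis=1)
lhs = np.mean(np.max(np.abs(X), axis=1) ** p) ** (1 / p)
rhs = q * np.mean(np.abs(X[:, -1]) ** p) ** (1 / p)
print(lhs, rhs)   # lhs stays below rhs; for this walk E X_n^2 = n
```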
A subset $A$ of $L_1(\P)$ is said to be uniformly integrable if $$ \lim_{r\to\infty}\sup_{X\in A}\E(|X|;|X| > r)=0~. $$
Of course, any finite set of integrable random variables is uniformly integrable. Uniformly integrable subsets of $L_1(\P)$ are bounded; in fact they are just the weakly relatively compact subsets of $L_1(\P)$, cf. Dunford-Pettis Theorem. If $X_n$ is a bounded sequence in $L_p(\P)$ for some $p > 1$, then by Hölder's and Chebyshev's inequality: $$ \E(|X_n|;|X_n| > r) \leq\norm{X_n}_p\P(|X_n| > r)^{1/q} \leq\norm{X_n}_p r^{-p/q}(\E|X_n|^p)^{1/q} =r^{-p/q}\norm{X_n}_p^{1+p/q}~. $$ Hence bounded subsets in $L_p(\P)$, $p > 1$, are uniformly integrable. More generally, we have
Let $\vp:\R_0^+\rar\R_0^+$ be a function such that $\lim_{r\to\infty}\vp(r)/r=\infty$. Suppose $\sup\{\E\vp(|X|):X\in A\}\leq C$, then $A$ is uniformly integrable. Suggested solution.
A subset $A$ of $L_1(\P)$ is uniformly integrable if and only if $$ \sup_{X\in A}\E|X| < \infty \quad\mbox{and}\quad \lim_{\P(B)\to0}\sup_{X\in A}\E(|X|;B)=0~. $$
Suppose a sequence of random variables $X_n$ converges in probability to $X$, then the following assertions are equivalent:
  1. $X_n$ is uniformly integrable.
  2. $X_n$ converges in $L_1(\P)$.
  3. $\E|X_n|$ converges to $\E|X|$.
$\proof$ 1. $\Rar$ 2.: As uniformly integrable sets are bounded, we conclude that: $\sup_n\E|X_n| < \infty$. Hence by Fatou's lemma: $\E|X| < \infty$. For $a > 0$ put $f_a(x)=x$ if $|x|\leq a$ and $0$ otherwise. It follows that \begin{eqnarray*} \E|X_n-X| &\leq&\E|f_a(X_n)-f_a(X)|+\E|X_n-f_a(X_n)|+\E|X-f_a(X)|\\ &\leq&\E|f_a(X_n)-f_a(X)|+\sup_n\E(|X_n|;|X_n| > a)+\E(|X|;|X| > a)~. \end{eqnarray*} The first term converges to $0$ by bounded convergence for all $a > 0$; the second and the third term can be made arbitrarily small by choosing $a$ sufficiently large.
2. $\Rar$ 3.: $|\E|X_n|-\E|X||\leq\E|X_n-X|$.
3. $\Rar$ 1.: By bounded convergence we have: $\lim_n\E f_a(|X_n|)=\E f_a(|X|)$ and thus for $\e > 0$ and all $n\geq n(\e)$: $$ \E(|X_n|;|X_n| > a) =\E|X_n|-\E f_a(|X_n|) \leq\E|X|-\E f_a(|X|)+2\e~. $$ For sufficiently large $a$ the right hand side becomes smaller than $3\e$. Since finite sets are uniformly integrable we conclude that $\{X_n:n\in\N\}$ is uniformly integrable. $\eofproof$
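A standard counterexample fitting the proposition: on $([0,1],\mbox{Lebesgue})$ the random variables $X_n\colon=n I_{(0,1/n)}$ converge in probability (even a.e.) to $0$ and satisfy $\E|X_n|=1$, but $\E|X_n|$ does not converge to $\E|0|=0$; accordingly the family cannot be uniformly integrable. The following sketch (numpy assumed, sample size arbitrary) estimates $\E(|X_n|;|X_n| > r)$ and shows that its supremum over $n$ stays near $1$ for every $r$.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.uniform(size=500000)      # samples of the underlying space [0,1]

def tail(n, r):
    """Monte Carlo estimate of E(|X_n|; |X_n| > r) for X_n = n * 1_{(0,1/n)}."""
    X = np.where(U < 1.0 / n, float(n), 0.0)
    return np.mean(np.where(X > r, X, 0.0))

for r in [1, 10, 100]:
    print(r, max(tail(n, r) for n in [2, 10, 100, 1000]))  # stays near 1, not near 0
```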
Suppose $X_n$ is a bounded submartingale in $L_1(\P)$. Then $X_n$ converges in $L_1(\P)$ if and only if $X_n$ is uniformly integrable and this holds if and only if $\E|X_n|$ converges to $\E|\lim_n X_n|$.
Deduce the dominated convergence theorem from proposition.
Thus a uniformly integrable submartingale $(X_n,\F_n)$ converges in $L_1(\P)$ to some integrable random variable $X$. If in addition $X_n$ is a uniformly integrable martingale, then $X_n=\E(X|\F_n)$.
Let $X_0,X_1,\ldots$ be a sequence of independent random variables such that $\E X_n=0$ and put $S_n\colon=X_0+\cdots+X_n$. If $S_n\to S$ a.s., then $S_n$ is uniformly integrable. Hence $S_n$ converges in $L_1(\P)$ to $S$. Suggested solution.

Reverse martingales

Let $\F_n$ be a decreasing sequence of $\s$-algebras. $(X_n,\F_n)$ is said to be a reverse martingale if for all $n\in\N$: $X_{n+1}=\E(X_n|\F_{n+1})$. $(X_n,\F_n)$ is said to be a reverse submartingale, if for all $n\in\N$: $X_{n+1}\leq\E(X_n|\F_{n+1})$.
Putting $Y_{-n}\colon=X_n$ and $\F_n^\prime=\F_{-n}$ we see that $(X_n,\F_n)$, $n\geq0$ is a reverse submartingale iff $(Y_n,\F_n^\prime)$, $n\leq0$ is a submartingale. Probably the most prominent example of a reverse martingale is the sequence of arithmetic means $$ M_n\colon=\frac1nS_n\colon=\frac1n(X_1+\cdots+X_n) $$ of i.i.d. random variables $X_1,X_2,\ldots$ with respect to $\F_n\colon=\s(S_n,X_{n+1},\ldots)$: indeed by symmetry we have for all $j,k=1,\ldots,n+1$: $\E(X_j|\F_{n+1})=\E(X_k|\F_{n+1})$ and since $\E(S_{n+1}|\F_{n+1})=S_{n+1}$ we conclude that $\E(X_j|\F_{n+1})=M_{n+1}$. It follows that $$ \E(M_n|\F_{n+1})=\frac1n\sum_{j=1}^n\E(X_j|\F_{n+1})=M_{n+1}~. $$
Let $(X_n,\F_n)$ be a reverse submartingale such that $\lim\E X_n$ exists in $\R$. Then $X_n$ is uniformly integrable and converges both $\P$-a.s. and in $L_1(\P)$ to some integrable random variable $X$. If $X_n$ is a reverse martingale, then: $X=\E(X_0|\bigcap\F_n)$.
$\proof$ The sequence $\E X_n$ decreases to some limit $\a\in\R$. For $\e > 0$ we can find an index $m$ such that for all $n\geq m$: $\E X_m\geq\E X_n > \E X_m-\e$. For any $r > 0$ we have: \begin{eqnarray*} \E(|X_n|;|X_n| > r) &=&\E(X_n;X_n > r)+\E(-X_n;X_n < -r)\\ &=&\E(X_n;X_n > r)-\E X_n+\E(X_n;X_n\geq-r)\\ &\leq&\E(X_m;X_n > r)-\E X_m+\e+\E(X_m;X_n\geq-r)\\ &=&\E(X_m;X_n > r)+\e+\E(-X_m;X_n < -r) \leq\E(|X_m|;|X_n| > r)+\e~. \end{eqnarray*} Moreover $$ \E|X_n| =\E X_n^++\E X_n^- =2\E X_n^+-\E X_n \leq-\a+2\E X_0^+ =\colon\b $$ and thus $\P(|X_n| > r)\leq r^{-1}\b$. Since $X_m$ is integrable, $\E(|X_m|;|X_n| > r)$ therefore converges to $0$ as $r\to\infty$ uniformly in $n\geq m$; as finite sets of integrable random variables are uniformly integrable, it follows that $X_n$ is uniformly integrable.
Put $Y_{-j}\colon=X_j$; then we infer from the Upcrossing Lemma applied to the submartingale $Y_{-n},\ldots,Y_{0}$: $$ \E U_n^{a,b}(Y)\leq\frac{\E(X_0-a)^+}{b-a}~. $$ This shows that $Y_{-n}$, i.e. $X_n$, converges $\P$-a.s., and by uniform integrability also in $L_1(\P)$, to some random variable $X$. Since $X$ is measurable with respect to all $\s$-algebras $\F_n$, it's measurable with respect to $\bigcap\F_n$.
If $X_n$ is a reverse martingale, then for all $A\in\bigcap\F_n$: $$ \E(X_0;A)=\lim_n\E(X_n;A)=\E(X;A), $$ i.e. $X=\E(X_0|\bigcap\F_n)$. $\eofproof$
If $(X_n,\F_n)$ is a reverse martingale and $T=\sup\{n:X_n > \e\}$, then $[T=n]\in\F_n$.

Optional stopping

Let $\F_n$ be a filtration. A non negative random variable $T:\O\rar\N_0$ is said to be a stopping time with respect to $\F_n$, if $$ \forall n\in\N_0:\quad [T\leq n]\in\F_n~. $$ In this case we put: $$ \F_T\colon=\{A\in\F:\,\forall\ n\in\N_0:\ A\cap[T\leq n]\in\F_n\}~. $$
$\F_T$ is a $\s$-algebra and every stopping time $T$ is $\F_T$-measurable. Moreover, if $T\leq n$, then $\F_T\sbe\F_n$.
If $T$ is an $\F_n$-stopping time then for all $m\in\N_0$: $[T=m]\cap\F_T=[T=m]\cap\F_m$.
Let $X_n:\O\rar S$ be a sequence of $\F_n$-adapted random variables (i.e. for all $n$ $X_n$ is $\F_n$-measurable) and $A\in\B(S)$. Then $T\colon=\inf\{n\geq0:\,X_n\in A\}$ is a stopping time, for $$ [T\leq n]=\bigcup_{j=0}^n[X_j\in A]\in\F_n~. $$ Moreover the random variable $X_T:\O\rar S$ is $\F_T$-measurable, because for all $B\in\B(S)$: $$ [X_T\in B]\cap[T\leq n]=\bigcup_{j=0}^n[X_j\in B,T=j]\in\F_n~. $$
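As a small illustration (a Python sketch, numpy assumed, the set $A$ chosen arbitrarily), the first hitting time of $A=[10,\infty)$ by a simple random walk and the stopped value $X_T$:

```python
import numpy as np

def hitting_time(path, in_A):
    """T = inf{n >= 0 : X_n in A}; returns len(path) if A is not hit
    within the horizon (in the text T would be +infinity there)."""
    for n, x in enumerate(path):
        if in_A(x):
            return n
    return len(path)

rng = np.random.default_rng(3)
path = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=1000))))  # X_0 = 0
T = hitting_time(path, lambda x: x >= 10)        # A = [10, infinity)
print(T, path[T] if T < len(path) else None)     # X_T lies in A whenever T is finite
```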
Let $S$, $T$, $T_n$ be stopping times. Then $S\wedge T$, $S\vee T$, $S+T$, $\sup T_n$, $\inf T_n$, $\limsup T_n$ and $\liminf T_n$ are stopping times.
For $n\geq0$ we have: $[S+T=n]=\bigcup_{j=0}^n[S=j,T=n-j]$.
1. Suppose $X_n$ is a uniformly integrable submartingale and $S,T$ stopping times such that $S\leq T$. Then a.s.: $X_S\leq\E(X_T|\F_S)$. 2. If $X_n$ is a uniformly integrable martingale, then a.s.: $\E(X_T|\F_S)=X_S$.
Suppose $(X_n,\F_n,\P^x)$ is a Markov chain in the Polish space $S$ and $T$ a (bounded) stopping time with respect to $\F_n$. For any bounded measurable $f:S\rar\R$ we have by the Markov property and the fact that $[T=m]\cap\F_T=[T=m]\cap\F_m$ (cf. exam): \begin{eqnarray} \E^x(f(X_{T+n})|\F_T) &=&\sum_m\E^x(f(X_{m+n})|\F_T)I_{[T=m]}\\ &=&\sum_m\E^x(f(X_{m+n})|\F_m)I_{[T=m]} =\sum_mP^nf(X_m)I_{[T=m]} =P^nf(X_T) =\E^{X_T}f(X_n)~. \end{eqnarray} This is called the strong Markov property. There is also an extended version of this property: define $\Theta_T$ by $X_n\circ\Theta_T=X_{T+n}$ for all $n\in\N_0$, then for any $\F_\infty^X$-measurable and bounded function $F:\O\rar\R$: $$ \E^x(F\circ\Theta_T|\F_T)=\E^{X_T}F $$ where the right hand side is $f(X_T)$ for $f(x)\colon=\E^xF$.

Maximal Functions of Reversible Markov Chains

Rota's Theorem

Now let's assume $(X_n,\F_n,\P^x)$ is a reversible Markov chain with respect to the probability measure $\mu$ on $S$. By $\F_n^\prime$ we will denote the $\s$-algebra generated by the $n$-fold shift operator $\Theta_n$ - in case $\F=\F_\infty^X$, $\F_n^\prime$ is simply the $\s$-algebra generated by $X_n,X_{n+1},\ldots$.
For all bounded measurable functions $f:S\rar\R$ and all $n,m\in\N_0$ we have: $$ \E^\mu(f(X_m)|\F_{m+n}^\prime)=P^nf(X_{m+n})~. $$
$\proof$ We will show by induction on $k$, that for all bounded measurable functions $g_0,\ldots,g_k:S\rar\R$: $$ \E^\mu(f(X_m)g_0(X_{m+n})\ldots g_k(X_{m+k+n})) =\E^\mu(P^nf(X_{m+n})g_0(X_{m+n})\ldots g_k(X_{m+k+n})) $$ By the Monotone Class Theorem this implies that for all bounded $\F_{m+n}^\prime$-measurable functions $F$: $$ \E^\mu(f(X_m)F)=\E^\mu(P^nf(X_{m+n})F) $$ The above equality holds for $k=0$: On the one hand we have by the Markov property \begin{eqnarray*} \E^\mu(f(X_m)g(X_{m+n})) &=&\E^\mu(f(X_m)\E^\mu(g(X_{m+n})|\F_m))\\ &=&\E^\mu(f(X_m)P^ng(X_m)) =\int fP^ng\,d\mu~. \end{eqnarray*} On the other hand we get by $\mu$-invariance: $$ \E^\mu(P^nf(X_{m+n})g(X_{m+n})) =\int P^nf.g\,d\mu~. $$ Hence both sides coincide by symmetry of $P^n$.
Now, putting $G\colon=g_0(X_{m+n})\ldots g_k(X_{m+k+n})$, we get by the induction hypothesis: \begin{eqnarray*} \E^\mu(f(X_m)Gg_{k+1}(X_{m+k+1+n})) &=&\E^\mu(f(X_m)G\E^\mu(g_{k+1}(X_{m+k+1+n})|\F_{m+k+n}))\\ &=&\E^\mu(f(X_m)GPg_{k+1}(X_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})GPg_{k+1}(X_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})G\E^\mu(g_{k+1}(X_{m+k+1+n})|\F_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})Gg_{k+1}(X_{m+k+1+n}))~. \end{eqnarray*} $\eofproof$
The following example reflects the symmetry:
For all $m\leq n\in\N_0$ and all bounded measurable functions $f:S\rar\R$: $$ \E^\mu(f(X_{n-m})|\F_n^\prime) =P^mf(X_n) =\E^\mu(f(X_{n+m})|\F_n) =\E^{X_n}f(X_m)~. $$
Let $G:(\O,\F_n)\rar\R$ be bounded, then $\E(G|\F_n^\prime)$ is $\s(X_n)$-measurable and we have for all $n$: $$ \E^\mu\Big(f_n(X_n)f_{n-1}(X_{n-1})\cdots f_0(X_0)\Big|\F_n^\prime\Big) =\E^{X_n}(f_n(X_0)\cdots f_0(X_n))~. $$
$\proof$ By exam we have (for $k=0,\ldots,n$): $$ \E^\mu(f_k(X_k)|\F_n^\prime)=P^{n-k}f_k(X_n)=\E^{X_n}f_k(X_{n-k})~. $$ This proves the assertion for one factor, in particular $$ \E^\mu(f_0(X_0)|\F_1^\prime)=Pf_0(X_1)~. $$ On the other hand we get by the Markov property for all $x$: $Pf_0(X_{n-1})=\E^x(f_0(X_n)|\F_{n-1})$ and therefore: $$ Pf_0(X_{n-1})=\E^{X_n}(f_0(X_n)|\F_{n-1})~. $$ Now put for $k=1,\ldots,n$: $G=f_k(X_k)\cdots f_0(X_0)$, then we conclude by induction on the number of factors: \begin{eqnarray*} \E^\mu(G|\F_n^\prime) &=&\E^\mu\Big(f_k(X_k)\cdots f_1(X_1)\E^\mu(f_0(X_0)|\F_1^\prime)\Big|\F_n^\prime\Big)\\ &=&\E^\mu\Big(f_k(X_k)\cdots f_1(X_1)Pf_0(X_1)\Big|\F_n^\prime\Big)\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})Pf_0(X_{n-1}))\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})\E^{X_n}(f_0(X_n)|\F_{n-1}))\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})f_0(X_n))~. \end{eqnarray*} By the Monotone Class Theorem we infer that for all bounded $G:(\O,\F_n)\rar\R$ the conditional expectation $\E(G|\F_n^\prime)$ is $\s(X_n)$-measurable. $\eofproof$

Maximal Theorem for reversible Markov chains

By Rota's Theorem the sequence $(P^nf(X_{m+n}),\F_{m+n}^\prime)_{n\in\N_0}$ is, for all $m\in\N_0$ and all $f\in L_1(\mu)$, a reverse martingale: indeed, $P^nf(X_{m+n})=\E^\mu(f(X_m)|\F_{n+m}^\prime)$ and $\F_n^\prime$ is a decreasing sequence of $\s$-algebras.
For $f\in L_p(\mu)$ the function $\sup_n|P^nf|$ is called the maximal function of $f$ with respect to a reversible Markov chain with Markov operator $P$.
Suppose $P$ is the Markov operator of a reversible Markov chain with respect to a probability measure $\mu$ on $S$. Then for all $f\in L_p(\mu)$ and $\tfrac1p+\tfrac1q=1$: $$ \Big(\int\sup_n|P^nf|^p\,d\mu\Big)^{1/p} \leq2q\Big(\int|f|^p\,d\mu\Big)^{1/p}~. $$
$\proof$ $M_n\colon=P^nf(X_n)=\E^\mu(f(X_0)|\F_{n}^\prime)$ is by definition a reverse martingale with respect to $\F_n^\prime$ and by the Markov property we have for all $m$: $$ \E^\mu(M_m|\F_0) =\E^\mu(P^mf(X_m)|\F_0) =P^{2m}f(X_0)~. $$ Now for every increasing convex function $\vp:\R_0^+\rar\R_0^+$ and all $m$: \begin{eqnarray*} \E^\mu\Big(\vp(\sup_n|M_n|)\Big|\F_0\Big) &\geq&\E^\mu(\vp(|M_m|)|\F_0)\\ &\geq&\vp(|\E^\mu(M_m|\F_0)|) =\vp(|P^{2m}f(X_0)|)~. \end{eqnarray*} Hence $$ \E^\mu\Big(\vp(\sup_n|M_n|)\Big|\F_0\Big) \geq\sup_n\vp(|P^{2n}f(X_0)|) $$ and since the distribution of $X_0$ under $\P^\mu$ is $\mu$ we get for $\vp(x)=x^p$ by Doob's inequality: \begin{eqnarray*} \norm{\sup_n|P^{2n}f|}_{L_p(\mu)} &=&\Big(\E^\mu\sup_n|P^{2n}f(X_0)|^p\Big)^{1/p}\\ &\leq&\Big(\E^\mu\sup_n|M_n|^p\Big)^{1/p} \leq q(\E^\mu|M_0|^p)^{1/p} =q\norm f_{L_p(\mu)} \end{eqnarray*} Applying this inequality to $Pf$ yields: $$ \norm{\sup_n|P^{2n+1}f|}_{L_p(\mu)} \leq q\norm{Pf}_{L_p(\mu)} \leq q\norm{f}_{L_p(\mu)}~. $$ Since $\sup_n|P^nf|\leq\sup_n|P^{2n}f|+\sup_n|P^{2n+1}f|$, the triangle inequality in $L_p(\mu)$ gives the asserted bound with constant $2q$. $\eofproof$
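For a finite state space the theorem can be checked directly. The sketch below (numpy assumed; the weights and the function $f$ are random and purely illustrative) builds a reversible chain from symmetric weights, computes $P^nf$ by matrix-vector multiplication (the supremum is taken over the first 200 iterates) and compares the $L_p(\mu)$-norm of the maximal function with $2q\norm f_{L_p(\mu)}$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 8, 3
q = p / (p - 1)
W = rng.uniform(size=(N, N)); W = W + W.T      # symmetric weights
P = W / W.sum(axis=1, keepdims=True)           # Markov operator of the walk
mu = W.sum(axis=1) / W.sum()                   # reversible (invariant) measure

f = rng.normal(size=N)
iterates = [f]
for _ in range(200):                           # P^n f for n = 0, ..., 200
    iterates.append(P @ iterates[-1])
maximal = np.max(np.abs(iterates), axis=0)     # pointwise sup_n |P^n f|

lp_norm = lambda g: (mu @ np.abs(g) ** p) ** (1 / p)   # L_p(mu) norm
print(lp_norm(maximal), 2 * q * lp_norm(f))            # left-hand side <= right-hand side
```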

Central Limit Theorem for Martingales

Quadratic variation

For an i.i.d. sequence $X_n$ in $L_2$ satisfying $\E X_n=0$ and $\E X_n^2=\s^2$ the sequence $M_n\colon=\sum_{j\leq n}X_j$ is a martingale satisfying $\E M_n=0$ and $\E M_n^2=n\s^2$. The Central Limit Theorem asserts that $M_n/\sqrt n$ converges weakly to a random variable with distribution $N(0,\s^2)$. We are going to show a martingale version of this classical result. For any martingale $(M_n,\F_n)$ in $L_2$ denote by $\D M_n\colon=M_n-M_{n-1}$ the difference sequence; put $$ \D A_j\colon=\E((\D M_j)^2|\F_{j-1}) \quad\mbox{and let}\quad A_n\colon=\sum_{j=1}^n\D A_j,\quad A_0\colon=0 $$ be its (predictable) quadratic variation.
For any martingale $(M_n,\F_n)$ in $L_2$ the (predictable) quadratic variation $A_n$ is the unique sequence such that $A_0=0$, $A_n$ is $\F_{n-1}$-measurable and $M_n^2-A_n$ is a martingale (with respect to the sequence $\F_n$).
Let $X_n$ be an i.i.d. sequence in $L_2$ satisfying $\E X_n=0$ and $\E X_n^2=\s^2$. Prove that the quadratic variation of the martingale $M_n\colon=\sum_{j\leq n}X_j$ is given by $A_n=n\s^2$.
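For a martingale whose increments are not i.i.d. the quadratic variation is genuinely random; the following sketch (numpy assumed, the function $h$ chosen arbitrarily) simulates $\D M_n=\e_n h(M_{n-1})$ with i.i.d. fair signs $\e_n$, so that $\D A_n=h(M_{n-1})^2$ is $\F_{n-1}$-measurable, and checks $\E M_n^2=\E A_n$ - a consequence of $M_n^2-A_n$ being a martingale starting at $0$.

```python
import numpy as np

rng = np.random.default_rng(5)
paths, n = 50000, 30
h = lambda m: 1.0 + 0.5 * np.sin(m)    # arbitrary bounded, strictly positive function

M = np.zeros(paths)
A = np.zeros(paths)
for _ in range(n):
    step = h(M)                                     # h(M_{k-1}), known before the step
    A += step ** 2                                  # Delta A_k is predictable
    M += rng.choice([-1.0, 1.0], size=paths) * step # Delta M_k = e_k h(M_{k-1})
print(np.mean(M ** 2), np.mean(A))                  # approximately equal
```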
Let $(M_n,\F_n)$ be a martingale in $L_2$ such that for some constant $C$ and all $n\in\N$: $\E(|\D M_n|^3|\F_{n-1})\leq C^3$. If $A_n/n$ converges in measure to $\s^2\in\R^+$, then $M_n/\sqrt n$ converges weakly to a centered gaussian with variance $\s^2$.
$\proof$ We are going to employ Theorem. W.l.o.g. assume $M_0=0$ and $C=1$. For $t,x\in\R$ and $y\in[0,1]$ we have by Taylor's theorem: $$ e^{itx+t^2y/2}=1+itx+\tfrac12t^2y-\tfrac12t^2x^2+R(t,x,y) $$ where $|R(t,x,y)|\leq c(1+|x|^3)|t|^3e^{t^2}$. Put $$ Z_n\colon=\exp(itM_n+t^2A_n/2), $$ then: $Z_{n+1}=Z_nY_{n+1}$, where $$ Y_{n+1}=\exp(it\D M_{n+1}+t^2\D A_{n+1}/2)~. $$ We are going to verify that $Z_n$ is almost an $\F_n$-martingale, i.e. $\E(Y_{n+1}|\F_n)$ is close to $1$. By Jensen's inequality we have $$ \D A_{n+1}^{1/2} =\E(|\D M_{n+1}|^2|\F_n)^{1/2} \leq\E(|\D M_{n+1}|^3|\F_n)^{1/3}, $$ and thus: $0\leq\D A_{n+1}\leq1$. Since $\E(\D M_{n+1}|\F_n)=0$ and $\D A_{n+1}-\E((\D M_{n+1})^2|\F_n)=0$, we conclude that \begin{eqnarray*} |\E(Y_{n+1}|\F_n)-1| &=&|\E(R(t,\D M_{n+1},\D A_{n+1})|\F_n)|\\ &\leq&c(1+\E(|\D M_{n+1}|^3|\F_n))|t|^3e^{t^2} \leq 2c|t|^3e^{t^2}~. \end{eqnarray*} Therefore we can find a random variable $\e$ such that $|\e|\leq1$ and $\E(Y_{n+1}|\F_n)=1+2c\e|t|^3e^{t^2}$. This implies that \begin{eqnarray*} |\E Z_{n+1}-1| &=&|\E(Z_n\E(Y_{n+1}|\F_n)-1)|\\ &=&|\E(Z_n(1+2c\e|t|^3e^{t^2})-1)|\\ &\leq&|\E(Z_n-1)|(1+2c|t|^3e^{t^2})+2c|t|^3e^{t^2}~. \end{eqnarray*} Since $\E Z_0-1=0$ we infer that: $|\E(Z_n-1)|\leq(1+2c|t|^3e^{t^2})^n-1$. Finally we have $$ \lim_n(1+2c(|t|/\sqrt n)^3e^{t^2/n})^n=1, $$ and thus, since $A_n/n$ converges in probability to $\s^2$ and $0\leq A_n/n\leq1$, by e.g. bounded convergence: $$ 1=\lim_n\E e^{itM_n/\sqrt n+\frac12t^2A_n/n} =e^{t^2\s^2/2}(\lim_n\E e^{itM_n/\sqrt n})~. $$ On the other hand $t\mapsto e^{-t^2\s^2/2}$ is the characteristic function of a centered gaussian with variance $\s^2$. By Theorem the sequence $M_n/\sqrt n$ converges weakly to a centered gaussian with variance $\s^2$. $\eofproof$
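For the simple $\pm1$ random walk everything in the proof can be computed in closed form: $A_n=n$ and $\E e^{itM_n}=\cos(t)^n$, hence $\E Z_n=(e^{t^2/2}\cos t)^n$. The following sketch (numpy assumed, the value of $t$ arbitrary) evaluates these expressions at the frequency $t/\sqrt n$ and shows $\E Z_n\to1$ and $\E e^{itM_n/\sqrt n}\to e^{-t^2/2}$.

```python
import numpy as np

t = 1.3
for n in [10, 100, 1000, 10000, 100000]:
    s = t / np.sqrt(n)
    EZ = (np.exp(s ** 2 / 2) * np.cos(s)) ** n       # E Z_n at frequency t/sqrt(n)
    chf = np.cos(s) ** n                             # E exp(i t M_n / sqrt(n))
    print(n, EZ, chf, np.exp(-t ** 2 / 2))           # EZ -> 1, chf -> exp(-t^2/2)
```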

Application to Markov chains

We will finish this section with an application to Markov chains: Let $(X_n,\F_n)$ be a Markov chain in $S$ with Markov operator $P$. Suppose $f:S\rar\R$ is measurable and bounded and put $g\colon=Lf\colon=Pf-f$, then by Birkhoff's Ergodic Theorem $$ \frac1{n+1}\sum_{j=0}^n g(X_j) $$ converges $\P^\mu$-a.e. to $0$. By exam $$ M_n\colon=f(X_n)-\sum_{j=0}^{n-1}g(X_j) $$ is an $\F_n$-martingale. Let's compute its quadratic variation: As $g+f=Pf$ we have \begin{eqnarray*} \E^\mu((\D M_n)^2|\F_{n-1}) &=&\E^\mu((f(X_n)-g(X_{n-1})-f(X_{n-1}))^2|\F_{n-1})\\ &=&\E^\mu((f(X_n)-Pf(X_{n-1}))^2|\F_{n-1})\\ &=&\E^\mu\Big(f(X_n)^2-2f(X_n)(Pf(X_{n-1}))\Big|\F_{n-1}\Big) +(Pf)^2(X_{n-1})\\ &=&(Pf^2-2(Pf)^2+(Pf)^2)(X_{n-1}) =(Pf^2-(Pf)^2)(X_{n-1})~. \end{eqnarray*} Of course we could have replaced $\E^\mu$ with $\E^x$ for any $x\in S$! Therefore the quadratic variation of $M_n$ is given by $$ A_n=\sum_{j=0}^{n-1}(Pf^2-(Pf)^2)(X_j)~. $$ Assuming ergodicity of the chain we infer from Birkhoff's Ergodic Theorem: $A_n/n$ converges $\P^\mu$-a.e. to $\int P(f^2)-(Pf)^2\,d\mu$. Thus we've proved the following corollary:
Suppose $X_n$ is an ergodic Markov chain with Markov operator $P$ and invariant probability measure $\mu$. If $f$ is bounded and $g=(P-1)f$, then $$ \frac1{\sqrt n}\sum_{j=0}^{n-1}g(X_j) $$ converges weakly to a centered gaussian with variance $\s^2\colon=\int f^2-(Pf)^2\,d\mu$.
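A numerical sketch of the corollary for a toy chain on three states (numpy assumed; the symmetric transition matrix and the function $f$ are arbitrary): the invariant measure is uniform, $\s^2=\int f^2-(Pf)^2\,d\mu$ is computed exactly and compared with the sample variance of $n^{-1/2}\sum_{j<n}g(X_j)$.

```python
import numpy as np

rng = np.random.default_rng(6)
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])         # symmetric, hence mu is uniform and invariant
mu = np.full(3, 1 / 3)

f = np.array([1.0, -2.0, 0.5])
g = P @ f - f                           # g = (P - 1)f
sigma2 = mu @ (f ** 2) - mu @ ((P @ f) ** 2)

def normalized_sum(n):
    x, total = rng.choice(3, p=mu), 0.0
    for _ in range(n):
        total += g[x]
        x = rng.choice(3, p=P[x])
    return total / np.sqrt(n)

samples = np.array([normalized_sum(1000) for _ in range(400)])
print(sigma2, samples.var())            # should be comparable
```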
For self-adjoint ergodic Markov operators the corresponding Markov chain is ergodic (by e.g. subsection) and we have $(\im L)^\perp=\ker L$, which is just the one-dimensional space of constant functions. Thus the assertion of the corollary holds for essentially all bounded $g$ satisfying $\int g\,d\mu=0$.
If $P_t=e^{tL}:L_2(\mu)\rar L_2(\mu)$ is ergodic, then for all $f\in L_2(\mu)$, $\int f\,d\mu=0$: $\lim_{\l\dar0}LU_\l f=-f$.
Under the hypotheses of theorem we have for bounded $f$: $\s^2\geq\norm f_2^2(1-(1-K/4)^2)$.
Suppose $X_n$ is a random walk on an undirected graph $(V,E)$, then by e.g. exam: $\mu(x)=d(x)/2|E|$. For $\int f^2\,d\mu=1$ we get $$ \s^2 =1-\frac1{2|E|}\sum_x\frac1{d(x)}\Big(\sum_{y:y\sim x}f(y)\Big)^2~. $$
Suppose $(V,E)$ is $d$-regular and $\sum_x f(x)^2\mu(x)=1$, then $$ \s^2 =1-\frac1{2d|E|}\sum_x\Big(\sum_{y:y\sim x}f(y)\Big)^2~. $$
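For instance, on the cycle $C_N$, $N\geq3$ (which is $2$-regular with $|E|=N$ and uniform $\mu$), the function $f(x)\colon=\sqrt2\cos(2\pi x/N)$ satisfies $\sum_x f(x)^2\mu(x)=1$ and $Pf=\cos(2\pi/N)f$, hence $$ \s^2=1-\cos^2(2\pi/N)=\sin^2(2\pi/N)~. $$ For large $N$ the fluctuations of $\frac1{\sqrt n}\sum_{j < n}g(X_j)$ are thus small, reflecting the fact that $g=(P-1)f$ itself is of order $N^{-2}$.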
Corollary essentially holds for reversible, ergodic Feller semigroups $P_t$ with generator $L$ as well: $$ \frac1{\sqrt t}\int_0^t Lf(X_s)\,ds $$ converges weakly to a centered gaussian with variance $$ \s^2\colon=-2\int fLf\,d\mu~. $$

Weak Compactness

The weak topology on a Banach space $E$ is the coarsest topology on $E$ such that all $x^*\in E^*$ are continuous. Thus a subset $U$ of $E$ is open in the weak topology iff for all $x\in U$ there is a finite set $x_1^*,\ldots,x_n^*\in E^*$ such that $$ \bigcap_j\{y:\,|x_j^*(y-x)| < 1\}\sbe U $$
A sequence $x_n$ in $E$ converges in the weak topology to $0$ iff for all $x^*\in E^*$: $x^*(x_n)\to0$.
In topology a compact space $X$ is a Hausdorff space such that every open cover of $X$ has a finite subcover. A subset $A$ of a topological space is called relatively compact, if its closure is compact. A subset $A$ of a Hausdorff space $X$ is said to be sequentially compact, if every sequence in $A$ has a convergent subsequence (with limit in $X$). In general neither is a compact space sequentially compact nor is a sequentially compact space compact - of course, both concepts are equivalent in metric spaces. However in case of the weak topology we have:
A subset $A$ of a Banach space $E$ is weakly relatively compact if and only if it is weakly sequentially compact (Eberlein-Šmulian Theorem). A closed convex set $C$ in $E$ is weakly compact if and only if every sequence in $C$ has a weakly convergent subsequence.
The assertion is more or less obvious in case $E$ is separable, for every weakly compact subset of a separable Banach space is metrizable: just take a sequence $x_n^*\in E^*$ such that $\norm{x_n^*}=1$ and for all $x\in E$: $\Vert x\Vert=\sup_n|x_n^*(x)|$; then the metric $$ d(x,y)\colon=\sum_n2^{-n}|x_n^*(x-y)| $$ will do it! Another useful general criterion, which we implicitly employed before:
Let $B_E$ be the open unit ball of a Banach space $E$ and $A\sbe E$ a bounded subset. $A$ is weakly relatively compact if and only if for all $r > 0$ there is a weakly relatively compact subset $K_r$ such that $A\sbe K_r+rB_E$.
For every weakly relatively compact subset $A$ of a Banach space $E$ the convex hull $\convex{A}$ is weakly relatively compact.
Finally a subset $A$ of a metric space $X$ is said to be pre-compact (or totally bounded) if for all $r > 0$ there is a finite set $D$ such that $A\sbe D_r$; in other words: for all $r > 0$ the set $A$ has a finite $r$-net.
In a complete metric space a subset is relatively compact if and only if it is pre-compact.

Weak compactness in $L_1(\mu)$

If $\mu$ is a probability measure then a subset $A$ of $L_1(\mu)$ is weakly relatively compact if and only if it is uniformly integrable.
$\proof$ The sufficiency follows from proposition and the fact that bounded subsets of $L_\infty(\mu)$ are weakly relatively compact in $L_1(\mu)$ - this follows from the Banach-Alaoglu Theorem and the fact that the weak * topology on $L_\infty(\mu)$ is stronger than the weak topology inherited from $L_1(\mu)$. The necessity is a bit more intricate. $\eofproof$
If $\mu$ is not necessarily finite, then uniform integrability is not sufficient for weak compactness, we also need that for all $\e > 0$ there is $F\in\F$ such that $\mu(F) < \infty$ and $$ \sup_{f\in A}\int_{F^c}|f|\,d\mu < \e, $$ so all functions in $A$ are essentially supported on a set of finite measure.

Weak compactness and compactness in $C(S)$

For any metric or topological space $S$ the vector space $C(S)$ of continuous functions $f:S\rar\R$ can be considered as a subspace of the topological space $\R^S$. $C^s(S)$ will denote the space $C(S)$ endowed with the topology inherited from $\R^S$. $C(S)$ endowed with the norm $\norm f\colon=\sup\{|f(x)|:x\in S\}$ is a Banach space.
If $S$ is a compact space then a subset $A$ of the Banach space $C(S)$ is weakly relatively compact if and only if it is bounded and relatively compact in $C^s(S)$.
If $A$ is a bounded subset of $C(S)$, then $A$ is relatively compact in $\R^S$, but that doesn't necessarily mean that it's relatively compact in $C^s(S)$: $A$ may have non-continuous cluster points! Criteria for relative compactness in $C^s(S)$ usually involve equicontinuity, but in that case we actually get more:
If $S$ is a compact metric space then a subset $A$ of $C(S)$ is relatively compact (in the norm topology) if and only if $A$ is bounded and equicontinuous, i.e. for all $x\in S$, all $\e > 0$ there is some $r > 0$ such that $$ \sup\{|f(y)-f(x)|:\,f\in A,y\in B_r(x)\}\leq\e~. $$
Cf. e.g. wikipedia. The quantity $\o_f(x,r)\colon=\sup\{|f(y)-f(x)|:y\in B_r(x)\}$ is called the modulus of continuity of $f$ at $x$; so the main condition in the Arzelà-Ascoli Theorem is the continuity of $r\mapsto\sup\{\o_f(x,r):f\in A\}$ at $r=0$. A typical example is
Suppose $A$ is a bounded subset of $C(S)$ such that for all $x\in S$ there are $\a(x) > 0$ and $C(x) < \infty$ such that $$ \forall f\in A\,\forall y\in S:\quad|f(x)-f(y)|\leq C(x)d(x,y)^{\a(x)}~. $$ Then $A$ is equicontinuous.

Weak * Compactness

The weak * topology on the dual $E^*$ of a Banach space $E$ is the coarsest topology on $E^*$ such that all evaluation maps $x^*\mapsto x^*(x)$, $x\in E$ are continuous. A subset $U$ of $E^*$ is open in the weak * topology iff for all $x^*\in U$ there is a finite set $x_1,\ldots,x_n\in E$ such that $$ \bigcap_j\{y^*:\,|y^*(x_j)-x^*(x_j)| < 1\}\sbe U~. $$
A sequence $x_n^*$ in $E^*$ converges in the weak * topology to $0$ iff for all $x\in E$: $x_n^*(x)\to0$.
The closed unit ball of the dual $E^*$ of a Banach space $E$, i.e. the set $$ \cl{B_{E^*}}\colon=\Big\{x^*\in E^*:\sup_{\Vert x\Vert\leq1}|x^*(x)|\leq1\Big\} $$ is weakly * compact. In particular a norm bounded subset $A$ of $E^*$ is relatively weakly * compact in $E^*$ and its weak * closure is norm bounded.
That doesn't mean that a norm bounded subset $A$ is sequentially weakly * compact. However if $E$ is separable, then the weak * topology on any bounded subset $A$ of $E^*$ is metrizable and thus $A$ is sequentially weakly * compact: just choose a dense subset $x_n\in B_E$ and put $$ d(x^*,y^*)\colon=\sum_n2^{-n}|x^*(x_n)-y^*(x_n)|~. $$ Hence in this case every sequence $x_n^*$ in $A$ has a subsequence converging in the weak * topology to some $x^*\in E^*$.

The space $M_1(S)$

For any Polish space $S$ the set $M_1(S)$ of Borel probability measures on $S$ can be considered as a subset of $C_b(S)^*$: just identify $\mu$ with the linear functional $$ f\mapsto\int f\,d\mu~. $$ The norm of this functional is obviously $1$. By the
Banach-Alaoglu Theorem the set $M_1(S)$ is relatively weakly * compact in $C_b(S)^*$. However a point in the weak * closure of $M_1(S)$ need not be a probability measure on $S$: take for example the sequence of Dirac measures $\d_n$ in $M_1(\R)$: if some $\nu\in M_1(\R)$ were an accumulation point of this sequence, then for all compact sets $K\sbe\R$: $\nu(K)=0$, i.e. $\nu(\R)=0$, which is impossible. This simply shows that $M_1(S)$ is in general not a closed subset of $C_b(S)^*$. Of course, if $S$ is compact, then $M_1(S)$ is closed in $C(S)^*$ and therefore it's compact in the weak * topology.
The set $L\colon=\{(x_n)\in\ell_1=c_0^*:0\leq x_n,\,\sum x_n=1\}$ is closed and convex (and thus weakly closed) but it's not weakly * closed.
Let $\b S$ be the Stone-Čech compactification of the Polish space $S$ (cf. e.g. wikipedia). Then $C_b(S)$ is isometrically isomorphic to $C(\b S)$. By the Riesz Representation Theorem its dual can be identified with the space of all finite (signed, complex) Borel measures on $\b S$.
As a Polish space $S$ is Čech-complete, it's a $G_\d$-subset of $\b S$. Hence a probability measure $\mu$ on $\b S$ is in $M_1(S)$ iff $\mu(S)=1$, i.e. $\mu(\b S\sm S)=0$. Moreover $S$ is a $G_\d$-subset of any compactification $cS$. From topology we know that there is a compact Polish space $cS$ containing $S$ as a dense subspace. Thus the probability measures $M_1(S)$ on $S$ can also be regarded as a subset of the probability measures $M_1(cS)$ - just put $\mu(cS\sm S)=0$. Furnishing $M(cS)$ with the weak * topology, the set $M_1(cS)$ becomes a metrizable compact subset. The topology on $M_1(S)$ inherited from $M_1(cS)$ is by definition weaker than the topology inherited from $M_1(\b S)$. However a subset $\G$ of $M_1(S)$ is weakly * compact iff it is weakly * compact in the topology inherited from $C(cS)^*$. If in addition $S$ is locally compact, then we may choose $cS$ to be the one-point compactification $S^\o$ and thus we may consider $M_1(S)$ as a subset of $C_0(S)^*$. Anyhow, a subset $\G$ of $M_1(S)$ is weakly * compact iff it is weakly * compact in the topology inherited from $C_0(S)^*$, cf. theorem and exam.
Suppose $S$ is locally compact (and Polish) and let $S^\o$ be the one-point compactification of $S$. Then $C_0(S)$ is a closed subspace of $C(S^\o)$ of co-dimension $1$.
Suppose $S$ is locally compact (and Polish) but not compact. Show that the set $M_1(S)$ is not weakly * closed in $C_0(S)^*$.
So what we are really looking for is a criterion for a subset $\G$ of $M_1(S)$ to be relatively weakly * compact in the space $M_1(S)$ with the weak * topology inherited from $C_b(S)^*$. Before doing this (in theorem) we give a description of the topological space $M_1(S)$, showing that it is not at all exotic.
The space $M_1(S)$ with the weak * topology inherited from $C_b(S)^*$ is a Polish space and a suitable metric is given by $$ d_L(\mu,\nu)\colon= \inf\{\d > 0:\,\forall A=\cl A:\ \mu(A)\leq\nu(A_\d)+\d, \nu(A)\leq\mu(A_\d)+\d\}~. $$ The metric $d_L$ is called the Lévy metric.
In the following we will always consider $M_1(S)$ with this topology! It's definitely a bit confusing that this topology on $M_1(S)$ is generally called the weak topology and the corresponding convergence the weak convergence of probability measures.
The function $d_H:M_1(S)\times M_1(S)\rar\R_0^+$, $$ d_H(\mu,\nu) \colon=\sup\Big\{\Big|\int f\,d\mu-\int f\,d\nu\Big|: \,\lip(f)\leq1, \norm f_\infty\leq1\Big\} $$ is another metric on $M_1(S)$. $d_H$ is called the Hutchinson metric. 2. $d_H$ is stronger than $d_L$, i.e. every sequence converging with respect to $d_H$ converges with respect to $d_L$. 3. If $S$ is compact, then both metrics are equivalent, i.e. a sequence converges with respect to $d_H$ if and only if it converges with respect to $d_L$.
$\proof$ 2. Let $A\sbe S$ be closed and $1 > r > 0$. Define $f:S\rar[0,1]$ by $f(x)=(1-r^{-1}d_A(x))^+$, then $\lip(f)=r^{-1}$, $f|A=1$ and $f|A_r^c=0$. Hence $$ r(\mu(A)-\nu(A_r)) \leq\int rf\,d\mu-\int rf\,d\nu \leq d_H(\mu,\nu)~. $$ This shows that $\mu(A)\leq\nu(A_r)+r^{-1}d_H(\mu,\nu)$ and by symmetry: $\nu(A)\leq\mu(A_r)+r^{-1}d_H(\mu,\nu)$. Now choose $r=d_H(\mu,\nu)^{1/2}$, then it follows that $$ d_L(\mu,\nu)\leq\sqrt{d_H(\mu,\nu)}~. $$ 3. The set $L\colon=\{f\in C(S):\lip(f)\leq1,\norm f\leq1\}$ is compact in $C(S)$ by the Arzelà-Ascoli Theorem. Hence by exam $d_H$ is continuous on $M_1(S)\times M_1(S)$. $\eofproof$
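For measures supported on finitely many points the Hutchinson metric is a finite-dimensional linear program: maximize $\int f\,d\mu-\int f\,d\nu$ over vectors $f$ with $|f_i-f_j|\leq d(x_i,x_j)$ and $|f_i|\leq1$. A sketch using scipy's LP solver (numpy and scipy assumed; the points and weights below are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog
from itertools import combinations

def hutchinson(points, mu, nu):
    """Hutchinson (bounded-Lipschitz) distance between probability vectors
    mu, nu on the finite metric space `points` (a subset of R)."""
    m = len(points)
    c = -(mu - nu)                       # linprog minimizes, we maximize (mu-nu).f
    A, b = [], []
    for i, j in combinations(range(m), 2):
        d = abs(points[i] - points[j])
        row = np.zeros(m); row[i], row[j] = 1, -1
        A.append(row); b.append(d)       #  f_i - f_j <= d(x_i, x_j)
        A.append(-row); b.append(d)      #  f_j - f_i <= d(x_i, x_j)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(-1, 1)] * m, method="highs")
    return -res.fun

points = np.array([0.0, 1.0, 2.0, 5.0])
mu = np.array([0.4, 0.1, 0.3, 0.2])
nu = np.array([0.25, 0.25, 0.25, 0.25])
print(hutchinson(points, mu, nu))
```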
If $S$ is compact, then the function $d:M_1(S)\times M_1(S)\rar\R_0^+$, $$ d(\mu,\nu) \colon=\sup\Big\{\Big|\int f\,d\mu-\int f\,d\nu\Big|:\,\lip(f)\leq1\Big\} $$ is another metric on $M_1(S)$. Prove that $d_H\leq d\leq diam(S)d_H$. Hence $d$ is equivalent to the Hutchinson metric.

Compactness in $M_1(S)$

For any open subset $U$ of $S$ the set $C\colon=\{\mu\in M_1(S):\,\mu(U)\leq\d\}$ is closed in $M_1(S)$.
$\proof$ This simply follows from $$ C=\Big\{\mu\in M_1(S):\,\int f\,d\mu\leq\d\ \mbox{ for all }f\in C_b(S)\mbox{ with }0\leq f\leq I_U\Big\}~. $$ $\eofproof$
Let $\G$ be a subset of $M_1(S)$. Then $\G$ is relatively compact in $M_1(S)$, if and only if for all $\e > 0$ there is a compact subset $K$ of $S$, such that for all $\mu\in\G$: $\mu(K^c) < \e$ - in this case the family $\G$ is said to be tight.
$\proof$ To prove sufficiency assume $f:S\rar[0,1]$ is continuous and $f|K=1$. Then for all $\mu\in\G$: $\int f\,d\mu\geq\mu(K) > 1-\e$ and thus for any point $x^*\in C_b(S)^*$ in the weak * closure of $\G$: $x^*(f)\geq1-\e$, $x^*$ is positive and $x^*(1)=1$. Hence, by the Riesz Representation Theorem, $x^*$ is given by a Borel probability measure $\nu$ on the Stone-Čech compactification $\b S$ of $S$; as $S$ is Polish, $S$ is a $G_\d$-set in $\b S$ (i.e. Polish spaces are Čech complete) and the above conditions imply that $\nu(\b S\sm S)=0$, i.e. $\nu\in M_1(S)$.
Necessity: As all $\mu\in\cl\G$ are regular we may find for every $\e > 0$ and every $m\in\N$ a compact subset $K_m(\mu)$ such that $\mu(K_m(\mu)) > 1-2^{-m}\e$. Put \begin{equation}\label{weseq1}\tag{WES1} \G_m(\mu)\colon=\{\nu\in M_1(S):\,\nu(K_m(\mu)_{1/m}) > 1-2^{-m}\e\}, \end{equation} By
lemma this set is an open neighborhood of $\mu$ in $M_1(S)$. Since $\cl\G$ is compact there are $\mu_1,\ldots,\mu_{N(m)}$ such that: $$ \G\sbe\bigcup_{j=1}^{N(m)}\G_m(\mu_j)~. $$ The set $K\colon=\bigcap_m\bigcup_{j=1}^{N(m)}K_m(\mu_j)_{1/m}$ is pre-compact, for each set $\bigcup_{j=1}^{N(m)}K_m(\mu_j)_{1/m}$ has a finite $2/m$-net - and for all $\mu\in\G$ we have for some $k=k(m)$: $\mu\in\G_m(\mu_k)$ and thus by \eqref{weseq1}: $$ \mu(K^c) \leq\sum_m\mu\Big(\bigcap_{j=1}^{N(m)}K_m(\mu_j)_{1/m}^c\Big) \leq\sum_m\mu(K_m(\mu_k)_{1/m}^c) < \sum_m2^{-m}\e=\e~. $$ As $S$ is complete and $K$ is closed, $K$ must be compact by proposition. $\eofproof$
If $\G$ is relatively compact in $M_1(S)$, then $\convex{\G}$ is relatively compact in $M_1(S)$. By proposition this result holds for any subset $\G$ of a topological vector space $E$ if $\G$ is contained in a convex subset $M$, which is metrizable and complete.
If $S$ is locally compact (and Polish), then a subset $\G$ of $M_1(S)$ is relatively compact in $M_1(S)\sbe C_0(S)^*$, iff for all $\e > 0$ there is a compact subset $K$ of $S$, such that for all $\mu\in\G$: $\mu(K^c) < \e$, i.e. iff $\G$ is relatively compact in $M_1(S)\sbe C_b(S)^*$.
Let $S$ be Polish, $\mu\in M_1(S)$ and $K\in\R^+$. Use regularity of $\mu$ to show that the set of probability measures $\nu$ on $S$ such that for all Borel sets $B$: $\nu(B)\leq K\mu(B)$ is relatively compact in $M_1(S)$.
The existence of invariant probability measures mostly relies on the following result (cf. e.g. wikipedia):
Let $C$ be a compact and convex subset of a topological vector space and $A$ a commuting family of continuous linear (or affine) maps $u:C\rar C$. Then there is some $x\in C$ such that for all $u\in A$: $u(x)=x$.
$\proof$ 1. For a single $u\in A$ take an arbitrary $y\in C$, put $$ x_n\colon=\frac1n\sum_{j=0}^{n-1}u^j(y), $$ $F_n\colon=\{x_m:m\geq n\}$ and $v\colon=u-1$. Then (cf. proposition): $$ u(x_n)=x_n+\frac1n(u^{n}(y)-y), \quad\mbox{i.e.}\quad v(F_n)\sbe\bigcup\{t(C-C):0\leq t\leq1/n\}~. $$ Hence any ultrafilter generated by the filterbasis $\{F_n:n\in\N\}$ converges to a fixed point $x$ of $u$. Moreover the set of fixed points $C_u$ of $u$ is a compact and convex subset of $C$.
2. For any finite subset $u_1,\ldots,u_N$ of $A$ we have: $u_2:C_{u_1}\rar C_{u_1}$, because for all $x\in C_{u_1}$: $u_2(x)=u_2(u_1(x))=u_1(u_2(x))$ and thus $u_2(x)$ is a fixed point of $u_1$, i.e. $u_2(x)\in C_{u_1}$. Therefore $u_1$ and $u_2$ have a common fixed point. By induction we find a point $x$ such that for all $j$: $u_j(x)=x$.
3. Finally for any finite subset $B$ of $A$ the set $C_B$ of fixed points of all $u\in B$ is again a compact and convex subset of $C$. By 2. the family $C_B$, $B$ a finite subset of $A$, has the finite intersection property and thus by compactness $\bigcap\{C_B:B\sbe A,|B| < \infty\}\neq\emptyset$. $\eofproof$
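Step 1 of the proof is constructive and easy to run: for the affine map $u(x)=xP$ on the compact convex set of probability vectors ($P$ a stochastic matrix), the Cesàro averages $x_n=\frac1n\sum_{j < n}u^j(y)$ converge to a fixed point of $u$, i.e. to an invariant distribution. A sketch (numpy assumed; $P$ is a random stochastic matrix, the starting point arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 5
P = rng.uniform(size=(N, N)); P /= P.sum(axis=1, keepdims=True)   # stochastic matrix

y = np.zeros(N); y[0] = 1.0                 # arbitrary starting point in C
iterate, total = y.copy(), np.zeros(N)
n = 5000
for _ in range(n):
    total += iterate                        # accumulate y, u(y), ..., u^{n-1}(y)
    iterate = iterate @ P
x = total / n                               # Cesaro average x_n
print(np.max(np.abs(x @ P - x)))            # nearly 0: x is (almost) fixed by u
```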
If $A$ is a compact group of linear transformations, then this theorem also holds without the assumption of commutativity - this is known as Kakutani's Theorem.
Let $S$ be Polish and let $P_tf(x)=\int f(y)\,P_t(x,dy)$ be a continuous Feller semigroup on $C_b(S)$; for $\nu\in M_1(S)$ put $P_t^*\nu(A)\colon=\P^\nu(X_t\in A)$. If there is a probability measure $\nu$ such that for all $\e > 0$ there is a compact set $K$ such that for all $t > 0$: $\P^\nu(X_0\in K,X_t\in K^c) < \e$, then the family $P_t^*\nu$, $t > 0$, is relatively compact in $M_1(S)$ and there is an invariant probability measure $\mu$ in $M_1(S)$, i.e. $P_t^*\mu=\mu$ for all $t > 0$.
2. Moreover, if there is only one invariant probability measure $\mu\in M_1(S)$, then for all $f\in C_b(S)$ $A_tf$ converges pointwise to $\int f\,d\mu$. Hence $P_t$ is ergodic in $L_1(\mu)$.
Suggested solution
Applying this result to $\mu=\d_x$ we infer that there is an invariant probability measure provided we can find for any $\e > 0$ a compact subset $K$ of $S$ such that for all $t$: $P_t(x,K^c) < \e$.
For any vector field $X$ on a compact manifold $M$ with normalized volume $v$ there is a probability measure $\mu$ invariant under the flow of $X$. If there exist measures $\nu_1$ and $\nu_2$ such that for all $t > 0$ and all Borel subsets $A$: $\nu_1(A)\leq v(\theta_t\in A)\leq\nu_2(A)$, then $\mu$ may be chosen such that: $\nu_1(A)\leq\mu(A)\leq\nu_2(A)$. Moreover, we have: $$ v(\theta_t\in A) =v(\theta_{-t}(A)) =\int_A|\det T_x\theta_{-t}|\,v(dx) \quad\mbox{and}\quad \ftd tv(\theta_t\in A) =\int_{\theta_{-t}(A)}\divergence X\,dv~. $$ Hence, if $\r(x)\colon=\sup\{|\det T_x\theta_{-t}|:t > 0\} < \infty$, then $\mu(A)\leq\int_A\r\,dv$.
The Markov operator $P$ in exam has an invariant probability measure if $\bigcup_j w_j(S)$ is contained in a compact set. 2. If there is a compact subset $K$ such that for all $j$: $w_j(K)\sbe K$, then $P$ has an invariant probability measure $\mu$ such that $\mu(K)=1$.
If $\theta:S\rar S$ is a continuous transformation on a compact space $S$, then $\theta$ admits an invariant probability measure $\mu$.
For a sequence $\mu_n$ in $M_1(S)$ the following statements are equivalent:
  1. $\mu_n$ converges in $M_1(S)$ to $\mu$.
  2. For all $f\in C_b(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
  3. For all $f\in\lip_b(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
  4. For all closed sets $A\sbe S$: $\limsup_n\mu_n(A)\leq\mu(A)$.
  5. For all open sets $U\sbe S$: $\liminf_n\mu_n(U)\geq\mu(U)$.
  6. For all Borel sets $B\sbe S$ satisfying $\mu(\pa B)=0$: $\lim_n\mu_n(B)=\mu(B)$.
If in addition $S$ is locally compact, then there is another equivalent condition:
  1. For all $f\in C_0(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
$\proof$ 1. and 2. are equivalent by definition.
4. and 5. are complementary and thus equivalent.
4.$\Rar$6.: Since $\mu(\cl B)=\mu(B^\circ)$ 4. and 5. imply that $$ \limsup_n\mu_n(\cl B) \leq\mu(\cl B) =\mu(B^\circ) \leq\liminf_n\mu_n(B^\circ) $$ 3.$\Rar$4.: Define for $\e > 0$: $$ f_\e(x)\colon=\frac{d(x,A_\e^c)}{d(x,A)+d(x,A_\e^c)}, $$ where $A_\e\colon=\{x\in S:\,d(x,A) < \e\}$. Then $f_\e\in\lip_b(S)$ and: $I_A\leq f_\e\leq I_{A_\e}$. Hence we conclude by assumption that: $\limsup_n\mu_n(A)\leq\mu(A_\e)$. For $\e\dar0$ the conclusion follows.
6.$\Rar$2.: Given $f\in C_b(S)$, then for all but at most countably many $a,b\in\R$: $$ \mu(a < f < b)=\mu(a\leq f\leq b)~. $$ For $\e > 0$ choose $a_0 < a_1 < \cdots < a_k$ such that $a_j-a_{j-1} < \e$, $a_0 < f < a_k$ and $\mu(a_{j-1} < f < a_j)=\mu(a_{j-1}\leq f\leq a_j)$. Then we get: $$ \Big|\int f\,d\mu_n-\int f\,d\mu\Big|\leq 2\e+2\norm f\sum_{j=1}^k |\mu_n(a_{j-1} < f\leq a_j)-\mu(a_{j-1} < f\leq a_j)|~. $$ By assumption this implies that $$ \limsup_n\Big|\int f\,d\mu_n-\int f\,d\mu\Big|\leq2\e~. $$ Obviously 2. implies 3.
If $S$ is also locally compact, the equivalence follows from
exam. $\eofproof$
Suppose $\theta:S\rar S$ is continuous. Let $\mu_n$ be a convergent sequence of $\theta$ invariant probability measures in $M_1(S)$. Prove that the limit $\mu$ is $\theta$ invariant. I.e. the set of $\theta$ invariant probability measures is closed in $M_1(S)$.
Let $\mu_n$ be a sequence in $M_1(S)$. Then for all $f\in C_b(S)$ there is a countable subset $D\sbe\R$ such that $$ \forall a,b\notin D\,\forall n:\quad \mu_n(a < f < b)=\mu_n(a\leq f\leq b)~. $$
If $\mu_n$ is a Cauchy sequence in $(M_1(S),d_L)$, then $\mu_n$ converges in $M_1(S)$.

Convergence in $M_1(\R^n)$

In case $S=\R^n$ we get another description via characteristic functions, i.e. via Fourier transforms: For $\mu\in M_1(\R^n)$ the function $\vp:\R^n\rar\C$, $$ \vp(x)\colon=\int e^{i\la x,y\ra}\,\mu(dy) $$ is called the characteristic function
of $\mu$. Up to normalization and the sign of the exponent this is just what analysts call the Fourier transform $\wh\mu$ of $\mu$: $$ \wh\mu(x)\colon=\frac1{(2\pi)^{n/2}}\int e^{-i\la x,y\ra}\,\mu(dy)~. $$ The constant $(2\pi)^{-n/2}$ in front of the integral is chosen in order to make the Fourier transform an isometry on $L_2(\R^n)$, i.e. the mapping $$ f\mapsto\wh f,\quad \wh f(x)\colon=\frac1{(2\pi)^{n/2}}\int f(y)e^{-i\la x,y\ra}\,dy $$ extends to an isometry onto $L_2(\R^n)$! If $X_k:\O\rar\R^n$ is a sequence of random variables on a probability space $(\O,\F,\P)$, such that the sequence $\mu_k\colon=\P_{X_k}$ converges weakly to $\mu\in M_1(\R^n)$, then we say that $X_k$ converges in distribution (or weakly) to a random variable $X$ with distribution $\mu$.
Let $\mu_k$ be a sequence in $M_1(\R^n)$ with characteristic functions $\vp_k$.
  1. If $\mu_k$ converges weakly to $\mu$, then $\vp_k$ converges uniformly on compact sets to the characteristic function $\vp$ of $\mu$.
  2. If $\vp_k$ converges pointwise to a function $\vp$ continuous at $0$, then $\mu_k$ converges weakly to a probability measure $\mu$ with characteristic function $\vp$ (see the sketch below).
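For instance (a numerical sketch of part 2, numpy assumed): the characteristic functions of the binomial distributions $\mu_k=B(k,\l/k)$ converge pointwise to $\vp(t)=e^{\l(e^{it}-1)}$, which is continuous at $0$; consequently $\mu_k$ converges weakly to the Poisson distribution with parameter $\l$.

```python
import numpy as np

# phi_k(t) = (1 - lam/k + (lam/k) e^{it})^k  is the characteristic function of
# B(k, lam/k); it converges pointwise to exp(lam (e^{it} - 1)), the
# characteristic function of the Poisson(lam) distribution.
lam = 3.0
ts = np.linspace(-4, 4, 9)
phi_limit = np.exp(lam * (np.exp(1j * ts) - 1))
for k in [5, 50, 500, 5000]:
    p = lam / k
    phi_k = (1 - p + p * np.exp(1j * ts)) ** k
    print(k, np.max(np.abs(phi_k - phi_limit)))     # tends to 0
```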