We present some additional results from Measure Theory (monotone class theorem), Probability (martingales and CLT), Functional Analysis (maximal functions) and Topology (weak compactness).

Appendix

Monotone Class Theorem

A collection of subsets ${\cal P}\sbe{\cal P}(\O)$ is called a $\pi$-system, if for all $A,B\in{\cal P}$: $A\cap B\in{\cal P}$. A collection of subsets ${\cal L}\sbe{\cal P}(\O)$ is called a $\l$-system if
  1. $\O\in{\cal L}$.
  2. $A,B\in{\cal L}$ and $B\sbe A$ imply: $A\sm B\in{\cal L}$.
  3. $A_n\in{\cal L}$ and $A_n\uar$ imply: $\bigcup A_n\in{\cal L}$.
Dynkin's $\pi$-$\l$-Theorem asserts that if a $\l$-system ${\cal L}$ contains a $\pi$-system ${\cal P}$, then $\s({\cal P})\sbe{\cal L}$. A typical application is
If ${\cal P}_n$ are independent $\pi$-systems, then the $\s$-algebras they generate are independent as well.
For $A_j\in{\cal P}_j$, $j=2,\ldots,n$ put $$ {\cal L}\colon=\{A\in\s({\cal P}_1):\, \P(A\cap A_2\cap\ldots\cap A_n) =\P(A)\P(A_2)\cdots\P(A_n)\}~. $$ Then ${\cal L}$ is a $\l$-system and by Dynkin's $\pi$-$\l$-Theorem: ${\cal L}=\s({\cal P}_1)$. Iterating the argument proves the assertion.
${\cal L}\sbe{\cal P}(\O)$ is a $\l$-system if and only if the following holds:
  1. $\O\in{\cal L}$.
  2. $A\in{\cal L}$ implies: $A^c\in{\cal L}$.
  3. If $A_n\in{\cal L}$ and $A_n$ are pairwise disjoint, then: $\bigcup A_n\in{\cal L}$.
Suppose $(\O,\F)$ is a measurable space, $\cal P\sbe\F$ a $\pi$-system and $E$ a vector subspace of bounded measurable functions on $\O$ such that:
  1. For all $A\in\cal P$ we have $I_A\in E$.
  2. If $X_n$ is any increasing sequence of uniformly bounded measurable and non negative functions in $E$, then $\lim_n X_n\in E$.
Then $E$ contains all $\s({\cal P})$-measurable and bounded functions.
$\proof$ Put ${\cal L}\colon=\{A:\,I_A\in E\}$; this is a $\l$-system and by Dynkin's $\pi$-$\l$-Theorem $E$ contains all simple functions, i.e. finite linear combinations of indicators $I_A$, $A\in\s({\cal P})$. By a result from measure theory every non-negative, bounded and $\s({\cal P})$-measurable function is the uniform limit of an increasing sequence of simple functions; since $E$ is a vector space, it therefore contains all bounded $\s({\cal P})$-measurable functions. $\eofproof$

Martingales and Submartingales

Let $(\O,\F,\P)$ be a probability space and $\F_n\sbe\F$ a so-called filtration, i.e. an increasing sequence of $\s$-algebras. A sequence of random variables $X_n$ is called a martingale (submartingale) if $X_n$ is $\F_n$-measurable and integrable and $$ \E(X_{n+1}|\F_n)=X_n \quad(\E(X_{n+1}|\F_n)\geq X_n)~. $$ Given a filtration $\F_n$ and an integrable random variable $X$, the sequence $X_n\colon=\E(X|\F_n)$ is the standard example of a martingale. We will see that if $X$ is measurable with respect to the $\s$-algebra generated by $\bigcup\F_n$, then $X_n$ converges $\P$-a.s. (cf. theorem) and in $L_1(\P)$ (cf. proposition) to $X$.
Suppose $X$ is integrable and $\vp:\R\rar\R$ is convex such that $\vp(X)$ is integrable. Since $\vp$ is convex there is an $\F^\prime$-measurable random variable $Z$ (in case $\vp$ is differentiable $Z=\vp^\prime(\E(X|\F^\prime))$) such that $$ \vp(X)-\vp(\E(X|\F^\prime)) \geq Z(X-\E(X|\F^\prime))~. $$ For any $A\in\F^\prime$ put $A_n\colon=A\cap[|Z|\leq n]$, then $\E(\vp(X);A_n)\geq\E(\vp(\E(X|\F^\prime));A_n)$ and letting $n\to\infty$ we conclude that $\P$-a.s.: $$ \vp(\E(X|\F^\prime)) \leq\E(\vp(X)|\F^\prime)~. $$ This is the conditional version of Jensen's inequality. Now if $(M_n,\F_n)$ is a martingale and $\vp$ is convex such that $\vp(M_n)$ is integrable, then $(\vp(M_n),\F_n)$ is a submartingale: $$ \E(\vp(M_{n+1})|\F_n) \geq\vp(\E(M_{n+1}|\F_n)) =\vp(M_n)~. $$

Upcrossing

Suppose $(X_n,\F_n)$ is a submartingale and $a < b$ are real numbers. Define the so-called stopping times \begin{eqnarray*} S_1&\colon=&\inf\{n\geq0:\,X_n\leq a\}\\ T_k&\colon=&\inf\{n > S_k:\,X_n\geq b\}\\ S_{k+1}&\colon=&\inf\{n > T_k:\,X_n\leq a\}~. \end{eqnarray*} So $S_1$ is the first time the sequence $X_0,X_1,\ldots$ falls below $a$, $T_k$ is the first time after $S_k$ the sequence exceeds $b$ and $S_{k+1}$ is the first time after $T_k$ the sequence $X_0,X_1,\ldots$ falls below $a$ again. Finally let us put for $n\geq0$: $$ U_n^{a,b}(X)\colon=\max\{k:\,T_k\leq n\}, $$ i.e. this is the number of times up to time $n$ the sequence $X_0,X_1,\ldots$ has crossed from some value $\leq a$ to some value $\geq b$ - this is called the number of upcrossings.
For a submartingale $X_n$ we have $$ \E U_n^{a,b}(X)\leq\frac{\E(X_n-a)^+}{b-a}~. $$
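The number of upcrossings is easy to compute along a single path; the following Python sketch (numpy assumed available, all parameters merely illustrative) counts $U_n^{a,b}$ for simulated paths of the simple random walk - a martingale, hence a submartingale - and compares the Monte Carlo estimate of $\E U_n^{a,b}$ with the bound of the Upcrossing Lemma.

```python
import numpy as np

def upcrossings(x, a, b):
    """Number of completed upcrossings of [a,b] by the finite path x:
    passages from some value <= a to a later value >= b."""
    count, below = 0, False
    for v in x:
        if not below and v <= a:
            below = True
        elif below and v >= b:
            count, below = count + 1, False
    return count

# Monte Carlo check of  E U_n^{a,b} <= E(X_n - a)^+ / (b - a)
rng = np.random.default_rng(0)
n, paths, a, b = 200, 5000, -1.0, 1.0
X = np.cumsum(rng.choice([-1, 1], size=(paths, n)), axis=1)
EU = np.mean([upcrossings(x, a, b) for x in X])
bound = np.mean(np.maximum(X[:, -1] - a, 0.0)) / (b - a)
print(EU, bound)   # the estimated E U_n stays below the bound
```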
Suppose $X_n$ is a submartingale such that $\sup_n\E X_n^+ < \infty$. Then there exists an integrable random variable $X$, such that $X_n$ converges $\P$-a.s. to $X$.
$\proof$ Since $$ [X_n\mbox{ does not converge}] =\bigcup_{r_1 < r_2\in\Q} [\liminf X_n < r_1 < r_2 <\limsup X_n], $$ it suffices to prove that for all $r_1 < r_2\in\Q$: $$ \P(\liminf X_n < r_1 < r_2 < \limsup X_n)=0. $$ If $\P(\liminf X_n < r_1 < r_2 < \limsup X_n) > 0$, then the sequence $X_0,X_1,\ldots$ upcrosses the interval $[r_1,r_2]$ infinitely often on a set of strictly positive measure, hence $\E U=\infty$, where $U\colon=\sup_n U_n$. On the other hand $U$ is the limit of the increasing sequence $U_n$ and by the Upcrossing Lemma $$ (r_2-r_1)\E U_n\leq\E(X_n-r_1)^+ \leq\sup_m\E X_m^++|r_1|, $$ so $U$ is integrable by monotone convergence. Consequently $X_n$ must converge $\P$-a.s. to some random variable $X$. Finally, since $\E X_n\geq\E X_0$ for a submartingale, $$ \E|X_n| =\E(2X_n^+-X_n) \leq2\sup_m\E X_m^+-\E X_0, $$ and thus by Fatou's lemma $X$ must be integrable. $\eofproof$
Suppose $X_n$ is a sequence of random variables on $(\O,\F,\P)$. 1. $X_n$ converges $\P$-a.s. if and only if (suggested solution) $$ \forall\e > 0:\quad \lim_n\P\Big(\sup_{k\geq n}|X_k-X_n| > \e\Big)=0~. $$ 2. If $X_n$ is a sequence on an arbitrary measure space $(\O,\F,\mu)$, then the condition $$ \forall\e > 0:\quad \lim_n\mu\Big(\sup_{k\geq n}|X_k-X_n| > \e\Big)=0 $$ implies convergence $\mu$-a.e.
In order to verify a.s. convergence it is therefore natural to study the maximal function $X^*\colon=\sup_n|X_n|$.

Doob's Inequalities

For any sequence of random variables $X_n$ let us put $$ X_n^*\colon=\sup_{k\leq n}X_k~. $$
For a submartingale $X_n$ we have: $$ \forall\e > 0:\quad \e\,\P(X_n^* > \e) \leq\E(X_n;X_n^* > \e) \leq\E X_n^+~. $$
$\proof$ Define $T\colon=\inf\{k:X_k > \e\}$, then $[T=k]=[X_{k-1}^*\leq\e,X_k > \e]\in\F_k$ and \begin{eqnarray*} \e\P(X_n^* > \e) &=&\e\P(T\leq n) =\sum_{k=1}^n\e\P(T=k)\\ &\leq&\sum_{k=1}^n\E(X_k;T=k) \leq\sum_{k=1}^n\E(X_n;T=k) =\E(X_n;T\leq n)~. \end{eqnarray*} $\eofproof$
Suppose $\vp:\R\rar\R$ is convex. If $(X_n,\F_n)$ is a martingale such that $\vp(X_n)$ is integrable, then $(\vp(X_n),\F_n)$ is a submartingale.
For a martingale $(X_n,\F_n)$ we have: $$ \forall \e > 0:\qquad\e\,\P(|X_n|^* > \e)\leq\E(|X_n|;|X_n|^* > \e)~. $$
For a martingale $X_n$, $p > 1$ and $\tfrac1p+\tfrac1q=1$ we have: $$ \Big(\E\sup_{k\leq n}|X_k|^p\Big)^{1/p} \leq q(\E|X_n|^p)^{1/p}~. $$
$\proof$ $Y_n\colon=|X_n|$ is a non-negative submartingale and by Doob's inequality, Fubini and Hölder's inequality: \begin{eqnarray*} \E Y_n^{*p} &=&\int_0^\infty pr^{p-1}\P(Y_n^* > r)\,dr \leq p\int_0^\infty r^{p-2}\int_{[Y_n^* > r]}Y_n\,d\P dr\\ &=&p\E\int_0^{Y_n^*}Y_n\,r^{p-2}\,dr =\frac p{p-1}\E(Y_n^{*(p-1)}Y_n) \leq\frac p{p-1}(\E Y_n^{*(p-1)q})^{1/q}(\E Y_n^p)^{1/p} \end{eqnarray*} and since $q=p/(p-1)$, the conclusion follows. $\eofproof$
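A quick Monte Carlo sanity check of Doob's $L_p$ inequality for the simple random walk (a sketch assuming numpy; the horizon, number of paths and the exponent $p$ are arbitrary):

```python
import numpy as np

# Check (E sup_{k<=n} |X_k|^p)^{1/p} <= q (E |X_n|^p)^{1/p} for the simple
# random walk martingale X_n = e_1 + ... + e_n with i.i.d. fair signs e_j.
rng = np.random.default_rng(1)
n, paths, p = 500, 10000, 2
q = p / (p - 1)
X = np.cumsum(rng.choice([-1, 1], size=(paths, n)), axis=1)
lhs = np.mean(np.max(np.abs(X), axis=1) ** p) ** (1 / p)
rhs = q * np.mean(np.abs(X[:, -1]) ** p) ** (1 / p)
print(lhs, rhs)   # lhs stays below rhs; for this walk E X_n^2 = n
```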
A subset $A$ of $L_1(\P)$ is said to be uniformly integrable if $$ \lim_{r\to\infty}\sup_{X\in A}\E(|X|;|X| > r)=0~. $$
Of course, any finite set of integrable random variables is uniformly integrable. Uniformly integrable subsets of $L_1(\P)$ are bounded; in fact they are just the weakly relatively compact subsets of $L_1(\P)$, cf. Dunford-Pettis Theorem. If $X_n$ is a bounded sequence in $L_p(\P)$ for some $p > 1$, then by Hölder's and Chebyshev's inequality: $$ \E(|X_n|;|X_n| > r) \leq\norm{X_n}_p\P(|X_n| > r)^{1/q} \leq\norm{X_n}_p r^{-p/q}(\E|X_n|^p)^{1/q} =r^{-p/q}\norm{X_n}_p^{1+p/q}~. $$ Hence bounded subsets in $L_p(\P)$, $p > 1$, are uniformly integrable. More generally, we have
Let $\vp:\R_0^+\rar\R_0^+$ be a function such that $\lim_{r\to\infty}\vp(r)/r=\infty$. Suppose $\sup\{\E\vp(|X|):X\in A\}\leq C$, then $A$ is uniformly integrable. Suggested solution.
A subset $A$ of $L_1(\P)$ is uniformly integrable if and only if $$ \sup_{X\in A}\E|X| < \infty \quad\mbox{and}\quad \lim_{\P(B)\to0}\sup_{X\in A}\E(|X|;B)=0~. $$
Suppose a sequence of random variables $X_n$ converges in probability to $X$, then the following assertions are equivalent:
  1. $X_n$ is uniformly integrable.
  2. $X_n$ converges in $L_1(\P)$.
  3. $\E|X_n|$ converges to $\E|X|$.
$\proof$ 1. $\Rar$ 2.: As uniformly integrable sets are bounded, we conclude that: $\sup_n\E|X_n| < \infty$. Hence by Fatou's lemma: $\E|X| < \infty$. For $a > 0$ put $f_a(x)=x$ if $|x|\leq a$ and $0$ otherwise. It follows that \begin{eqnarray*} \E|X_n-X| &\leq&\E|f_a(X_n)-f_a(X)|+\E|X_n-f_a(X_n)|+\E|X-f_a(X)|\\ &\leq&\E|f_a(X_n)-f_a(X)|+\sup_n\E(|X_n|;|X_n| > a)+\E(|X|;|X| > a)~. \end{eqnarray*} The first term converges to $0$ by bounded convergence for all $a > 0$; the second and the third term can be made arbitrarily small by choosing $a$ sufficiently large.
2. $\Rar$ 3.: $|\E|X_n|-\E|X||\leq\E|X_n-X|$.
3. $\Rar$ 1.: By bounded convergence we have: $\lim_n\E f_a(|X_n|)=\E f_a(|X|)$ and thus for $\e > 0$ and all $n\geq n(\e)$: $$ \E(|X_n|;|X_n| > a) =\E|X_n|-\E f_a(|X_n|) \leq\E|X|-\E f_a(|X|)+2\e~. $$ For sufficiently large $a$ the right hand side becomes smaller than $3\e$. Since finite sets are uniformly integrable we conclude that $\{X_n:n\in\N\}$ is uniformly integrable. $\eofproof$
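A standard counterexample fitting the proposition: on $([0,1],\mbox{Lebesgue})$ the random variables $X_n\colon=n I_{(0,1/n)}$ converge in probability (even a.e.) to $0$ and satisfy $\E|X_n|=1$, but $\E|X_n|$ does not converge to $\E|0|=0$; accordingly the family cannot be uniformly integrable. The following sketch (numpy assumed, sample size arbitrary) estimates $\E(|X_n|;|X_n| > r)$ and shows that its supremum over $n$ stays near $1$ for every $r$.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.uniform(size=500000)      # samples of the underlying space [0,1]

def tail(n, r):
    """Monte Carlo estimate of E(|X_n|; |X_n| > r) for X_n = n * 1_{(0,1/n)}."""
    X = np.where(U < 1.0 / n, float(n), 0.0)
    return np.mean(np.where(X > r, X, 0.0))

for r in [1, 10, 100]:
    print(r, max(tail(n, r) for n in [2, 10, 100, 1000]))  # stays near 1, not near 0
```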
Suppose $X_n$ is a bounded submartingale in $L_1(\P)$. Then $X_n$ converges in $L_1(\P)$ if and only if $X_n$ is uniformly integrable and this holds if and only if $\E|X_n|$ converges to $\E|\lim_n X_n|$.
Deduce the dominated convergence theorem from proposition.
Thus a uniformly integrable submartingale $(X_n,\F_n)$ converges in $L_1(\P)$ to some integrable random variable $X$. If in addition $X_n$ is a uniformly integrable martingale, then $X_n=\E(X|\F_n)$.
Let $X_0,X_1,\ldots$ be a sequence of independent random variables such that $\E X_n=0$ and put $S_n\colon=X_0+\cdots+X_n$. If $S_n\to S$ a.s., then $S_n$ is uniformly integrable. Hence $S_n$ converges in $L_1(\P)$ to $S$. Suggested solution.

Reverse martingales

Let $\F_n$ be a decreasing sequence of $\s$-algebras. $(X_n,\F_n)$ is said to be a reverse martingale if for all $n\in\N$: $X_{n+1}=\E(X_n|\F_{n+1})$. $(X_n,\F_n)$ is said to be a reverse submartingale, if for all $n\in\N$: $X_{n+1}\leq\E(X_n|\F_{n+1})$.
Putting $Y_{-n}\colon=X_n$ and $\F_n^\prime=\F_{-n}$ we see that $(X_n,\F_n)$, $n\geq0$ is a reverse submartingale iff $(Y_n,\F_n^\prime)$, $n\leq0$ is a submartingale. Probably the most prominent example of a reverse martingale is the sequence of arithmetic means $$ M_n\colon=\frac1nS_n\colon=\frac1n(X_1+\cdots+X_n) $$ of i.i.d. random variables $X_1,X_2,\ldots$ with respect to $\F_n\colon=\s(S_n,X_{n+1},\ldots)$: indeed by symmetry we have for all $j,k=1,\ldots,n+1$: $\E(X_j|\F_{n+1})=\E(X_k|\F_{n+1})$ and since $\E(S_{n+1}|\F_{n+1})=S_{n+1}$ we conclude that $\E(X_j|\F_{n+1})=M_{n+1}$. It follows that $$ \E(M_n|\F_{n+1})=\frac1n\sum_{j=1}^n\E(X_j|\F_{n+1})=M_{n+1}~. $$
Let $(X_n,\F_n)$ be a reverse submartingale such that $\lim\E X_n$ exists in $\R$. Then $X_n$ is uniformly integrable and converges both $\P$-a.s. and in $L_1(\P)$ to some integrable random variable $X$. If $X_n$ is a reverse martingale, then: $X=\E(X_0|\bigcap\F_n)$.
$\proof$ The sequence $\E X_n$ decreases to some limit $\a\in\R$. For $\e > 0$ we can find an index $m$ such that for all $n\geq m$: $\E X_m\geq\E X_n > \E X_m-\e$. For any $r > 0$ we have: \begin{eqnarray*} \E(|X_n|;|X_n| > r) &=&\E(X_n;X_n > r)+\E(-X_n;X_n < -r)\\ &=&\E(X_n;X_n > r)-\E X_n+\E(X_n;X_n\geq-r)\\ &\leq&\E(X_m;X_n > r)-\E X_m+\e+\E(X_m;X_n\geq-r)\\ &=&\E(X_m;X_n > r)+\e+\E(-X_m;X_n < -r) \leq\E(|X_m|;|X_n| > r)+\e~. \end{eqnarray*} Moreover $$ \E|X_n| =\E X_n^++\E X_n^- =2\E X_n^+-\E X_n \leq-\a+2\E X_0^+ =\colon\b $$ and thus $\P(|X_n| > r)\leq r^{-1}\b$. Since $X_m$ is integrable, $\E(|X_m|;|X_n| > r)$ therefore converges to $0$ as $r\to\infty$ uniformly in $n\geq m$; as finite sets of integrable random variables are uniformly integrable, it follows that $X_n$ is uniformly integrable.
Put $Y_{-j}\colon=X_j$; then we infer from the Upcrossing Lemma applied to the submartingale $Y_{-n},\ldots,Y_{0}$: $$ \E U_n^{a,b}(Y)\leq\frac{\E(X_0-a)^+}{b-a}~. $$ This shows that $Y_{-n}$, i.e. $X_n$, converges $\P$-a.s., and by uniform integrability also in $L_1(\P)$, to some random variable $X$. Since $X$ is measurable with respect to all $\s$-algebras $\F_n$, it's measurable with respect to $\bigcap\F_n$.
If $X_n$ is a reverse martingale, then for all $A\in\bigcap\F_n$: $$ \E(X_0;A)=\lim_n\E(X_n;A)=\E(X;A), $$ i.e. $X=\E(X_0|\bigcap\F_n)$. $\eofproof$
If $(X_n,\F_n)$ is a reverse martingale and $T=\sup\{n:X_n > \e\}$, then $[T=n]\in\F_n$.

Optional stopping

Let $\F_n$ be a filtration. A non negative random variable $T:\O\rar\N_0$ is said to be a stopping time with respect to $\F_n$, if $$ \forall n\in\N_0:\quad [T\leq n]\in\F_n~. $$ In this case we put: $$ \F_T\colon=\{A\in\F:\,\forall\ n\in\N_0:\ A\cap[T\leq n]\in\F_n\}~. $$
$\F_T$ is a $\s$-algebra and every stopping time $T$ is $\F_T$-measurable. Moreover, if $T\leq n$, then $\F_T\sbe\F_n$.
If $T$ is an $\F_n$-stopping time then for all $m\in\N_0$: $[T=m]\cap\F_T=[T=m]\cap\F_m$.
Let $X_n:\O\rar S$ be a sequence of $\F_n$-adapted random variables (i.e. for all $n$ $X_n$ is $\F_n$-measurable) and $A\in\B(S)$. Then $T\colon=\inf\{n\geq0:\,X_n\in A\}$ is a stopping time, for $$ [T\leq n]=\bigcup_{j=0}^n[X_j\in A]\in\F_n~. $$ Moreover the random variable $X_T:\O\rar S$ is $\F_T$-measurable, because for all $B\in\B(S)$: $$ [X_T\in B]\cap[T\leq n]=\bigcup_{j=0}^n[X_j\in B,T=j]\in\F_n~. $$
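As a small illustration (a Python sketch, numpy assumed, the set $A$ chosen arbitrarily), the first hitting time of $A=[10,\infty)$ by a simple random walk and the stopped value $X_T$:

```python
import numpy as np

def hitting_time(path, in_A):
    """T = inf{n >= 0 : X_n in A}; returns len(path) if A is not hit
    within the horizon (in the text T would be +infinity there)."""
    for n, x in enumerate(path):
        if in_A(x):
            return n
    return len(path)

rng = np.random.default_rng(3)
path = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=1000))))  # X_0 = 0
T = hitting_time(path, lambda x: x >= 10)        # A = [10, infinity)
print(T, path[T] if T < len(path) else None)     # X_T lies in A whenever T is finite
```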
Let $S$, $T$, $T_n$ be stopping times. Then $S\wedge T$, $S\vee T$, $S+T$, $\sup T_n$, $\inf T_n$, $\limsup T_n$ and $\liminf T_n$ are stopping times.
For $n\geq0$ we have: $[S+T=n]=\bigcup_{j=0}^n[S=j,T=n-j]$.
1. Suppose $X_n$ is a uniformly integrable submartingale and $S,T$ stopping times such that $S\leq T$. Then a.s.: $X_S\leq\E(X_T|\F_S)$. 2. If $X_n$ is a uniformly integrable martingale, then a.s.: $\E(X_T|\F_S)=X_S$.
Suppose $(X_n,\F_n,\P^x)$ is a Markov chain in the Polish space $S$ and $T$ a (bounded) stopping time with respect to $\F_n$. For any bounded measurable $f:S\rar\R$ we have by the Markov property and the fact that $[T=m]\cap\F_T=[T=m]\cap\F_m$ (cf. exam): \begin{eqnarray} \E^x(f(X_{T+n})|\F_T) &=&\sum_m\E^x(f(X_{m+n})|\F_T)I_{[T=m]}\\ &=&\sum_m\E^x(f(X_{m+n})|\F_m)I_{[T=m]} =\sum_mP^nf(X_m)I_{[T=m]} =P^nf(X_T) =\E^{X_T}f(X_n)~. \end{eqnarray} This is called the strong Markov property. There is also an extended version of this property: define $\Theta_T$ by $X_n\circ\Theta_T=X_{T+n}$ for all $n\in\N_0$, then for any $\F_\infty^X$-measurable and bounded function $F:\O\rar\R$: $$ \E^x(F\circ\Theta_T|\F_T)=\E^{X_T}F $$ where the right hand side is $f(X_T)$ for $f(x)\colon=\E^xF$.

Maximal Functions of Reversible Markov Chains

Rota's Theorem

Now let's assume $(X_n,\F_n,\P^x)$ is a reversible Markov chain with respect to the probability measure $\mu$ on $S$. By $\F_n^\prime$ we will denote the $\s$-algebra generated by the $n$-fold shift operator $\Theta_n$ - in case $\F=\F_\infty^X$, $\F_n^\prime$ is simply the $\s$-algebra generated by $X_n,X_{n+1},\ldots$.
For all bounded measurable functions $f:S\rar\R$ and all $n,m\in\N_0$ we have: $$ \E^\mu(f(X_m)|\F_{m+n}^\prime)=P^nf(X_{m+n})~. $$
$\proof$ We will show by induction on $k$, that for all bounded measurable functions $g_0,\ldots,g_k:S\rar\R$: $$ \E^\mu(f(X_m)g_0(X_{m+n})\ldots g_k(X_{m+k+n})) =\E^\mu(P^nf(X_{m+n})g_0(X_{m+n})\ldots g_k(X_{m+k+n})) $$ By the Monotone Class Theorem this implies that for all bounded $\F_{m+n}^\prime$-measurable functions $F$: $$ \E^\mu(f(X_m)F)=\E^\mu(P^nf(X_{m+n})F) $$ The above equality holds for $k=0$: On the one hand we have by the Markov property \begin{eqnarray*} \E^\mu(f(X_m)g(X_{m+n})) &=&\E^\mu(f(X_m)\E^\mu(g(X_{m+n})|\F_m))\\ &=&\E^\mu(f(X_m)P^ng(X_m)) =\int fP^ng\,d\mu~. \end{eqnarray*} On the other hand we get by $\mu$-invariance: $$ \E^\mu(P^nf(X_{m+n})g(X_{m+n})) =\int P^nf.g\,d\mu~. $$ Hence both sides coincide by symmetry of $P^n$.
Now, putting $G\colon=g_0(X_{m+n})\ldots g_k(X_{m+k+n})$, we get by the induction hypothesis: \begin{eqnarray*} \E^\mu(f(X_m)Gg_{k+1}(X_{m+k+1+n})) &=&\E^\mu(f(X_m)G\E^\mu(g_{k+1}(X_{m+k+1+n})|\F_{m+k+n}))\\ &=&\E^\mu(f(X_m)GPg_{k+1}(X_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})GPg_{k+1}(X_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})G\E^\mu(g_{k+1}(X_{m+k+1+n})|\F_{m+k+n}))\\ &=&\E^\mu(P^nf(X_{m+n})Gg_{k+1}(X_{m+k+1+n}))~. \end{eqnarray*} $\eofproof$
The following example reflects the symmetry:
For all $m\leq n\in\N_0$ and all bounded measurable functions $f:S\rar\R$: $$ \E^\mu(f(X_{n-m})|\F_n^\prime) =P^mf(X_n) =\E^\mu(f(X_{n+m})|\F_n) =\E^{X_n}f(X_m)~. $$
Let $G:(\O,\F_n)\rar\R$ be bounded, then $\E(G|\F_n^\prime)$ is $\s(X_n)$-measurable and we have for all $n$: $$ \E^\mu\Big(f_n(X_n)f_{n-1}(X_{n-1})\cdots f_0(X_0)\Big|\F_n^\prime\Big) =\E^{X_n}(f_n(X_0)\cdots f_0(X_n))~. $$
$\proof$ By exam we have (for $k=0,\ldots,n$): $$ \E^\mu(f_k(X_k)|\F_n^\prime)=P^{n-k}f_k(X_n)=\E^{X_n}f_k(X_{n-k})~. $$ This proves the assertion for one factor, in particular $$ \E^\mu(f_0(X_0)|\F_1^\prime)=Pf_0(X_1)~. $$ On the other hand we get by the Markov property for all $x$: $Pf_0(X_{n-1})=\E^x(f_0(X_n)|\F_{n-1})$ and therefore: $$ Pf_0(X_{n-1})=\E^{X_n}(f_0(X_n)|\F_{n-1})~. $$ Now put for $k=1,\ldots,n$: $G=f_k(X_k)\cdots f_0(X_0)$, then we conclude by induction on the number of factors: \begin{eqnarray*} \E^\mu(G|\F_n^\prime) &=&\E^\mu\Big(f_k(X_k)\cdots f_1(X_1)\E^\mu(f_0(X_0)|\F_1^\prime)\Big|\F_n^\prime\Big)\\ &=&\E^\mu\Big(f_k(X_k)\cdots f_1(X_1)Pf_0(X_1)\Big|\F_n^\prime\Big)\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})Pf_0(X_{n-1}))\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})\E^{X_n}(f_0(X_n)|\F_{n-1}))\\ &=&\E^{X_n}(f_k(X_{n-k})\cdots f_1(X_{n-1})f_0(X_n))~. \end{eqnarray*} By the Monotone Class Theorem we infer that for all bounded $G:(\O,\F_n)\rar\R$ the conditional expectation $\E(G|\F_n^\prime)$ is $\s(X_n)$-measurable. $\eofproof$

Maximal Theorem for reversible Markov chains

By Rota's Theorem the sequence $(P^nf(X_{m+n}),\F_{m+n}^\prime)_{n\in\N_0}$ is, for all $m\in\N_0$ and all $f\in L_1(\mu)$, a reverse martingale: indeed, $P^nf(X_{m+n})=\E^\mu(f(X_m)|\F_{n+m}^\prime)$ and $\F_n^\prime$ is a decreasing sequence of $\s$-algebras.
For $f\in L_p(\mu)$ the function $\sup_n|P^nf|$ is called the maximal function of $f$ with respect to a reversible Markov chain with Markov operator $P$.
Suppose $P$ is the Markov operator of a reversible Markov chain with respect to a probability measure $\mu$ on $S$. Then for all $f\in L_p(\mu)$ and $\tfrac1p+\tfrac1q=1$: $$ \Big(\int\sup_n|P^nf|^p\,d\mu\Big)^{1/p} \leq2q\Big(\int|f|^p\,d\mu\Big)^{1/p}~. $$
$\proof$ $M_n\colon=P^nf(X_n)=\E^\mu(f(X_0)|\F_{n}^\prime)$ is by definition a reverse martingale with respect to $\F_n^\prime$ and by the Markov property we have for all $m$: $$ \E^\mu(M_m|\F_0) =\E^\mu(P^mf(X_m)|\F_0) =P^{2m}f(X_0)~. $$ Now for every increasing convex function $\vp:\R_0^+\rar\R_0^+$ and all $m$: \begin{eqnarray*} \E^\mu\Big(\vp(\sup_n|M_n|)\Big|\F_0\Big) &\geq&\E^\mu(\vp(|M_m|)|\F_0)\\ &\geq&\vp(|\E^\mu(M_m|\F_0)|) =\vp(|P^{2m}f(X_0)|)~. \end{eqnarray*} Hence $$ \E^\mu\Big(\vp(\sup_n|M_n|)\Big|\F_0\Big) \geq\sup_n\vp(|P^{2n}f(X_0)|) $$ and since the distribution of $X_0$ under $\P^\mu$ is $\mu$ we get for $\vp(x)=x^p$ by Doob's inequality: \begin{eqnarray*} \norm{\sup_n|P^{2n}f|}_{L_p(\mu)} &=&\Big(\E^\mu\sup_n|P^{2n}f(X_0)|^p\Big)^{1/p}\\ &\leq&\Big(\E^\mu\sup_n|M_n|^p\Big)^{1/p} \leq q(\E^\mu|M_0|^p)^{1/p} =q\norm f_{L_p(\mu)} \end{eqnarray*} Applying this inequality to $Pf$ yields: $$ \norm{\sup_n|P^{2n+1}f|}_{L_p(\mu)} \leq q\norm{Pf}_{L_p(\mu)} \leq q\norm{f}_{L_p(\mu)}~. $$ Since $\sup_n|P^nf|\leq\sup_n|P^{2n}f|+\sup_n|P^{2n+1}f|$, the triangle inequality in $L_p(\mu)$ gives the asserted bound with constant $2q$. $\eofproof$
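For a finite state space the theorem can be checked directly. The sketch below (numpy assumed; the weights and the function $f$ are random and purely illustrative) builds a reversible chain from symmetric weights, computes $P^nf$ by matrix-vector multiplication (the supremum is taken over the first 200 iterates) and compares the $L_p(\mu)$-norm of the maximal function with $2q\norm f_{L_p(\mu)}$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 8, 3
q = p / (p - 1)
W = rng.uniform(size=(N, N)); W = W + W.T      # symmetric weights
P = W / W.sum(axis=1, keepdims=True)           # Markov operator of the walk
mu = W.sum(axis=1) / W.sum()                   # reversible (invariant) measure

f = rng.normal(size=N)
iterates = [f]
for _ in range(200):                           # P^n f for n = 0, ..., 200
    iterates.append(P @ iterates[-1])
maximal = np.max(np.abs(iterates), axis=0)     # pointwise sup_n |P^n f|

lp_norm = lambda g: (mu @ np.abs(g) ** p) ** (1 / p)   # L_p(mu) norm
print(lp_norm(maximal), 2 * q * lp_norm(f))            # left-hand side <= right-hand side
```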

Central Limit Theorem for Martingales

Quadratic variation

For an i.i.d. sequence $X_n$ in $L_2$ satisfying $\E X_n=0$ and $\E X_n^2=\s^2$ the sequence $M_n\colon=\sum_{j\leq n}X_j$ is a martingale satisfying $\E M_n=0$ and $\E M_n^2=n\s^2$. The Central Limit Theorem asserts that $M_n/\sqrt n$ converges weakly to a random variable with distribution $N(0,\s^2)$. We are going to show a martingale version of this classical result. For any martingale $(M_n,\F_n)$ in $L_2$ denote by $\D M_n\colon=M_n-M_{n-1}$ the difference sequence; put $$ \D A_j\colon=\E((\D M_j)^2|\F_{j-1}) \quad\mbox{and let}\quad A_n\colon=\sum_{j=1}^n\D A_j,\quad A_0\colon=0 $$ be its (predictable) quadratic variation.
For any martingale $(M_n,\F_n)$ in $L_2$ the (predictable) quadratic variation $A_n$ is the unique sequence such that $A_0=0$, $A_n$ is $\F_{n-1}$-measurable and $M_n^2-A_n$ is a martingale (with respect to the sequence $\F_n$).
Let $X_n$ be an i.i.d. sequence in $L_2$ satisfying $\E X_n=0$ and $\E X_n^2=\s^2$. Prove that the quadratic variation of the martingale $M_n\colon=\sum_{j\leq n}X_j$ is given by $A_n=n\s^2$.
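For a martingale whose increments are not i.i.d. the quadratic variation is genuinely random; the following sketch (numpy assumed, the function $h$ chosen arbitrarily) simulates $\D M_n=\e_n h(M_{n-1})$ with i.i.d. fair signs $\e_n$, so that $\D A_n=h(M_{n-1})^2$ is $\F_{n-1}$-measurable, and checks $\E M_n^2=\E A_n$ - a consequence of $M_n^2-A_n$ being a martingale starting at $0$.

```python
import numpy as np

rng = np.random.default_rng(5)
paths, n = 50000, 30
h = lambda m: 1.0 + 0.5 * np.sin(m)    # arbitrary bounded, strictly positive function

M = np.zeros(paths)
A = np.zeros(paths)
for _ in range(n):
    step = h(M)                                     # h(M_{k-1}), known before the step
    A += step ** 2                                  # Delta A_k is predictable
    M += rng.choice([-1.0, 1.0], size=paths) * step # Delta M_k = e_k h(M_{k-1})
print(np.mean(M ** 2), np.mean(A))                  # approximately equal
```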
Let $(M_n,\F_n)$ be a martingale in $L_2$ such that for some constant $C$ and all $n\in\N$: $\E(|\D M_n|^3|\F_{n-1})\leq C^3$. If $A_n/n$ converges in measure to $\s^2\in\R^+$, then $M_n/\sqrt n$ converges weakly to a centered gaussian with variance $\s^2$.
$\proof$ We are going to employ Theorem. W.l.o.g. assume $M_0=0$ and $C=1$. For $t,x\in\R$ and $y\in[0,1]$ we have by Taylor's theorem: $$ e^{itx+t^2y/2}=1+itx+\tfrac12t^2y-\tfrac12t^2x^2+R(t,x,y) $$ where $|R(t,x,y)|\leq c(1+|x|^3)|t|^3e^{t^2}$. Put $$ Z_n\colon=\exp(itM_n+t^2A_n/2), $$ then: $Z_{n+1}=Z_nY_{n+1}$, where $$ Y_{n+1}=\exp(it\D M_{n+1}+t^2\D A_{n+1}/2)~. $$ We are going to verify that $Z_n$ is almost an $\F_n$-martingale, i.e. $\E(Y_{n+1}|\F_n)$ is close to $1$. By Jensen's inequality we have $$ \D A_{n+1}^{1/2} =\E(|\D M_{n+1}|^2|\F_n)^{1/2} \leq\E(|\D M_{n+1}|^3|\F_n)^{1/3}, $$ and thus: $0\leq\D A_{n+1}\leq1$. Since $\E(\D M_{n+1}|\F_n)=0$ and $\D A_{n+1}-\E((\D M_{n+1})^2|\F_n)=0$, we conclude that \begin{eqnarray*} |\E(Y_{n+1}|\F_n)-1| &=&|\E(R(t,\D M_{n+1},\D A_{n+1})|\F_n)|\\ &\leq&c(1+\E(|\D M_{n+1}|^3|\F_n))|t|^3e^{t^2} \leq 2c|t|^3e^{t^2}~. \end{eqnarray*} Therefore we can find a random variable $\e$ such that $|\e|\leq1$ and $\E(Y_{n+1}|\F_n)=1+2c\e|t|^3e^{t^2}$. This implies that \begin{eqnarray*} |\E Z_{n+1}-1| &=&|\E(Z_n\E(Y_{n+1}|\F_n)-1)|\\ &=&|\E(Z_n(1+2c\e|t|^3e^{t^2})-1)|\\ &\leq&|\E(Z_n-1)|(1+2c|t|^3e^{t^2})+2c|t|^3e^{t^2}~. \end{eqnarray*} Since $\E Z_0-1=0$ we infer that: $|\E(Z_n-1)|\leq(1+2c|t|^3e^{t^2})^n-1$. Finally we have $$ \lim_n(1+2c(|t|/\sqrt n)^3e^{t^2/n})^n=1, $$ and thus, since $A_n/n$ converges in probability to $\s^2$ and $0\leq A_n/n\leq1$, by e.g. bounded convergence: $$ 1=\lim_n\E e^{itM_n/\sqrt n+\frac12t^2A_n/n} =e^{t^2\s^2/2}(\lim_n\E e^{itM_n/\sqrt n})~. $$ On the other hand $t\mapsto e^{-t^2\s^2/2}$ is the characteristic function of a centered gaussian with variance $\s^2$. By Theorem the sequence $M_n/\sqrt n$ converges weakly to a centered gaussian with variance $\s^2$. $\eofproof$
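For the simple $\pm1$ random walk everything in the proof can be computed in closed form: $A_n=n$ and $\E e^{itM_n}=\cos(t)^n$, hence $\E Z_n=(e^{t^2/2}\cos t)^n$. The following sketch (numpy assumed, the value of $t$ arbitrary) evaluates these expressions at the frequency $t/\sqrt n$ and shows $\E Z_n\to1$ and $\E e^{itM_n/\sqrt n}\to e^{-t^2/2}$.

```python
import numpy as np

t = 1.3
for n in [10, 100, 1000, 10000, 100000]:
    s = t / np.sqrt(n)
    EZ = (np.exp(s ** 2 / 2) * np.cos(s)) ** n       # E Z_n at frequency t/sqrt(n)
    chf = np.cos(s) ** n                             # E exp(i t M_n / sqrt(n))
    print(n, EZ, chf, np.exp(-t ** 2 / 2))           # EZ -> 1, chf -> exp(-t^2/2)
```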

Application to Markov chains

We will finish this section with an application to Markov chains: Let $(X_n,\F_n)$ be a Markov chain in $S$ with Markov operator $P$. Suppose $f:S\rar\R$ is measurable and bounded and put $g\colon=Lf\colon=Pf-f$, then by Birkhoff's Ergodic Theorem $$ \frac1{n+1}\sum_{j=0}^n g(X_j) $$ converges $\P^\mu$-a.e. to $0$. By exam $$ M_n\colon=f(X_n)-\sum_{j=0}^{n-1}g(X_j) $$ is an $\F_n$-martingale. Let's compute its quadratic variation: As $g+f=Pf$ we have \begin{eqnarray*} \E^\mu((\D M_n)^2|\F_{n-1}) &=&\E^\mu((f(X_n)-g(X_{n-1})-f(X_{n-1}))^2|\F_{n-1})\\ &=&\E^\mu((f(X_n)-Pf(X_{n-1}))^2|\F_{n-1})\\ &=&\E^\mu\Big(f(X_n)^2-2f(X_n)(Pf(X_{n-1}))\Big|\F_{n-1}\Big) +(Pf)^2(X_{n-1})\\ &=&(Pf^2-2(Pf)^2+(Pf)^2)(X_{n-1}) =(Pf^2-(Pf)^2)(X_{n-1})~. \end{eqnarray*} Of course we could have replaced $\E^\mu$ with $\E^x$ for any $x\in S$! Therefore the quadratic variation of $M_n$ is given by $$ A_n=\sum_{j=0}^{n-1}(Pf^2-(Pf)^2)(X_j)~. $$ Assuming ergodicity of the chain we infer from Birkhoff's Ergodic Theorem: $A_n/n$ converges $\P^\mu$-a.e. to $\int P(f^2)-(Pf)^2\,d\mu$. Thus we've proved the following corollary:
Suppose $X_n$ is an ergodic Markov chain with Markov operator $P$ and invariant probability measure $\mu$. If $f$ is bounded and $g=(P-1)f$, then $$ \frac1{\sqrt n}\sum_{j=0}^{n-1}g(X_j) $$ converges weakly to a centered gaussian with variance $\s^2\colon=\int f^2-(Pf)^2\,d\mu$.
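A numerical sketch of the corollary for a toy chain on three states (numpy assumed; the symmetric transition matrix and the function $f$ are arbitrary): the invariant measure is uniform, $\s^2=\int f^2-(Pf)^2\,d\mu$ is computed exactly and compared with the sample variance of $n^{-1/2}\sum_{j<n}g(X_j)$.

```python
import numpy as np

rng = np.random.default_rng(6)
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])         # symmetric, hence mu is uniform and invariant
mu = np.full(3, 1 / 3)

f = np.array([1.0, -2.0, 0.5])
g = P @ f - f                           # g = (P - 1)f
sigma2 = mu @ (f ** 2) - mu @ ((P @ f) ** 2)

def normalized_sum(n):
    x, total = rng.choice(3, p=mu), 0.0
    for _ in range(n):
        total += g[x]
        x = rng.choice(3, p=P[x])
    return total / np.sqrt(n)

samples = np.array([normalized_sum(1000) for _ in range(400)])
print(sigma2, samples.var())            # should be comparable
```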
For self-adjoint ergodic Markov operators the corresponding Markov chain is ergodic (by e.g. subsection) and we have $(\im L)^\perp=\ker L$, which is just the one-dimensional space of constant functions. Thus the assertion of the corollary holds for essentially all bounded $g$ satisfying $\int g\,d\mu=0$.
If $P_t=e^{tL}:L_2(\mu)\rar L_2(\mu)$ is ergodic, then for all $f\in L_2(\mu)$, $\int f\,d\mu=0$: $\lim_{\l\dar0}LU_\l f=-f$.
Under the hypotheses of theorem we have for bounded $f$: $\s^2\geq\norm f_2^2(1-(1-K/4)^2)$.
Suppose $X_n$ is a random walk on an undirected graph $(V,E)$, then by e.g. exam: $\mu(x)=d(x)/2|E|$. For $\int f^2\,d\mu=1$ we get $$ \s^2 =1-\frac1{2|E|}\sum_x\frac1{d(x)}\Big(\sum_{y:y\sim x}f(y)\Big)^2~. $$
Suppose $(V,E)$ is $d$-regular and $\sum_x f(x)^2\mu(x)=1$, then $$ \s^2 =1-\frac1{2d|E|}\sum_x\Big(\sum_{y:y\sim x}f(y)\Big)^2~. $$
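For instance, on the cycle $C_N$, $N\geq3$ (which is $2$-regular with $|E|=N$ and uniform $\mu$), the function $f(x)\colon=\sqrt2\cos(2\pi x/N)$ satisfies $\sum_x f(x)^2\mu(x)=1$ and $Pf=\cos(2\pi/N)f$, hence $$ \s^2=1-\cos^2(2\pi/N)=\sin^2(2\pi/N)~. $$ For large $N$ the fluctuations of $\frac1{\sqrt n}\sum_{j < n}g(X_j)$ are thus small, reflecting the fact that $g=(P-1)f$ itself is of order $N^{-2}$.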
Corollary essentially holds for reversible, ergodic Feller semigroups $P_t$ with generator $L$ as well: $$ \frac1{\sqrt t}\int_0^t Lf(X_s)\,ds $$ converges weakly to a centered gaussian with variance $$ \s^2\colon=-2\int fLf\,d\mu~. $$

Weak Compactness

The weak topology on a Banach space $E$ is the coarsest topology on $E$ such that all $x^*\in E^*$ are continuous. Thus a subset $U$ of $E$ is open in the weak topology iff for all $x\in U$ there is a finite set $x_1^*,\ldots,x_n^*\in E^*$ such that $$ \bigcap_j\{y:\,|x_j^*(y-x)| < 1\}\sbe U $$
A sequence $x_n$ in $E$ converges in the weak topology to $0$ iff for all $x^*\in E^*$: $x^*(x_n)\to0$.
In topology a compact space $X$ is a Hausdorff space such that every open cover of $X$ has a finite subcover. A subset $A$ of a topological space is called relatively compact, if its closure is compact. A subset $A$ of a Hausdorff space $X$ is said to be sequentially compact, if every sequence in $A$ has a convergent subsequence (with limit in $X$). In general neither is a compact space sequentially compact nor is a sequentially compact space compact - of course, both concepts are equivalent in metric spaces. However in case of the weak topology we have:
A subset $A$ of a Banach space $E$ is weakly relatively compact if and only if it is weakly sequentially compact (Eberlein-Šmulian Theorem). A closed convex set $C$ in $E$ is weakly compact if and only if every sequence in $C$ has a weakly convergent subsequence.
The assertion is more or less obvious in case $E$ is separable, for every weakly compact subset of a separable Banach space is metrizable: just take a sequence $x_n^*\in E^*$ such that $\norm{x_n^*}=1$ and for all $x\in E$: $\Vert x\Vert=\sup_n|x_n^*(x)|$; then the metric $$ d(x,y)\colon=\sum_n2^{-n}|x_n^*(x-y)| $$ will do it! Another useful general criterion, which we implicitly employed before:
Let $B_E$ be the open unit ball of a Banach space $E$ and $A\sbe E$ a bounded subset. $A$ is weakly relatively compact if and only if for all $r > 0$ there is a weakly relatively compact subset $K_r$ such that $A\sbe K_r+rB_E$.
For every weakly relatively compact subset $A$ of a Banach space $E$ the convex hull $\convex{A}$ is weakly relatively compact.
Finally a subset $A$ of a metric space $X$ is said to be pre-compact (or totally bounded) if for all $r > 0$ there is a finite set $D$ such that $A\sbe D_r$; in other words: for all $r > 0$ the set $A$ has a finite $r$-net.
In a complete metric space a subset is relatively compact if and only if it is pre-compact.

Weak compactness in $L_1(\mu)$

If $\mu$ is a probability measure then a subset $A$ of $L_1(\mu)$ is weakly relatively compact if and only if it is uniformly integrable.
$\proof$ The sufficiency follows from proposition and the fact that bounded subsets of $L_\infty(\mu)$ are weakly relatively compact in $L_1(\mu)$ - this follows from the Banach-Alaoglu Theorem and the fact that the weak * topology on $L_\infty(\mu)$ is stronger than the weak topology inherited from $L_1(\mu)$. The necessity is a bit more intricate. $\eofproof$
If $\mu$ is not necessarily finite, then uniform integrability is not sufficient for weak compactness, we also need that for all $\e > 0$ there is $F\in\F$ such that $\mu(F) < \infty$ and $$ \sup_{f\in A}\int_{F^c}|f|\,d\mu < \e, $$ so all functions in $A$ are essentially supported on a set of finite measure.

Weak compactness and compactness in $C(S)$

For any metric or topological space $S$ the vector space $C(S)$ of continuous functions $f:S\rar\R$ can be considered as a subspace of the topological space $\R^S$. $C^s(S)$ will denote the space $C(S)$ endowed with the topology inherited from $\R^S$. $C(S)$ endowed with the norm $\norm f\colon=\sup\{|f(x)|:x\in S\}$ is a Banach space.
If $S$ is a compact space then a subset $A$ of the Banach space $C(S)$ is weakly relatively compact if and only if it is bounded and relatively compact in $C^s(S)$.
If $A$ is a bounded subset of $C(S)$, then $A$ is relatively compact in $\R^S$, but that doesn't necessarily mean that it's relatively compact in $C^s(S)$: $A$ may have non-continuous cluster points! Criteria for relative compactness in $C^s(S)$ usually involve equicontinuity, but in that case we actually get more:
If $S$ is a compact metric space then a subset $A$ of $C(S)$ is relatively compact (in the norm topology) if and only if $A$ is bounded and equicontinuous, i.e. for all $x\in S$, all $\e > 0$ there is some $r > 0$ such that $$ \sup\{|f(y)-f(x)|:\,f\in A,y\in B_r(x)\}\leq\e~. $$
Cf. e.g. wikipedia. The quantity $\o_f(x,r)\colon=\sup\{|f(y)-f(x)|:y\in B_r(x)\}$ is called the modulus of continuity of $f$ at $x$; so the main condition in the Arzelà-Ascoli Theorem is the continuity of $r\mapsto\sup\{\o_f(x,r):f\in A\}$ at $r=0$. A typical example is
Suppose $A$ is a bounded subset of $C(S)$ such that for all $x\in S$ there are $\a(x) > 0$ and $C(x) < \infty$ such that $$ \forall f\in A\,\forall y\in S:\quad|f(x)-f(y)|\leq C(x)d(x,y)^{\a(x)}~. $$ Then $A$ is equicontinuous.

Weak * Compactness

The weak * topology on the dual $E^*$ of a Banach space $E$ is the coarsest topology on $E^*$ such that all evaluation maps $x^*\mapsto x^*(x)$, $x\in E$ are continuous. A subset $U$ of $E^*$ is open in the weak * topology iff for all $x^*\in U$ there is a finite set $x_1,\ldots,x_n\in E$ such that $$ \bigcap_j\{y^*:\,|y^*(x_j)-x^*(x_j)| < 1\}\sbe U~. $$
A sequence $x_n^*$ in $E^*$ converges in the weak * topology to $0$ iff for all $x\in E$: $x_n^*(x)\to0$.
The closed unit ball of the dual $E^*$ of a Banach space $E$, i.e. the set $$ \cl{B_{E^*}}\colon=\Big\{x^*\in E^*:\sup_{\Vert x\Vert\leq1}|x^*(x)|\leq1\Big\} $$ is weakly * compact. In particular a norm bounded subset $A$ of $E^*$ is relatively weakly * compact in $E^*$ and its weak * closure is norm bounded.
That doesn't mean that a norm bounded subset $A$ is sequentially weakly * compact. However if $E$ is separable, then the weak * topology on any bounded subset $A$ of $E^*$ is metrizable and thus $A$ is sequentially weakly * compact: just choose a dense subset $x_n\in B_E$ and put $$ d(x^*,y^*)\colon=\sum_n2^{-n}|x^*(x_n)-y^*(x_n)|~. $$ Hence in this case every sequence $x_n^*$ in $A$ has a subsequence converging in the weak * topology to some $x^*\in E^*$.

The space $M_1(S)$

For any Polish space $S$ the set $M_1(S)$ of Borel probability measures on $S$ can be considered as a subset of $C_b(S)^*$: just identify $\mu$ with the linear functional $$ f\mapsto\int f\,d\mu~. $$ The norm of this functional is obviously $1$. By the
Banach-Alaoglu Theorem the set $M_1(S)$ is relatively weakly * compact in $C_b(S)^*$. However a point in the weak * closure of $M_1(S)$ need not be a probability measure on $S$: take for example the sequence of Dirac measures $\d_n$ in $M_1(\R)$: if some $\nu\in M_1(\R)$ were an accumulation point of this sequence, then for all compact sets $K\sbe\R$: $\nu(K)=0$, i.e. $\nu(\R)=0$, which is impossible. This simply shows that $M_1(S)$ is in general not a closed subset of $C_b(S)^*$. Of course, if $S$ is compact, then $M_1(S)$ is closed in $C(S)^*$ and therefore it's compact in the weak * topology.
The set $L\colon=\{(x_n)\in\ell_1=c_0^*:0\leq x_n,\,\sum x_n=1\}$ is closed and convex (and thus weakly closed) but it's not weakly * closed.
Let $\b S$ be the Stone-Čech compactification of the Polish space $S$ (cf. e.g. wikipedia). Then $C_b(S)$ is isometrically isomorphic to $C(\b S)$. By the Riesz Representation Theorem its dual can be identified with the space of all finite (signed, complex) Borel measures on $\b S$.
As a Polish space $S$ is Čech-complete, it's a $G_\d$-subset of $\b S$. Hence a probability measure $\mu$ on $\b S$ is in $M_1(S)$ iff $\mu(S)=1$, i.e. $\mu(\b S\sm S)=0$. Moreover $S$ is a $G_\d$-subset of any compactification $cS$. From topology we know that there is a compact Polish space $cS$ containing $S$ as a dense subspace. Thus the probability measures $M_1(S)$ on $S$ can also be regarded as a subset of the probability measures $M_1(cS)$ - just put $\mu(cS\sm S)=0$. Furnishing $M(cS)$ with the weak * topology, the set $M_1(cS)$ becomes a metrizable compact subset. The topology on $M_1(S)$ inherited from $M_1(cS)$ is by definition weaker than the topology inherited from $M_1(\b S)$. However a subset $\G$ of $M_1(S)$ is weakly * compact iff it is weakly * compact in the topology inherited from $C(cS)^*$. If in addition $S$ is locally compact, then we may choose $cS$ to be the one-point compactification $S^\o$ and thus we may consider $M_1(S)$ as a subset of $C_0(S)^*$. Anyhow, a subset $\G$ of $M_1(S)$ is weakly * compact iff it is weakly * compact in the topology inherited from $C_0(S)^*$, cf. theorem and exam.
Suppose $S$ is locally compact (and Polish) and let $S^\o$ be the one-point compactification of $S$. Then $C_0(S)$ is a closed subspace of $C(S^\o)$ of co-dimension $1$.
Suppose $S$ is locally compact (and Polish) but not compact. Show that the set $M_1(S)$ is not weakly * closed in $C_0(S)^*$.
So what we are really looking for is a criterion for a subset $\G$ of $M_1(S)$ to be relatively weakly * compact in the space $M_1(S)$ with the weak * topology inherited from $C_b(S)^*$. Before doing this (in theorem) we give a description of the topological space $M_1(S)$, showing that it is not at all exotic.
The space $M_1(S)$ with the weak * topology inherited from $C_b(S)^*$ is a Polish space and a suitable metric is given by $$ d_L(\mu,\nu)\colon= \inf\{\d > 0:\,\forall A=\cl A:\ \mu(A)\leq\nu(A_\d)+\d, \nu(A)\leq\mu(A_\d)+\d\}~. $$ The metric $d_L$ is called the Lévy metric.
In the following we will always consider $M_1(S)$ with this topology! It's definitely a bit confusing that this topology on $M_1(S)$ is generally called the weak topology and the corresponding convergence the weak convergence of probability measures.
The function $d_H:M_1(S)\times M_1(S)\rar\R_0^+$, $$ d_H(\mu,\nu) \colon=\sup\Big\{\Big|\int f\,d\mu-\int f\,d\nu\Big|: \,\lip(f)\leq1, \norm f_\infty\leq1\Big\} $$ is another metric on $M_1(S)$. $d_H$ is called the Hutchinson metric. 2. $d_H$ is stronger than $d_L$, i.e. every sequence converging with respect to $d_H$ converges with respect to $d_L$. 3. If $S$ is compact, then both metrics are equivalent, i.e. a sequence converges with respect to $d_H$ if and only if it converges with respect to $d_L$.
$\proof$ 2. Let $A\sbe S$ be closed and $1 > r > 0$. Define $f:S\rar[0,1]$ by $f(x)=(1-r^{-1}d_A(x))^+$, then $\lip(f)=r^{-1}$, $f|A=1$ and $f|A_r^c=0$. Hence $$ r(\mu(A)-\nu(A_r)) \leq\int rf\,d\mu-\int rf\,d\nu \leq d_H(\mu,\nu)~. $$ This shows that $\mu(A)\leq\nu(A_r)+r^{-1}d_H(\mu,\nu)$ and by symmetry: $\nu(A)\leq\mu(A_r)+r^{-1}d_H(\mu,\nu)$. Now choose $r=d_H(\mu,\nu)^{1/2}$, then it follows that $$ d_L(\mu,\nu)\leq\sqrt{d_H(\mu,\nu)}~. $$ 3. The set $L\colon=\{f\in C(S):\lip(f)\leq1,\norm f\leq1\}$ is compact in $C(S)$ by the Arzelà-Ascoli Theorem. Hence by exam $d_H$ is continuous on $M_1(S)\times M_1(S)$. $\eofproof$
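For measures supported on finitely many points the Hutchinson metric is a finite-dimensional linear program: maximize $\int f\,d\mu-\int f\,d\nu$ over vectors $f$ with $|f_i-f_j|\leq d(x_i,x_j)$ and $|f_i|\leq1$. A sketch using scipy's LP solver (numpy and scipy assumed; the points and weights below are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog
from itertools import combinations

def hutchinson(points, mu, nu):
    """Hutchinson (bounded-Lipschitz) distance between probability vectors
    mu, nu on the finite metric space `points` (a subset of R)."""
    m = len(points)
    c = -(mu - nu)                       # linprog minimizes, we maximize (mu-nu).f
    A, b = [], []
    for i, j in combinations(range(m), 2):
        d = abs(points[i] - points[j])
        row = np.zeros(m); row[i], row[j] = 1, -1
        A.append(row); b.append(d)       #  f_i - f_j <= d(x_i, x_j)
        A.append(-row); b.append(d)      #  f_j - f_i <= d(x_i, x_j)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(-1, 1)] * m, method="highs")
    return -res.fun

points = np.array([0.0, 1.0, 2.0, 5.0])
mu = np.array([0.4, 0.1, 0.3, 0.2])
nu = np.array([0.25, 0.25, 0.25, 0.25])
print(hutchinson(points, mu, nu))
```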
If $S$ is compact, then the function $d:M_1(S)\times M_1(S)\rar\R_0^+$, $$ d(\mu,\nu) \colon=\sup\Big\{\Big|\int f\,d\mu-\int f\,d\nu\Big|:\,\lip(f)\leq1\Big\} $$ is another metric on $M_1(S)$. Prove that $d_H\leq d\leq diam(S)d_H$. Hence $d$ is equivalent to the Hutchinson metric.

Compactness in $M_1(S)$

For any open subset $U$ of $S$ the set $C\colon=\{\mu\in M_1(S):\,\mu(U)\leq\d\}$ is closed in $M_1(S)$.
$\proof$ This simply follows from $$ C=\Big\{\mu\in M_1(S):\,\int f\,d\mu\leq\d\ \mbox{ for all }f\in C_b(S)\mbox{ with }0\leq f\leq I_U\Big\}~. $$ $\eofproof$
Let $\G$ be a subset of $M_1(S)$. Then $\G$ is relatively compact in $M_1(S)$, if and only if for all $\e > 0$ there is a compact subset $K$ of $S$, such that for all $\mu\in\G$: $\mu(K^c) < \e$ - in this case the family $\G$ is said to be tight.
$\proof$ To prove sufficiency assume $f:S\rar[0,1]$ is continuous and $f|K=1$. Then for all $\mu\in\G$: $\int f\,d\mu\geq\mu(K) > 1-\e$ and thus for any point $x^*\in C_b(S)^*$ in the weak * closure of $\G$: $x^*(f)\geq1-\e$, $x^*$ is positive and $x^*(1)=1$. Hence, by the Riesz Representation Theorem, $x^*$ is given by a Borel probability measure $\nu$ on the Stone-Čech compactification $\b S$ of $S$; as $S$ is Polish, $S$ is a $G_\d$-set in $\b S$ (i.e. Polish spaces are Čech complete) and the above conditions imply that $\nu(\b S\sm S)=0$, i.e. $\nu\in M_1(S)$.
Necessity: As all $\mu\in\cl\G$ are regular we may find for every $\e > 0$ and every $m\in\N$ a compact subset $K_m(\mu)$ such that $\mu(K_m(\mu)) > 1-2^{-m}\e$. Put \begin{equation}\label{weseq1}\tag{WES1} \G_m(\mu)\colon=\{\nu\in M_1(S):\,\nu(K_m(\mu)_{1/m}) > 1-2^{-m}\e\}, \end{equation} By
lemma this set is an open neighborhood of $\mu$ in $M_1(S)$. Since $\cl\G$ is compact there are $\mu_1,\ldots,\mu_{N(m)}$ such that: $$ \G\sbe\bigcup_{j=1}^{N(m)}\G_m(\mu_j)~. $$ The set $K\colon=\bigcap_m\bigcup_{j=1}^{N(m)}K_m(\mu_j)_{1/m}$ is pre-compact, for each set $\bigcup_{j=1}^{N(m)}K_m(\mu_j)_{1/m}$ has a finite $2/m$-net - and for all $\mu\in\G$ we have for some $k=k(m)$: $\mu\in\G_m(\mu_k)$ and thus by \eqref{weseq1}: $$ \mu(K^c) \leq\sum_m\mu\Big(\bigcap_{j=1}^{N(m)}K_m(\mu_j)_{1/m}^c\Big) \leq\sum_m\mu(K_m(\mu_k)_{1/m}^c) < \sum_m2^{-m}\e=\e~. $$ As $S$ is complete and $K$ is closed, $K$ must be compact by proposition. $\eofproof$
If $\G$ is relatively compact in $M_1(S)$, then $\convex{\G}$ is relatively compact in $M_1(S)$. By proposition this result holds for any subset $\G$ of a topological vector space $E$ if $\G$ is contained in a convex subset $M$, which is metrizable and complete.
If $S$ is locally compact (and Polish), then a subset $\G$ of $M_1(S)$ is relatively compact in $M_1(S)\sbe C_0(S)^*$, iff for all $\e > 0$ there is a compact subset $K$ of $S$, such that for all $\mu\in\G$: $\mu(K^c) < \e$, i.e. iff $\G$ is relatively compact in $M_1(S)\sbe C_b(S)^*$.
Let $S$ be Polish, $\mu\in M_1(S)$ and $K\in\R^+$. Use regularity of $\mu$ to show that the set of probability measures $\nu$ on $S$ such that for all Borel sets $B$: $\nu(B)\leq K\mu(B)$ is relatively compact in $M_1(S)$.
The existence of invariant probability measures mostly relies on the following result (cf. e.g. wikipedia):
Let $C$ be a compact and convex subset of a topological vector space and $A$ a commuting family of continuous linear (or affine) maps $u:C\rar C$. Then there is some $x\in C$ such that for all $u\in A$: $u(x)=x$.
$\proof$ 1. For a single $u\in A$ take an arbitrary $y\in C$, put $$ x_n\colon=\frac1n\sum_{j=0}^{n-1}u^j(y), $$ $F_n\colon=\{x_m:m\geq n\}$ and $v\colon=u-1$. Then (cf. proposition): $$ u(x_n)=x_n+\frac1n(u^{n}(y)-y), \quad\mbox{i.e.}\quad v(F_n)\sbe\bigcup\{t(C-C):0\leq t\leq1/n\}~. $$ Hence any ultrafilter generated by the filterbasis $\{F_n:n\in\N\}$ converges to a fixed point $x$ of $u$. Moreover the set of fixed points $C_u$ of $u$ is a compact and convex subset of $C$.
2. For any finite subset $u_1,\ldots,u_N$ of $A$ we have: $u_2:C_{u_1}\rar C_{u_1}$, because for all $x\in C_{u_1}$: $u_2(x)=u_2(u_1(x))=u_1(u_2(x))$ and thus $u_2(x)$ is a fixed point of $u_1$, i.e. $u_2(x)\in C_{u_1}$. Therefore $u_1$ and $u_2$ have a common fixed point. By induction we find a point $x$ such that for all $j$: $u_j(x)=x$.
3. Finally for any finite subset $B$ of $A$ the set $C_B$ of fixed points of all $u\in B$ is again a compact and convex subset of $C$. By 2. the family $C_B$, $B$ a finite subset of $A$, has the finite intersection property and thus by compactness $\bigcap\{C_B:B\sbe A,|B| < \infty\}\neq\emptyset$. $\eofproof$
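Step 1 of the proof is constructive and easy to run: for the affine map $u(x)=xP$ on the compact convex set of probability vectors ($P$ a stochastic matrix), the Cesàro averages $x_n=\frac1n\sum_{j < n}u^j(y)$ converge to a fixed point of $u$, i.e. to an invariant distribution. A sketch (numpy assumed; $P$ is a random stochastic matrix, the starting point arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 5
P = rng.uniform(size=(N, N)); P /= P.sum(axis=1, keepdims=True)   # stochastic matrix

y = np.zeros(N); y[0] = 1.0                 # arbitrary starting point in C
iterate, total = y.copy(), np.zeros(N)
n = 5000
for _ in range(n):
    total += iterate                        # accumulate y, u(y), ..., u^{n-1}(y)
    iterate = iterate @ P
x = total / n                               # Cesaro average x_n
print(np.max(np.abs(x @ P - x)))            # nearly 0: x is (almost) fixed by u
```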
If $A$ is a compact group of linear transformations, then this theorem also holds without the assumption of commutativity - this is known as Kakutani's Theorem.
Let $S$ be Polish and let $P_tf(x)=\int f(y)\,P_t(x,dy)$ be a continuous Feller semigroup on $C_b(S)$; for $\nu\in M_1(S)$ put $P_t^*\nu(A)\colon=\P^\nu(X_t\in A)$. If there is a probability measure $\nu$ such that for all $\e > 0$ there is a compact set $K$ such that for all $t > 0$: $\P^\nu(X_0\in K,X_t\in K^c) < \e$, then the family $P_t^*\nu$, $t > 0$, is relatively compact in $M_1(S)$ and there is an invariant probability measure $\mu$ in $M_1(S)$, i.e. $P_t^*\mu=\mu$ for all $t > 0$.
2. Moreover, if there is only one invariant probability measure $\mu\in M_1(S)$, then for all $f\in C_b(S)$ $A_tf$ converges pointwise to $\int f\,d\mu$. Hence $P_t$ is ergodic in $L_1(\mu)$.
Suggested solution
Applying this result to $\mu=\d_x$ we infer that there is an invariant probability measure provided we can find for any $\e > 0$ a compact subset $K$ of $S$ such that for all $t$: $P_t(x,K^c) < \e$.
For any vector field $X$ on a compact manifold $M$ with normalized volume $v$ there is a probability measure $\mu$ invariant under the flow of $X$. If there exist measures $\nu_1$ and $\nu_2$ such that for all $t > 0$ and all Borel subsets $A$: $\nu_1(A)\leq v(\theta_t\in A)\leq\nu_2(A)$, then $\mu$ may be chosen such that: $\nu_1(A)\leq\mu(A)\leq\nu_2(A)$. Moreover, we have: $$ v(\theta_t\in A) =v(\theta_{-t}(A)) =\int_A|\det T_x\theta_{-t}|\,v(dx) \quad\mbox{and}\quad \ftd tv(\theta_t\in A) =\int_{\theta_{-t}(A)}\divergence X\,dv~. $$ Hence, if $\r(x)\colon=\sup\{|\det T_x\theta_{-t}|:t > 0\} < \infty$, then $\mu(A)\leq\int_A\r\,dv$.
The Markov operator $P$ in exam has an invariant probability measure if $\bigcup_j w_j(S)$ is contained in a compact set. 2. If there is a compact subset $K$ such that for all $j$: $w_j(K)\sbe K$, then $P$ has an invariant probability measure $\mu$ such that $\mu(K)=1$.
If $\theta:S\rar S$ is a continuous transformation on a compact space $S$, then $\theta$ admits an invariant probability measure $\mu$.
For a sequence $\mu_n$ in $M_1(S)$ the following statements are equivalent:
  1. $\mu_n$ converges in $M_1(S)$ to $\mu$.
  2. For all $f\in C_b(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
  3. For all $f\in\lip_b(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
  4. For all closed sets $A\sbe S$: $\limsup_n\mu_n(A)\leq\mu(A)$.
  5. For all open sets $U\sbe S$: $\liminf_n\mu_n(U)\geq\mu(U)$.
  6. For all Borel sets $B\sbe S$ satisfying $\mu(\pa B)=0$: $\lim_n\mu_n(B)=\mu(B)$.
If in addition $S$ is locally compact, then there is another equivalent condition:
  1. For all $f\in C_0(S)$: $\lim_n\int f\,d\mu_n=\int f\,d\mu$.
$\proof$ 1. and 2. are equivalent by definition.
4. and 5. are complementary and thus equivalent.
4.$\Rar$6.: Since $\mu(\cl B)=\mu(B^\circ)$ 4. and 5. imply that $$ \limsup_n\mu_n(\cl B) \leq\mu(\cl B) =\mu(B^\circ) \leq\liminf_n\mu_n(B^\circ) $$ 3.$\Rar$4.: Define for $\e > 0$: $$ f_\e(x)\colon=\frac{d(x,A_\e^c)}{d(x,A)+d(x,A_\e^c)}, $$ where $A_\e\colon=\{x\in S:\,d(x,A) < \e\}$. Then $f_\e\in\lip_b(S)$ and: $I_A\leq f_\e\leq I_{A_\e}$. Hence we conclude by assumption that: $\limsup_n\mu_n(A)\leq\mu(A_\e)$. For $\e\dar0$ the conclusion follows.
6.$\Rar$2.: Given $f\in C_b(S)$, then for all but at most countably many $a,b\in\R$: $$ \mu(a < f < b)=\mu(a\leq f\leq b)~. $$ For $\e > 0$ choose $a_0 < a_1 < \cdots < a_k$ such that $a_j-a_{j-1} < \e$, $a_0 < f < a_k$ and $\mu(a_{j-1} < f < a_j)=\mu(a_{j-1}\leq f\leq a_j)$. Then we get: $$ \Big|\int f\,d\mu_n-\int f\,d\mu\Big|\leq 2\e+2\norm f\sum_{j=1}^k |\mu_n(a_{j-1} < f\leq a_j)-\mu(a_{j-1} < f\leq a_j)|~. $$ By assumption this implies that $$ \limsup_n\Big|\int f\,d\mu_n-\int f\,d\mu\Big|\leq2\e~. $$ Obviously 2. implies 3.
If $S$ is also locally compact, the equivalence follows from
exam. $\eofproof$
Suppose $\theta:S\rar S$ is continuous. Let $\mu_n$ be a convergent sequence of $\theta$ invariant probability measures in $M_1(S)$. Prove that the limit $\mu$ is $\theta$ invariant. I.e. the set of $\theta$ invariant probability measures is closed in $M_1(S)$.
Let $\mu_n$ be a sequence in $M_1(S)$. Then for all $f\in C_b(S)$ there is a countable subset $D\sbe\R$ such that $$ \forall a,b\notin D\,\forall n:\quad \mu_n(a < f < b)=\mu_n(a\leq f\leq b)~. $$
If $\mu_n$ is a Cauchy sequence in $(M_1(S),d_L)$, then $\mu_n$ converges in $M_1(S)$.

Convergence in $M_1(\R^n)$

In case $S=\R^n$ we get another description via characteristic functions, i.e. via Fourier transforms: For $\mu\in M_1(\R^n)$ the function $\vp:\R^n\rar\C$, $$ \vp(x)\colon=\int e^{i\la x,y\ra}\,\mu(dy) $$ is called the characteristic function
of $\mu$. Up to normalization and the sign of the exponent this is just what analysts call the Fourier transform $\wh\mu$ of $\mu$: $$ \wh\mu(x)\colon=\frac1{(2\pi)^{n/2}}\int e^{-i\la x,y\ra}\,\mu(dy)~. $$ The constant $(2\pi)^{-n/2}$ in front of the integral is chosen in order to make the Fourier transform an isometry on $L_2(\R^n)$, i.e. the mapping $$ f\mapsto\wh f,\quad \wh f(x)\colon=\frac1{(2\pi)^{n/2}}\int f(y)e^{-i\la x,y\ra}\,dy $$ extends to an isometry onto $L_2(\R^n)$! If $X_k:\O\rar\R^n$ is a sequence of random variables on a probability space $(\O,\F,\P)$, such that the sequence $\mu_k\colon=\P_{X_k}$ converges weakly to $\mu\in M_1(\R^n)$, then we say that $X_k$ converges in distribution (or weakly) to a random variable $X$ with distribution $\mu$.
Let $\mu_k$ be a sequence in $M_1(\R^n)$ with characteristic functions $\vp_k$.
  1. If $\mu_k$ converges weakly to $\mu$, then $\vp_k$ converges uniformly on compact sets to the characteristic function $\vp$ of $\mu$.
  2. If $\vp_k$ converges pointwise to a function $\vp$ continuous at $0$, then $\mu_k$ converges weakly to a probability measure $\mu$ with characteristic function $\vp$ (see the sketch below).
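For instance (a numerical sketch of part 2, numpy assumed): the characteristic functions of the binomial distributions $\mu_k=B(k,\l/k)$ converge pointwise to $\vp(t)=e^{\l(e^{it}-1)}$, which is continuous at $0$; consequently $\mu_k$ converges weakly to the Poisson distribution with parameter $\l$.

```python
import numpy as np

# phi_k(t) = (1 - lam/k + (lam/k) e^{it})^k  is the characteristic function of
# B(k, lam/k); it converges pointwise to exp(lam (e^{it} - 1)), the
# characteristic function of the Poisson(lam) distribution.
lam = 3.0
ts = np.linspace(-4, 4, 9)
phi_limit = np.exp(lam * (np.exp(1j * ts) - 1))
for k in [5, 50, 500, 5000]:
    p = lam / k
    phi_k = (1 - p + p * np.exp(1j * ts)) ** k
    print(k, np.max(np.abs(phi_k - phi_limit)))     # tends to 0
```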