What should you be acquainted with? Measure Theory and some basics in Functional Analysis.

Examples

Discrete Dynamical Systems

In the sequel (Ω,F,μ) will always denote a σ-finite measure space. All spaces S are more or less assumed to be Polish, i.e. there is some metric d on S such that (S,d) is a complete and separable metric space. This ensures in particular that S^ℕ is Polish and the Borel σ-algebra B(S^ℕ) on S^ℕ coincides with the product σ-algebra, e.g. F:Ω→S^ℕ is measurable iff all components Pr_n∘F are measurable. Moreover, any finite Borel measure μ on S is regular, i.e. for all Borel sets A and all ε>0 there is a compact set K and an open set U such that K⊆A⊆U and μ(U∖K)<ε. It follows e.g. that bounded Lipschitz functions are dense in L^p(μ) for all 1≤p<∞:
Put f(x)=d(x,Uᶜ)/(d(x,Uᶜ)+d(x,K)); then f is Lipschitz, f|K=1, f|Uᶜ=0 and for all p>0: ∫|f−I_A|^p dμ<ε. Solution by T. Speckhofer
If S=M is a (smooth) manifold (for convenience: a connected Polish space locally homeomorphic to a euclidean space and endowed with a differentiable structure). If you don't care about manifolds in general, open subsets of euclidean spaces suffice - though we assume that all manifolds don't have a boundary! The space C_c^∞(M) of smooth functions with compact support is dense in L^p(μ) for all 1≤p<∞ and all Radon measures μ on M, i.e. μ is Borel and all compact subsets of M have finite measure.
A measurable mapping θ:S→S is called measure preserving if for all A∈F: μ(θ⁻¹(A))=μ(A), i.e. the image measure μ_θ of μ under θ equals μ.
Thus θ is measure preserving if μ is invariant under the mapping θ. We will also say that μ is stationary under θ and call (S,F,μ,θ) a dynamical system. By the transformation theorem of measure theory θ:S→S is measure preserving if and only if for all f∈L¹(μ): ∫f∘θ dμ=∫f dμ. Suppose θ is measure preserving; then for all f∈L^p(μ) and all n∈ℕ₀ we define P^n f(x):=f(θ^n(x)), where θ⁰(x):=x. This is a simple example of what is known as a semigroup P^n on L^p(μ); since θ is measure preserving we have ‖P^n f‖_p=‖f‖_p, in particular P^n is a contraction semigroup.
Put S=T:=ℝ/2πℤ≅S¹ and let λ(dx)=dx/2π be the normalized Haar measure on T. Then for all θ∈ℝ the mapping Θ:S¹→S¹, z↦z e^{2πiθ} (or Θ:T→T, x↦x+2πθ) is measure preserving.
Put S=[0,1) and θ(x)=2x (mod 1). Then Lebesgue measure is invariant under θ.
θ:(0,1)→(0,1) has an invariant measure with density ρ iff (cf. exam): for all x∈(0,1): Σ_{y:θ(y)=x} ρ(y)/|θ′(y)| = ρ(x).
Put θ:ℝ→ℝ, θ(x)=x−1/x. Then Lebesgue measure is θ-invariant. Cf. example.
Suppose θ:ℝ⁺→ℝ⁺ is an increasing bijection. Then the image measure satisfies μ_θ((0,θ(x)))=μ((0,x)).
Suppose θ:S→S has a fixed point x∈S, i.e. θ(x)=x. Then δ_x is θ-invariant.
Put S=[0,1) and θ(x)=ax (mod 1), where a>0 satisfies a−1=1/a, i.e. a=(1+√5)/2 is the golden ratio. Then Lebesgue measure is not invariant under θ. Find a measure μ on S invariant under θ. Hint: Assume μ has constant density m₁ on (0,1/a) and constant density m₂ on (1/a,1). Suggested solution
Show that the map θ:(0,1)→(0,1), θ(x):= x/(1−x) if x∈(0,1/2] and θ(x):=(1−x)/x if x∈(1/2,1), has an invariant measure with density 1/x.
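The invariance criterion from the exercise above can be checked numerically. The following Python sketch (the function names are ours) verifies Σ_{θ(y)=x} ρ(y)/|θ′(y)| = ρ(x) for this map with ρ(x)=1/x, using the two preimage branches y₁=x/(1+x) and y₂=1/(1+x):

```python
# Numerical check (a sketch) of the invariance criterion
#   sum over theta(y) = x of rho(y)/|theta'(y)| = rho(x)
# for theta(x) = x/(1-x) on (0,1/2], (1-x)/x on (1/2,1), with rho(x) = 1/x.
def rho(x):
    return 1.0 / x

def criterion(x):
    # preimages: y1 = x/(1+x) lies in (0,1/2]  (branch y/(1-y), derivative 1/(1-y)^2),
    #            y2 = 1/(1+x) lies in (1/2,1)  (branch (1-y)/y, derivative -1/y^2)
    y1, y2 = x / (1 + x), 1 / (1 + x)
    d1 = 1 / (1 - y1) ** 2
    d2 = 1 / y2 ** 2
    return rho(y1) / d1 + rho(y2) / d2

for x in [0.1, 0.25, 0.5, 0.8]:
    assert abs(criterion(x) - rho(x)) < 1e-12
```

Both terms simplify to 1/(x(1+x)) and 1/(1+x) respectively, which sum to 1/x exactly.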
Suppose μ is a Borel probability measure on the Polish space S and let P be the product measure on Ω:=S^ℕ. The so called shift operator Θ:Ω→Ω defined by Pr_n(Θ(ω))=ω_{n+1} is obviously measurable (with respect to the product σ-algebra) and P is invariant under Θ. Θ is also known as the Bernoulli shift and the dynamical system a Bernoulli scheme (cf. exam).
Since the Borel σ-algebra of Ω is generated by the projections Pr_n, n∈ℕ, we only have to show that for all A∈B(S) and all n∈ℕ: P(Θ⁻¹[Pr_n∈A])=P([Pr_n∈A]). Now by definition P([Pr_n∈A])=μ(A) and since Pr_n∘Θ=Pr_{n+1} we conclude that P(Θ⁻¹[Pr_n∈A])=P(Pr_n∘Θ∈A)=P(Pr_{n+1}∈A)=μ(A). Remark: In probability the projections Pr_n are independent and identically distributed S-valued random variables, a so called i.i.d. sequence in S with distribution μ.
On S=[0,1] define θ:S→S by θ(x)=2x if x<1/2 and θ(x)=2(1−x) if x≥1/2. Then Lebesgue measure is invariant under θ. Similarly for θ(x)=2x−[2x] on [0,1] or, on [0,1)≅ℝ/ℤ, θ(x)=2x (mod 1).
On the set S=[0,1]² define θ(x,y)=(2x,y/2) if x<1/2 and θ(x,y)=(2−2x,1−y/2) if x≥1/2. Then Lebesgue measure is invariant under θ. The transformation θ is known as the folded baker transformation. Suggested solution. Solution by T. Speckhofer. The unfolded baker transformation θ:S→S is given by θ(x,y)=(2x−[2x],(y+[2x])/2), cf. e.g. wikipedia. Both the folded and the unfolded map are invertible!

The Gauß map

On the set S=[0,1) define the Gauss map θ(x)=1/x−[1/x]. Now let μ be the probability measure on S with density (log 2)⁻¹(1+x)⁻¹; then μ is θ-invariant.
[Figure: graph of the Gauss map]
For 0<t<1 we have μ_θ((0,t])=μ(0<θ≤t). Now x∈[0<θ≤t] iff 1/x≤[1/x]+t, i.e. 1/x∈[n,n+t] for some n∈ℕ, and this holds if and only if x∈[1/(n+t),1/n]. Hence log 2·μ_θ((0,t]) = Σ_{n≥1} ∫_{1/(n+t)}^{1/n} dx/(1+x) = Σ_{n≥1} log((1+1/n)/(1+1/(n+t))) = Σ_{n≥1} (log((1+n)/(1+n+t)) − log(n/(n+t))) = log(1+t) = log 2·μ((0,t]), where the last sum telescopes.
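The same invariance can be checked pointwise on densities: the preimages of x under the Gauss map are y_n=1/(n+x), n∈ℕ, with |θ′(y_n)|=1/y_n². A numerical sketch (truncating the series; names are ours):

```python
import math

# Sketch: check theta-invariance of the Gauss measure d(mu) = dx/((1+x) log 2)
# via the density criterion: preimages of x are y_n = 1/(n+x), |theta'(y_n)| = 1/y_n^2.
def rho(x):
    return 1.0 / ((1.0 + x) * math.log(2.0))

def pushforward_density(x, terms=200_000):
    # sum of rho(y_n) * y_n^2 over n = 1, 2, ...; each term equals
    # 1/((n+x)(n+x+1) log 2), so the series telescopes to rho(x)
    return sum(rho(1.0 / (n + x)) / (n + x) ** 2 for n in range(1, terms))

for x in [0.1, 0.5, 0.9]:
    assert abs(pushforward_density(x) - rho(x)) < 1e-4
```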
Let θ be the Gauss map. For real numbers x₁,…,x_k≥1 put ⟨x₁⟩:=1/x₁ and ⟨x₁,…,x_{k+1}⟩:=1/(x₁+⟨x₂,…,x_{k+1}⟩). In particular: ⟨x₁,x₂⟩=1/(x₁+1/x₂), ⟨x₁,x₂,x₃⟩=1/(x₁+1/(x₂+1/x₃)), … Provided θ^k(x)≠0 define for x∈(0,1) the functions a₁,a₂,…:(0,1)→ℕ by a₁(x):=[1/x], a_{k+1}(x):=[1/θ^k(x)]. The sequence a₁(x),a₂(x),… is called the continued fraction of x. Verify the following statements: (suggested solution)
  1. For all x∈[0,1) and all k∈ℕ such that x,θ(x),…,θ^{k−1}(x)≠0: x=⟨a₁(x),…,a_k(x)+θ^k(x)⟩ and θ(x)=⟨a₂(x),…,a_k(x)+θ^k(x)⟩.
  2. For n₁,…,n_k∈ℕ and t∈(0,1): θ(⟨n₁,…,n_k+t⟩)=⟨n₂,…,n_k+t⟩.
  3. Put I=[0,1)∖ℚ. For all k∈ℕ we have θ^k(I)⊆I. Hence the functions a_k:I→ℕ are defined for all k∈ℕ.
  4. For all x∈[0,1)∩ℚ there is some k∈ℕ such that θ^k(x)=0. Conversely, if θ^k(x)=0, then x must be rational.
For n₁,n₂,…∈ℕ we put I_{n₁}:=(⟨n₁+1⟩,⟨n₁⟩), I_{n₁,n₂}:=(⟨n₁,n₂⟩,⟨n₁,n₂+1⟩), I_{n₁,n₂,n₃}:=(⟨n₁,n₂,n₃+1⟩,⟨n₁,n₂,n₃⟩), etc. Then the following holds (suggested solution):
  1. θ is a homeomorphism from I_{n₁,…,n_k} onto I_{n₂,…,n_k}.
  2. I_{n₁,…,n_k,n_{k+1}}⊆I_{n₁,…,n_k}.
  3. For all j≤k: a_j|I_{n₁,…,n_k}=n_j.
  4. The open interval I_{n₁,…,n_k} has length at most 2^{−k}. Thus if a_j(x)=a_j(y)=n_j for all j≤k, then |x−y|≤2^{−k}.
The mapping a:I→ℕ^ℕ, x↦(a_n(x)), is a homeomorphism with inverse (n₁,…,n_k,…)↦lim_k⟨n₁,…,n_k⟩.
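The digits a_k and the brackets ⟨n₁,…,n_k⟩ are easy to compute. A Python sketch (floating point only, so only finitely many digits are reliable); the golden mean (√5−1)/2 satisfies θ(x)=x and hence has all digits equal to 1:

```python
import math

# Sketch: continued-fraction digits a_k via iterates of the Gauss map
# theta(x) = 1/x - [1/x], and reconstruction of x from <n1,...,nk>.
def cf_digits(x, k):
    digits = []
    for _ in range(k):
        a = int(1 / x)
        digits.append(a)
        x = 1 / x - a          # x -> theta(x)
    return digits

def bracket(digits):
    # <n1,...,nk> = 1/(n1 + <n2,...,nk>), evaluated from the inside out
    val = 0.0
    for n in reversed(digits):
        val = 1.0 / (n + val)
    return val

x = (math.sqrt(5) - 1) / 2     # golden mean: digits 1, 1, 1, ...
assert cf_digits(x, 12) == [1] * 12
assert abs(bracket([1] * 12) - x) < 1e-4   # the 12th convergent is 2^{-12}-close
```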

An example on the real line

Suppose α∈ℝ, a₁<a₂<⋯<a_n, a₀=−∞, a_{n+1}=+∞ and c₁,…,c_n>0. Define θ:ℝ→ℝ by θ(x):=x+α−Σ_{j=1}^n c_j(x−a_j)⁻¹ for x∉{a₀,…,a_{n+1}}.
  1. For all t∈ℝ the equation θ(x)=t has exactly n+1 real solutions x₀(t),…,x_n(t) with a_j<x_j(t)<a_{j+1}.
  2. Σ_{j=0}^n x_j′(t)=1 and for all f∈L¹(ℝ): ∫f∘θ dλ=∫f dλ.
1. On each interval (a_j,a_{j+1}) the map θ is continuous with θ′=1+Σc_j(x−a_j)⁻²>1, and lim_{x↓a_j}θ(x)=−∞, lim_{x↑a_{j+1}}θ(x)=+∞; for the unbounded intervals this reads lim_{x→±∞}θ(x)=±∞. Hence θ:(a_j,a_{j+1})→ℝ is a bijection.
2. Fix t∈ℝ and put q(x):=(x−x₀)⋯(x−x_n), p(x):=(x−a₁)⋯(x−a_n)=x^n−x^{n−1}Σa_j+⋯ and p_j(x):=p(x)/(x−a_j). Then q(x)=(θ(x)−t)p(x)=(x+α−t)p(x)−Σ_{j=1}^n c_j p_j(x)=x(x^n−x^{n−1}Σa_j)+(α−t)x^n+r(x), where r is some polynomial of degree n−1 at most. Comparing the coefficients of x^n we find: Σ_{j=0}^n x_j(t)=Σ_{j=1}^n a_j−α+t, and differentiating in t gives Σ_{j=0}^n x_j′(t)=1. Therefore θ:(a_j,a_{j+1})→ℝ is a diffeomorphism with inverse x_j, and by the transformation theorem of measure theory: ∫f∘θ dλ=Σ_{j=0}^n ∫_{a_j}^{a_{j+1}} f(θ(x))dx=Σ_{j=0}^n ∫f(t)x_j′(t)dt=∫f dλ.
Lebesgue measure on ℝ⁺ is invariant under θ(x)=|x−1/x|.
Verify that ∫_ℝ e^{−x²−1/x²} dx=√π·e⁻².
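This identity follows from the previous example: since λ is invariant under θ(x)=x−1/x and θ(x)²=x²−2+1/x², we get ∫e^{−x²−1/x²}dx=e⁻²∫e^{−θ(x)²}dx=e⁻²√π. A quadrature sketch confirming the value:

```python
import math

# Numerical check (a sketch) of  int_R exp(-x^2 - 1/x^2) dx = sqrt(pi) * e^{-2}.
def f(x):
    return math.exp(-x * x - 1.0 / (x * x))

# midpoint rule on (0, 8], doubled by symmetry; the integrand vanishes
# extremely fast both at 0 and at infinity
n, a, b = 200_000, 0.0, 8.0
h = (b - a) / n
integral = 2.0 * h * sum(f(a + (i + 0.5) * h) for i in range(n))
assert abs(integral - math.sqrt(math.pi) * math.exp(-2)) < 1e-6
```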
For t>0 let μ_t be the measure on ℝ⁺ with density x↦t·e^{−t²/4x}/√(4πx³). 1. Show that the Laplace transform ω_t(y):=∫₀^∞ e^{−xy} μ_t(dx) is given by ω_t(y)=e^{−t√y}. 2. Conclude that μ_s∗μ_t=μ_{s+t}. The measures μ_t are called 1/2-stable probability measures on ℝ⁺ (1/2 refers to the square root of y in ω_t). Suggested solution.

Homomorphic and isomorphic systems

A dynamical system (S̃,F̃,μ̃,θ̃) is said to be homomorphic to (S,F,μ,θ) if there is a measurable mapping F:(S,F)→(S̃,F̃) such that μ̃=μ_F and θ̃∘F=F∘θ. If in addition (S,F,μ,θ) is also homomorphic to (S̃,F̃,μ̃,θ̃), then (S,F,μ,θ) and (S̃,F̃,μ̃,θ̃) are said to be isomorphic.
If μ is θ-invariant, then μ_F is θ̃-invariant, because for all B∈F̃ we have by definition: μ_F(θ̃⁻¹(B))=μ(F⁻¹(θ̃⁻¹(B)))=μ((θ̃∘F)⁻¹(B))=μ((F∘θ)⁻¹(B))=μ(θ⁻¹(F⁻¹(B)))=μ(F⁻¹(B))=μ_F(B). Suppose R is an equivalence relation on a Polish space S such that S/R is again Polish. Let π:S→S/R be the quotient map. If a continuous map θ:S→S maps each equivalence class R(x) into a single equivalence class, then there is a continuous map θ̂:S/R→S/R such that θ̂∘π=π∘θ. Finally, if μ is a θ-invariant probability measure on S, then its image measure μ_π is θ̂-invariant.
For S:=(0,1) and θ(x)=4x(1−x) let μ be the probability measure with density f(x)=1/(π√(x(1−x))). Then μ, δ₀ and δ_{3/4} are θ-invariant. Moreover, put F(x)=sin²(πx/2); then (suggested solution) F⁻¹∘θ∘F(x)=2x if x<1/2 and F⁻¹∘θ∘F(x)=2(1−x) if x≥1/2.
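The conjugacy can be verified pointwise: with T the tent map, both F(T(x)) and θ(F(x)) equal sin²(πx). A numerical sketch:

```python
import math

# Sketch: verify the conjugacy F o T = theta o F, where theta(x) = 4x(1-x),
# T is the tent map and F(x) = sin^2(pi x / 2); both sides equal sin^2(pi x).
def theta(x):
    return 4 * x * (1 - x)

def tent(x):
    return 2 * x if x < 0.5 else 2 * (1 - x)

def F(x):
    return math.sin(math.pi * x / 2) ** 2

for i in range(1, 100):
    x = i / 100
    assert abs(F(tent(x)) - theta(F(x))) < 1e-12
```

Since F pushes Lebesgue measure on (0,1) forward to μ, the invariance of Lebesgue measure under the tent map yields the invariance of μ under θ.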
Let S=ℝ⁺, μ(dx)=e⁻ˣdx and θ(x)=−log|1−2e⁻ˣ|. Then μ is θ-invariant. Suggested solution.

Recurrence

Suppose θ is a measure preserving map on the probability space (Ω,F,P). Then for all A∈F: P(A∩lim sup_n[θ^n∈A])=P(A).
Proof: Let B be the set of all ω∈A such that for all k≥1: θ^k(ω)∉A, i.e. B:=A∩⋂_{k≥1}[θ^k∈Aᶜ]=A∩⋂_{k≥1}θ^{−k}(Aᶜ). Then for all n≥1: B∩θ^{−n}(B)=∅, for θ^{−n}(B)⊆θ^{−n}(A) and B⊆θ^{−n}(Aᶜ). It follows for m>n that θ^{−m}(B)∩θ^{−n}(B)=θ^{−n}(θ^{−(m−n)}(B)∩B)=∅, and thus the sets θ^{−n}(B), n∈ℕ, are pairwise disjoint. Since all of these sets have equal probability, this probability is necessarily zero. The same reasoning applied to θ^n instead of θ shows that for all n∈ℕ: P(A∩⋂_{k≥1}[θ^{nk}∈Aᶜ])=0, i.e. P(A∩⋃_n⋂_{k≥1}[θ^{nk}∈Aᶜ])=0. Finally lim inf_n[θ^n∈Aᶜ]=⋃_n⋂_{k≥n}[θ^k∈Aᶜ]⊆⋃_n⋂_{k≥1}[θ^{nk}∈Aᶜ] and therefore P(A∩(lim sup_n[θ^n∈A])ᶜ)=P(A∩lim inf_n[θ^n∈Aᶜ])=0. qed
Hence for P-almost all ω∈A there is an infinite number of n∈ℕ such that θ^n(ω)∈A; in other words the sequence θ^n(ω) hits A infinitely many times!
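This recurrence is easy to observe for the rotation x↦x+h (mod 1) with an irrational h (a measure preserving map for Lebesgue measure). A simulation sketch; by equidistribution the orbit visits A=(0,0.1) with asymptotic frequency λ(A)=0.1:

```python
import math

# Sketch: recurrence for the circle rotation x -> x + h (mod 1), h = sqrt(2) - 1.
# The orbit of 0.05 returns to A = (0, 0.1) again and again, with
# asymptotic frequency lambda(A) = 0.1.
h = math.sqrt(2) - 1
x, visits, n = 0.05, 0, 100_000
for _ in range(n):
    if 0 < x < 0.1:
        visits += 1
    x = (x + h) % 1.0
assert visits > 1000                   # infinitely many returns in the limit
assert abs(visits / n - 0.1) < 0.01    # equidistribution of the orbit
```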
Suppose θ is a measure preserving map on the probability space (Ω,F,P). Then for all A∈F (suggested solution): lim sup_n P(A∩[θ^n∈A])≥P(A)².
The following result is a topological version of the previous proposition. Recall the following definitions from topology:
  1. A subset G of a metric space S is called a Gδ-set if it's the intersection of a sequence of open sets.
  2. A basis for the topology of S is a collection Uα, αI, of open subsets, such that for every open subset V of S there is a subset JI such that V=αJUα.
For example, the irrational numbers form a G_δ subset of the reals. Any closed or open subset of a metric space is a G_δ subset, but a countable dense subset of a complete metric space without isolated points is never G_δ! If D is a dense subset of a metric space S, then the collection {B_r(x):x∈D, r∈ℚ⁺} is a basis for the topology of S.
For any ε>0 find an open subset U_ε of ℝ such that λ(U_ε)<ε and U_ε⊇ℚ. Find a dense G_δ-subset A of ℝ such that λ(A)=0.
Suppose θ:S→S is a continuous transformation on a Polish space S such that for each pair of non-empty open sets U,V there is some n∈ℕ with U∩θ^{−n}(V)≠∅. Then the set of points x∈S for which {θ^n(x):n∈ℕ} is dense is itself a dense G_δ-set.
Proof: S being separable and metrizable there is a countable basis V_k, k∈ℕ, for the topology of S. By assumption U_k:=⋃_{n∈ℕ}θ^{−n}(V_k) is a dense subset of S, and it's evidently open. Since S is a Baire space the set G:=⋂_k U_k is a dense G_δ-set, and for each x∈G the set {θ^n(x):n∈ℕ} is dense: x∈U_k means that θ^n(x)∈V_k for some n, and as the V_k form a basis for the topology of S the set {θ^n(x):n∈ℕ} must be dense. qed
Suppose θ is measurable and A∈F satisfies μ(θ⁻¹(A)ΔA)=0. Then there is a subset B in the μ-completion F_μ of F such that μ(AΔB)=0 and θ⁻¹(B)=B. Suggested solution
For a measure space (S,F,μ) and A,BF we will write A=B if μ(AΔB)=0.
If μ is finite then F can be seen as a subspace of L¹(μ): Prove that ‖I_A−I_B‖₁=μ(AΔB) and that F with the metric inherited from L¹(μ) is a complete metric space. Solution by T. Speckhofer

Ergodic transformations

Let (S,F,μ) be a σ-finite measure space. A set A∈F is said to be θ-invariant if θ⁻¹(A)=A. The map θ is said to be ergodic on (S,F,μ) if A∈F and θ⁻¹(A)=A imply: μ(A)=0 or μ(Aᶜ)=0.
We will mostly assume that μ is a probability measure; in this case θ is ergodic iff θ⁻¹(A)=A implies μ(A)∈{0,1}.
Let x₀∈S and let θ:S→S be the transformation x↦x₀. Then θ is ergodic with respect to any probability measure on S. However, the only θ-invariant probability measure is the Dirac measure δ_{x₀}.
When talking about ergodicity of a transformation θ with respect to a probability measure μ we implicitly assume that μ is θ-invariant, i.e. θ preserves μ!
Suppose (S̃,F̃,μ̃,θ̃) is homomorphic to (S,F,μ,θ), i.e. there is a measurable mapping F:(S,F)→(S̃,F̃) such that μ̃=μ_F and θ̃∘F=F∘θ. If (S,F,μ,θ) is ergodic, then (S̃,F̃,μ̃,θ̃) is ergodic as well.
1. For any countable set S the counting measure is invariant under every permutation (i.e. bijection) θ:S→S. 2. If S is finite, then a permutation θ:S→S is ergodic with respect to the counting measure iff θ is a cycle of length |S|.
Suppose μ(S)<∞; then θ is ergodic if and only if the constant functions are the only functions f∈L^p(μ) with Pf=f (and thus P^n f=f for all n): If θ is not ergodic then there exists A∈F such that μ(A),μ(Aᶜ)>0 and θ⁻¹(A)=A; it follows that PI_A=I_A. Conversely, if Pf=f∘θ=f for some non-constant function f, then for all Borel sets B in ℝ the set A:=[f∈B] satisfies A=[f∘θ∈B]=θ⁻¹[f∈B]=θ⁻¹(A), and for a suitable B both A and Aᶜ have positive measure.
1. Suppose S is a metric space. Then the space C_b(S) of all bounded, continuous, real valued functions with ‖f‖:=sup{|f(x)|:x∈S} is a Banach space. 2. The subspace C_b^u(S) of all bounded and uniformly continuous functions is a closed subspace of C_b(S). 3. If in addition S is locally compact (i.e. every point has a compact neighborhood), then C₀(S):={f∈C_b(S): ∀ε>0 ∃K compact: sup_{x∉K}|f(x)|<ε} is a closed subspace of C_b(S). If S is discrete we also write c₀(S).
If S is separable and locally compact and θ:S→S is continuous, then P:C₀(S)→C₀(S), f↦f∘θ, is a linear operator on C₀(S) satisfying ‖P‖=1. By the Riesz Representation Theorem the space M(S) of all finite signed Borel measures μ on S is isometrically isomorphic to the dual C₀(S)* of C₀(S): for any x*∈C₀(S)* there is exactly one signed Borel measure μ such that for all f∈C₀(S): x*(f)=∫f dμ, and ‖x*‖=|μ|(S)=sup{∫f dμ: ‖f‖≤1}. Using this identification the dual (or adjoint) mapping P*:M(S)→M(S) is given by μ↦μ_θ: indeed, for all f∈C₀(S) we have by definition and the transformation theorem of measure theory ∫f dP*μ=∫Pf dμ=∫f∘θ dμ=∫f dμ_θ, i.e. P*μ=μ_θ. The Banach space M(S) is the space of all finite signed Borel measures μ on S equipped with the total variation norm ‖μ‖:=|μ|(S). If in addition S is discrete, then M(S)=ℓ¹(S). A formally more general type of operators are so called Markov operators, which we are going to discuss in the subsequent section.

Translations on Tᵈ

Suppose 1,h₁,…,h_d are rationally independent, i.e. for all (n₁,…,n_d)∈ℤᵈ∖{0}: Σn_j h_j ≢ 0 (mod 1), or equivalently ⟨n,h⟩:=Σn_j h_j∉ℤ. Denote by θ:Tᵈ→Tᵈ the mapping (x₁,…,x_d)↦(x₁+2πh₁,…,x_d+2πh_d)=x+2πh. Then θ is ergodic. Conversely, if θ is ergodic, then 1,h₁,…,h_d are rationally independent.
Proof: We recall that the functions e_n:x↦exp(i⟨n,x⟩), n∈ℤᵈ, form an orthonormal basis of L²(Tᵈ). Hence for all f∈L²(Tᵈ): f=Σ_n c_n e_n, where c_n:=⟨f,e_n⟩=(2π)⁻ᵈ∫₀^{2π}⋯∫₀^{2π} f(t)e^{−i⟨n,t⟩}dt₁⋯dt_d. If f is θ-invariant, i.e. f=f∘θ, then for all n∈ℤᵈ with c_n≠0: exp(2πi⟨n,h⟩)=1, i.e. ⟨n,h⟩∈ℤ. If f is in addition not constant, there must be such an n∈ℤᵈ∖{0}, and therefore 1,h₁,…,h_d must be rationally dependent. Conversely, if they are rationally dependent, then there is some n∈ℤᵈ∖{0} such that ⟨n,h⟩∈ℤ. It follows that the non-constant function e_n is θ-invariant. qed
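Ergodicity of an irrational rotation can be watched numerically via Birkhoff averages: the time average of f along the orbit converges to the space average ∫f dλ. A sketch for d=1, h=√2 and f(x)=cos x (space average 0):

```python
import math

# Sketch: Birkhoff averages for the ergodic rotation x -> x + 2*pi*h (mod 2*pi)
# with h = sqrt(2): time averages of f(x) = cos(x) converge to the space
# average (1/2pi) int cos = 0; for rotations the convergence is very fast.
h = math.sqrt(2)
x, n, total = 1.0, 100_000, 0.0
for _ in range(n):
    total += math.cos(x)
    x = (x + 2 * math.pi * h) % (2 * math.pi)
assert abs(total / n) < 1e-3
```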
1,h₁,…,h_d are rationally independent iff the set 2π(h₁,…,h_d)ℤ:={2πn(h₁,…,h_d):n∈ℤ} is a dense subgroup of Tᵈ.
Let SU(d) be the special unitary group SU(d):={U∈Gl(d,ℂ): U*U=1, det U=1}. Find a condition on the eigenvalues of U∈SU(d) such that the set {U^n:n∈ℤ} is finite.

Markov chains

Conditional expectation

The conditional expectation E(X|F) of an integrable random variable X:(Ω,F)→ℝ with respect to a sub-σ-algebra F of F is an F-measurable random variable, i.e. E(X|F):(Ω,F)→ℝ, such that for all A∈F: E(E(X|F);A)=E(X;A):=∫_A X dP. By the Radon-Nikodym theorem E(X|F) exists and it's P-a.s. unique. The function E(I_A|F) is called the conditional probability of A∈F given F and it's denoted by P(A|F). In general the mapping A↦P(A|F)(ω) is not a probability measure for P-almost all ω∈Ω - this is due to the fact that for every sequence of pairwise disjoint sets A_n∈F the equality P(⋃A_n|F)=ΣP(A_n|F) only holds a.e. But there might be loads of such sequences, and excluding unions of loads of null sets may wind up in excluding a non-null set! In case it is a probability measure for P-almost all ω∈Ω it's called a regular conditional probability. For these and any other probabilistic notions we refer to R. Durrett, Probability: Theory and Examples.
Suppose Ω_j, 1≤j≤n, is a partition of Ω such that P(Ω_j)>0 and F=σ(Ω_j:1≤j≤n). Prove that E(X|F)=Σ_j P(Ω_j)⁻¹E(X;Ω_j)I_{Ω_j}. Moreover, on the set Ω_j we have P(A|F)=P(A∩Ω_j)/P(Ω_j)=:P(A|Ω_j) and therefore P(A|F)=Σ_j P(A|Ω_j)I_{Ω_j}.
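For a finite Ω this formula and the defining property E(E(X|F);A)=E(X;A) can be checked directly. A sketch with a hypothetical six-point space and a two-block partition:

```python
# Sketch: conditional expectation with respect to a finite partition.
# Omega = {0,...,5} with uniform P; partition {0,1,2}, {3,4,5}; X(w) = w^2.
omega = list(range(6))
prob = {w: 1 / 6 for w in omega}
X = {w: float(w * w) for w in omega}
parts = [{0, 1, 2}, {3, 4, 5}]

def cond_exp(w):
    # E(X|F) is constant on each block: E(X; Omega_j) / P(Omega_j)
    block = next(B for B in parts if w in B)
    pB = sum(prob[v] for v in block)
    return sum(X[v] * prob[v] for v in block) / pB

# defining property: E(E(X|F); A) = E(X; A) for every A in sigma(partition)
for A in (parts[0], parts[1], parts[0] | parts[1], set()):
    lhs = sum(cond_exp(w) * prob[w] for w in A)
    rhs = sum(X[w] * prob[w] for w in A)
    assert abs(lhs - rhs) < 1e-12
```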
Let P be a probability measure on ℝ² with density ρ and let X,Y denote the projections (x,y)↦x and (x,y)↦y respectively. If ∫ρ(x,y)dx>0, then for all measurable f:ℝ²→[0,∞] (suggested solution): E(f|Y=y)=∫f(x,y)ρ(x,y)dx / ∫ρ(x,y)dx, where E(f|Y=y)∈ℝ is a convenient notation for the value of E(f|Y):=E(f|σ(Y)) on the set [Y=y]. If P is the uniform distribution on (a,b)×(c,d), then E(f|X=x)=(d−c)⁻¹∫_c^d f(x,y)dy and E(f|Y=y)=(b−a)⁻¹∫_a^b f(x,y)dx.
In polar coordinates (x,y,z)=(cosφ cosθ, sinφ cosθ, sinθ), φ∈(0,2π), θ∈(−π/2,π/2), the normalized Haar measure σ on the 2-sphere S² has density (4π)⁻¹cosθ. Thus the conditional expectations of f:S²→ℝ given θ and φ, respectively, are given by E(f|θ)=(2π)⁻¹∫₀^{2π} f(cosφ cosθ, sinφ cosθ, sinθ)dφ and E(f|φ)=½∫_{−π/2}^{π/2} f(cosφ cosθ, sinφ cosθ, sinθ)cosθ dθ.

Markov chains and Markov operator

Suppose (X_n,F_n,P_x) is a (homogeneous) Markov chain in a Polish space S, i.e.
  1. F_n is an increasing sequence of sub-σ-algebras of F.
  2. X_n is an F_n-measurable random variable, i.e. X_n:(Ω,F_n)→(S,B(S)).
  3. For all x∈S: P_x is a probability measure such that P_x(X₀=x)=1 and x↦P_x(A) is measurable for all A∈F.
  4. There is a positive (i.e. f≥0 implies Pf≥0) linear operator P:B(S)→B(S) mapping bounded measurable functions to bounded measurable functions such that for all x∈S and all f∈B(S): E_x(f(X_{n+1})|F_n)=Pf(X_n) P_x-a.s. That's called the Markov property!
P is called the Markov operator and X_n is a (homogeneous) Markov chain under P_x starting in x. If in addition P:C_b(S)→C_b(S), then the Markov chain is called a Feller chain and P a Feller operator. Sometimes C_b(S) is replaced with C_b^u(S) or C₀(S) - in the latter case S is assumed to be locally compact (cf. section) - this is just a technical issue.
We have for all bounded measurable f:S→ℝ and all x∈S: E_x(f(X_{n+1})|F_n)=E_x(f(X_{n+1})|X_n) - because the left hand side is Pf(X_n), which is measurable with respect to the σ-algebra generated by X_n - and E_x f(X₁)=E_x E_x(f(X₁)|F₀)=E_x Pf(X₀)=Pf(x).
Prove by induction on m that for all n,m∈ℕ₀: E_x(f(X_{n+m})|F_n)=P^m f(X_n). Suggested solution.

Transition functions

Let us put for A∈B(S) and x∈S: P(x,A):=PI_A(x)=P_x(X₁∈A). Then A↦P(x,A) is a probability measure, x↦P(x,A) is measurable, and by the Markov property P(X_n,A) is the conditional probability that X_{n+1}∈A given F_n, i.e. P(X_n,A)=P_x(X_{n+1}∈A|F_n) P_x-a.s. These conditions simply say that P(X_n,A) is a regular conditional probability for X_{n+1} given F_n. Therefore we also have for f∈B(S): E_x(f(X_{n+1})|F_n)=∫f(y)P(X_n,dy). Finally the Markov operator is given by (MAR1) Pf(x)=∫f(y)P(x,dy). The mapping P:S×B(S)→[0,1], (x,A)↦P(x,A), is called a Markovian transition function.
If P_n(x,A) is a sequence of transition functions on S and p_n:S→[0,1] a sequence of measurable functions such that Σ_n p_n(x)=1, then P(x,A):=Σ_n p_n(x)P_n(x,A) is a transition function. Describe the corresponding Markov chain.
For all n,m∈ℕ₀: P_x(X_n∈A, X_{n+m}∈B)=∫_A P^m I_B(y) P_x^{X_n}(dy), where P_x^{X_n} denotes the distribution of X_n under P_x.
By exam we have: P_x(X_n∈A, X_{n+m}∈B)=E_x(I_A(X_n)E_x(I_B(X_{n+m})|F_n))=E_x(I_A(X_n)P^m I_B(X_n))=∫_A P^m I_B(y) P_x^{X_n}(dy). Any transformation θ:S→S defines a simple Markov chain: just put P(x,A)=1 if θ(x)∈A and P(x,A)=0 otherwise, i.e. P(x,A)=δ_{θ(x)}(A).
Let G=(V,E) be a finite undirected graph, V the set of vertices and E the set of edges. For x,y∈V write x∼y if {x,y}∈E and put d(x):=|{y∈V:y∼x}| - the degree of the vertex x. As G is undirected we have x∼y iff y∼x. G is said to be regular if every vertex has the same degree. Now put P(x,{y})=1/d(x) if y∼x and P(x,{y})=0 otherwise. Then P(x,A) is a Markovian transition function. We say the corresponding Markov chain X_n performs a random walk on the graph G. Compute its Markov operator.
A lonely knight starts a random walk at a corner of a chess board (performing permissible moves only). Determine all vertices and edges of the graph: two vertices, i.e. two positions x and y of the knight, are connected iff the knight can make a permissible move from position x to position y. Finally compute the degree of each vertex.
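The knight's graph is small enough to enumerate by hand or by machine. A sketch building the graph and its degrees (the board is indexed 0..7 × 0..7):

```python
# Sketch: the knight's-move graph on an 8x8 chess board.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
V = [(i, j) for i in range(8) for j in range(8)]

def nbrs(v):
    return [(v[0] + dx, v[1] + dy) for dx, dy in moves
            if 0 <= v[0] + dx < 8 and 0 <= v[1] + dy < 8]

deg = {v: len(nbrs(v)) for v in V}
assert deg[(0, 0)] == 2                  # a corner has degree 2
assert deg[(3, 3)] == 8                  # central squares have degree 8
assert sum(deg.values()) // 2 == 168     # the knight's graph has 168 edges
```

The graph is connected but not regular, so the normalized counting measure is not invariant; the invariant probability measure is proportional to the degrees, μ({x})=d(x)/336.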
Define a Markov operator on a directed graph G=(V,E).
More generally, suppose S is a finite (or countable) set and p(x,y), x,y∈S, a so called stochastic matrix, i.e.
  1. for all x,yS: p(x,y)0,
  2. for all x∈S: Σ_y p(x,y)=1.
Then P(x,{y}):=p(x,y) is a Markovian transition function. Compute its Markov operator. Cf. e.g.
R. Durrett, Probability: Theory and Examples
1. Show that the spectrum Spec(P) of every stochastic matrix P - i.e. the set of its eigenvalues - is contained in the set {z∈ℂ:|z|≤1}. 2. Which stochastic matrices P∈M(n,ℝ) are orthogonal?
Suppose P,Q are stochastic matrices. Then their Kronecker product P⊗Q is also a stochastic matrix. If S and T are the state spaces of P and Q, respectively, then S×T is a state space for P⊗Q, the Markov chain is (X_n,Y_n) and P_{(x,y)}(X₁=u,Y₁=v)=P_x(X₁=u)P_y(Y₁=v)=p(x,u)q(y,v).
Suppose Y₁,Y₂,…:(Ω,P)→S is an i.i.d. sequence in S. Put for x∈S: P_x:=δ_x⊗P, X₀(x,ω)=x, X_j(x,ω)=Y_j(ω) and F_n=σ(X₀,…,X_n). Then (X_n,F_n,P_x) is a Markov chain defined on S×Ω with values in S, and the Markov operator maps any f∈B(S) to the constant function Ef(Y₁).
If X₁,X₂,… is an i.i.d. sequence in ℝᵈ and P_x(S₀=x)=1, then S_{n+1}:=S_n+X_{n+1} is a Markov chain on ℝᵈ with F_n:=σ(S₀,X₁,…,X_n). The Markov operator is given by Pf(y)=∫_{ℝᵈ} f(y+z)μ(dz), where μ is the distribution of X₁. Moreover, if EX₁=0, then (S_n,F_n) is a martingale under P_x, i.e. for all n∈ℕ₀: E_x(S_{n+1}|F_n)=S_n.
Let (X_n,F_n,P_x) be a Markov chain with Markov operator P. Then for all bounded measurable f:S→ℝ and all m≤n: E_x(f(X_n)|F_m)=P^{n−m}f(X_m). In particular P^{n−m}f(X_m), m=0,…,n, is a martingale.
By the Markov property and exam we have: E_x(f(X_n)|F_m)=P^{n−m}f(X_m).
Let (X_n,F_n,P_x) be a Markov chain with Markov operator P and put L:=P−1. Then for all bounded measurable f:S→ℝ, M_n^f:=f(X_n)−f(X₀)−Σ_{j=0}^{n−1}Lf(X_j) is a martingale with respect to P_x, i.e. for all n∈ℕ₀: E_x(M_{n+1}^f|F_n)=M_n^f. Suggested solution.
Let Z₁,Z₂,… be i.i.d. random variables uniformly distributed on the sphere S^{d−1} (cf. e.g. section). For a bounded domain D in ℝᵈ put for any x∈D: R(x):=d(x,Dᶜ), S₀:=x and S_{n+1}:=S_n+R(S_n)Z_{n+1}. Then S_n is a Markov chain with respect to F_n:=σ(Z₁,…,Z_n) and the Markov operator is given by Pf(y)=∫_{S^{d−1}} f(y+R(y)z)σ(dz), where σ denotes the normalized Haar measure on S^{d−1}.
For f∈B(ℝᵈ) we get by independence of Z_{n+1} from F_n: Pf(S_n)=E(f(S_{n+1})|F_n)=E(f(S_n+R(S_n)Z_{n+1})|F_n)=∫f(S_n+R(S_n)z)σ(dz).
Write a program which generates a random variable uniformly distributed on the sphere S^{d−1}, cf. section. Suggested solution.
Suppose |a|<1, Z₁,Z₂,… i.i.d. in S=ℝᵈ with distribution μ, and put X_n=aX_{n−1}+Z_n. Then X_n is a Markov chain with respect to F_n=σ(Z₁,…,Z_n). This chain is called an autoregressive moving average process, ARMAP for short. 1. Show that its Markov operator is given by Pf(y)=∫_{ℝᵈ} f(ay+z)μ(dz). 2. In case μ is standard normal we get (suggested solution): P^n f(y)=∫_{ℝᵈ} f(a^n y+z√((1−a^{2n})/(1−a²)))μ(dz).

Shift operator and invariant measures

As we have already seen, any transformation θ:S→S defines a simple Markov chain. Next we are going to establish the converse: every Markov chain X_n in a Polish space S may be seen as a transformation on Ω=S^ℕ. For this it suffices to assume that there is a mapping Θ:Ω→Ω such that for all n: X_n∘Θ=X_{n+1}; Θ is called the shift operator of the Markov chain and its existence usually follows from the construction of the underlying probability space: this operator is just the Bernoulli shift for the standard construction! Now the Markov property admits an important extension: Let F:Ω→ℝ be bounded and measurable with respect to the σ-algebra F^X generated by X₀,X₁,…. Then for all x∈S and all n≥0: (MAR2) E_x(F∘Θ^n|F_n)=E_{X_n}F P_x-a.s., where E_{X_n}F is the mapping ω↦E_{X_n(ω)}F (which is σ(X_n)-measurable) and Θ^n is short hand for the n-fold composition Θ∘⋯∘Θ. The proof is very similar to the proof of proposition and can be found in almost all text books on Markov chains, in particular in R. Durrett, Probability: Theory and Examples. For e.g. F=f(X_m) we have F∘Θ^n=f(X_{m+n}) and E_{X_n}f(X_m)=P^m f(X_n); hence in this special case (MAR2) is just exam.
For all n,m∈ℕ₀, all A∈F_n and all B∈B(S) (compare exam): P_x(X_{n+m}∈B, A)=E_x(P^m I_B(X_n);A).
Since X_{n+m}∈B iff X_m∘Θ^n∈B we infer from the extended Markov property: P_x(X_{n+m}∈B, A)=E_x(E_x(I_{[X_m∈B]}∘Θ^n|F_n)I_A)=E_x(P_{X_n}(X_m∈B)I_A)=E_x(P^m I_B(X_n);A). Finally let μ be a probability measure on S and put for all A∈F: P_μ(A):=∫P_x(A)μ(dx). Then P_μ is a probability measure on Ω.
For all F∈L¹(P_μ): E_μF=∫E_xF μ(dx).
Actually under P_μ the sequence X₀,X₁,X₂,… is a Markov chain in S with P_μ(X₀∈B)=∫P_x(X₀∈B)μ(dx)=∫_B 1 μ(dx)=μ(B). Therefore the initial distribution is μ. To check the Markov property, i.e. E_μ(f(X_{n+1})|F_n)=Pf(X_n), we notice that for all x∈S: E_x(f(X_{n+1})|F_n)=Pf(X_n) and thus for all A∈F_n: E_μ(Pf(X_n);A)=∫_S E_x(Pf(X_n);A)dμ=∫_S E_x(E_x(f(X_{n+1})|F_n);A)dμ=∫_S E_x(f(X_{n+1});A)dμ=E_μ(f(X_{n+1});A). Hence by the definition of conditional expectation, P_μ-a.e.: E_μ(f(X_{n+1})|F_n)=Pf(X_n).
μ is said to be an invariant measure for the Markov chain if (MAR3) for all B∈B(S): P_μ(X_{n+1}∈B)=P_μ(X_n∈B).
This shows that μ is invariant iff for all n∈ℕ₀: P_μ(X_n∈A)=μ(A). We also say that under P_μ the sequence X₀,X₁,… is stationary. Can we check stationarity by just inspecting the Markov operator? Yes, because this holds if and only if (MAR4) for all f∈B(S): ∫Pf dμ=∫f dμ. This is because the Markov property implies (compare exam or the previous reasoning for A=Ω and f=I_B): P_μ(X_{n+1}∈B)=E_μ PI_B(X_n)=∫PI_B dP_μ^{X_n}. Thus if μ is invariant, then for all n: P_μ^{X_n}=μ and in particular P_μ(X_{n+1}∈B)=μ(B). Hence for all B∈B(S): ∫PI_B dμ=μ(B), and this is easily seen to be equivalent to (MAR4). Conversely if (MAR4) holds, then we get for n=0: P_μ(X₁∈B)=∫PI_B dμ=μ(B)=P_μ(X₀∈B), and the assertion follows by induction on n. We notice that (MAR4) also makes sense for arbitrary measures μ!
Lebesgue measure is an invariant measure both for the Markov chain in exam and in exam.
However we will almost exclusively stick to probability measures!
If μ=P^{Y₁} is the distribution of Y₁, then μ is invariant for the Markov chain in exam. This Markov chain is also called a Bernoulli scheme, cf. also exam.
If μ is invariant for the Markov chain X_n, then for all n,m∈ℕ₀ and all A,B∈B(S): P_μ(X_n∈A, X_{n+m}∈B)=∫_A P^m I_B(y)μ(dy).
Next we verify that the invariance of μ implies invariance of P_μ under the shift operator Θ, i.e. Θ preserves the probability measure P_μ on Ω: indeed, by the extended Markov property (MAR2) we have for an invariant measure μ and any bounded F^X-measurable F:Ω→ℝ₀⁺: E_μ(F∘Θ)=E_μ E_μ(F∘Θ|F₁)=E_μ E_{X₁}F=E_μ F, i.e. P_μ is a probability measure on (Ω,F) invariant under Θ. Thus any Markov chain with invariant measure can be regarded as a sort of discrete dynamical system on (Ω,F) as discussed in the previous section, provided Ω is Polish!
Given a stochastic matrix p(x,y) on a discrete Polish space S, the associated Markov operator P is a contraction on ℓ∞(S). 1. If for each finite subset K of S and each ε>0 there is another finite subset E of S such that for all x∉E: P(x,K)<ε (i.e. P(·,K)∈c₀(S)), then P is a contraction on c₀(S) and thus P* is a contraction on ℓ¹(S) (the dual of c₀(S) is isometrically isomorphic to ℓ¹(S)). 2. A (signed) measure μ on S is just a vector μ∈ℓ¹(S). Verify that if P is a contraction on c₀(S) then μ is invariant iff P*μ=μ, i.e. for all x∈S: Σ_{y∈S} μ(y)p(y,x)=μ(x). Give an example of an infinite stochastic matrix such that P doesn't map c₀(S) into c₀(S). Suggested solution.
A stochastic matrix P=(p(x,y)) is called doubly stochastic if for all x,y: Σ_z p(x,z)=Σ_z p(z,y)=1. 1. The normalized counting measure is an invariant probability measure for P. 2. The random walk on a finite undirected graph G=(V,E) (cf. exam) is doubly stochastic iff for all y∈V: Σ_{z∼y} 1/d(z)=1. 3. Permutation matrices are doubly stochastic.
Suppose P,Q are stochastic matrices with invariant measures μ and ν, respectively. Then their Kronecker product P⊗Q (cf. exam) has invariant measure μ⊗ν.

Ergodicity

Formally the Markov operator P is only defined on the vector space B(S) of bounded, measurable functions on S. Now the existence of an invariant measure μ allows us to define P as a positive, linear contraction on the Hilbert space L²(μ):
If μ is invariant then P:L²(μ)→L²(μ) is a contraction.
Proof: By Jensen's inequality we have for all f∈B(S) and all x∈S: (Pf)²(x)=(∫f(y)P(x,dy))²≤∫f(y)²P(x,dy)=P(f²)(x), and by invariance, for f∈B(S)∩L²(μ): ∫(Pf)²dμ≤∫P(f²)dμ=∫f²dμ. Since B(S)∩L²(μ) is dense in L²(μ), there is a unique bounded linear extension of P to L²(μ) and this extension is obviously a positive contraction. qed
If μ is invariant for the Markov operator P, then for all f≥0: Ent(Pf)≤Ent(f), where Ent(f):=∫f log f dμ is called the entropy of f with respect to μ. Beware, in physics and especially in thermodynamics the entropy is defined by −∫f log f dμ! Hence, to physicists the entropy of a Markov chain increases as time goes on.
As φ:x↦x log x is convex on ℝ⁺, we infer from Jensen's inequality: ∫φ(Pf(x))μ(dx)=∫φ(∫f(y)P(x,dy))μ(dx)≤∫P(φ(f))(x)μ(dx)=∫φ(f)dμ.
If μ is invariant for the Markov operator P, then P is a contraction on all spaces L^p(μ), 1≤p≤∞.
A Markov operator P on S with invariant probability measure μ is said to be ergodic if f∈L²(μ) and Pf=f imply that f is μ-a.e. constant.
Hence P:L²(μ)→L²(μ) is ergodic iff all eigenfunctions of P for the eigenvalue 1 are constant.
  1. The Markov operator in exam is evidently ergodic.
  2. If (p(x,y)) is a finite stochastic matrix with a unique invariant probability measure, then the associated Markov operator is ergodic, cf. proposition.
Suppose P is a stochastic matrix with invariant measure μ. If P is ergodic on L²(μ), then the Kronecker product P⊗P (cf. exam) need not be ergodic on L²(μ⊗μ). Hint: If −1 is an eigenvalue of P with eigenvector x, then (P⊗P)(x⊗x)=x⊗x. You may take for P the permutation matrix of the permutation (2,1)∈S(2).
Let us remark that there is another commonly used notion of ergodicity for stochastic matrices (irreducible and aperiodic states), which is a bit stronger than the notion we employ! However, we will never refer to that stronger notion.
1. Any measure μ supported on ∂D, i.e. μ((∂D)ᶜ)=0, is an invariant measure of the Markov operator P in exam. 2. If f∈C(D̄) is harmonic in D, then Pf=f. Hence P is not ergodic, no matter what measure we take.
If μ is supported on ∂D, then R(y)=d(y,Dᶜ)=0 for μ-almost all y, hence for all bounded f: ∫Pf dμ=∫∫f(y+R(y)z)σ(dz)μ(dy)=∫∫f(y)σ(dz)μ(dy)=∫f(y)μ(dy). f is harmonic iff for all x∈D and all r>0 satisfying B̄_r(x)⊆D: ∫f(x+rz)σ(dz)=f(x). Hence for harmonic functions f we have Pf=f.
Remark: Starting at x∈D the sequence X_n converges weakly to a random variable X_∞ whose distribution is the harmonic measure on ∂D with respect to x. Hence for all continuous f:∂D→ℝ the function u(x):=E_x f(X_∞) is the harmonic extension of f into the interior of D.
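This chain is the classical "walk on spheres" method. A Monte Carlo sketch for the unit disc (stopping radius, sample size and seed are our choices): for the boundary function f(x,y)=x, which is itself harmonic, the martingale property forces u(0.3,0.2)=0.3 exactly, so the estimate should be close to 0.3:

```python
import math
import random

# Walk-on-spheres sketch for S_{n+1} = S_n + R(S_n) Z_{n+1} on the unit disc:
# the chain converges to the boundary, and u(x) = E_x f(X_infinity) is the
# harmonic extension of f. Here f(x, y) = x, so u(0.3, 0.2) = 0.3.
random.seed(1)

def walk_on_spheres(x, y, eps=1e-3):
    while True:
        r = 1.0 - math.hypot(x, y)         # R(x) = distance to the boundary
        if r < eps:
            return x, y                    # (numerically) on the boundary
        phi = random.uniform(0.0, 2.0 * math.pi)
        x, y = x + r * math.cos(phi), y + r * math.sin(phi)

n = 2000
est = sum(walk_on_spheres(0.3, 0.2)[0] for _ in range(n)) / n
assert abs(est - 0.3) < 0.06               # Monte Carlo error ~ n^{-1/2}
```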

Invariant measures on finite sets

For every finite stochastic matrix P=(p(x,y)) we have dim ker(P*−1)≥1: 1 is an eigenvalue of P (apply P to the constant vector) and thus 1̄=1 is an eigenvalue of P* - the spectrum of P* is the complex conjugate of the spectrum of P! Yet we don't know if ker(P*−1) contains a probability measure.
Given a stochastic matrix on a finite set S, there is an invariant probability measure μ. Moreover, if the invariant probability measure μ is unique, then for any probability measure ν on S the sequence A_nν:=n⁻¹Σ_{j=0}^{n−1}P*ʲν converges to μ, and for all functions f:S→ℝ the sequence A_nf:=n⁻¹Σ_{j=0}^{n−1}Pʲf converges to the constant function Σ_y f(y)μ(y); thus P is ergodic. For a more general result cf. exam.
Proof: 1. Suppose |S|=n; then the associated Markov operator P:ℓ∞_n→ℓ∞_n and its adjoint P*:ℓ¹_n→ℓ¹_n are given by Pf(x):=Σ_y p(x,y)f(y) and P*μ(x):=Σ_y p(y,x)μ(y). Let M₁⊆ℓ¹_n be the set of probability measures on S; M₁ is compact and convex. Since P*(M₁)⊆M₁ we infer from Brouwer's fixed point theorem that there is some μ∈M₁ such that P*μ=μ.
2. Alternatively and more elementarily we may take any ν∈M₁ and define for n∈ℕ: μ_n:=A_nν. Then ‖P*μ_n−μ_n‖≤2/n and thus any accumulation point μ of the sequence μ_n is a fixed point of P*. This also shows that A_nν converges to μ if μ is the unique invariant probability measure. Hence for all f:S→ℝ: A_nf(x)=∫A_nf dδ_x=∫f dA_nδ_x→∫f dμ=Σ_y f(y)μ(y). If Pf=f, then for all n: f=A_nf→∫f dμ and therefore f must be constant. qed
It is noteworthy that the alternative proof also works if the sequence μ_n has some accumulation point with respect to some metric d which is weaker than the norm metric, i.e. d(x,y)≤C‖x−y‖ for some constant C>0.
Computationally the presented proofs don't work well because averaging gives a pretty slow algorithm. Usually a certain convex combination Q of the powers of P yields a stochastic matrix with strictly positive entries (cf. exam) and the sequence Qnν for any νM1 converges (exponentially fast) to the invariant measure μM1 of Q, which is also the invariant measure of P, provided it's unique.
Given a finite set S and a transformation θ:SS. Prove that there is a θ invariant probability measure μ on S. Algebraically that means that there is a measure μM1 such that for all xS: μ(θ1(x))=μ(x).
Stochastic matrices are frequently used in text analysis and text generation, cf. e.g. text generation. Here is C code generating random text based on Molly's soliloquy in J. Joyce's Ulysses. This can also be used to extract a certain graph structure from poems: e.g. Poe's poem Alone.
Suppose Fn is a filtration and Xn:ΩS are measurable with respect to Fn. Assume that there is some mN such that for all bounded measurable f:SR and all n: E(f(Xn+1)|Fn)=E(f(Xn+1)|σ(Xn,,Xnm+1)) . Then Xn is called an m-Markov chain. Verify that Zn:=(Xn,,Xnm+1) is a Markov chain with respect to Fn in Sm. Suggested solution.
This C code generates text by means of an m-Markov chain. Short texts such as Time will be almost reproduced (for m=2), whereas long texts such as Mann get jumbled (for m≤2).
Given a finite stochastic matrix P=(p(x,y)). If μ is an invariant probability measure for P and S0={xS:μ(x)=0} then for all ySS0 and all xS0: p(y,x)=0. The Markov chain will never jump from a point in SS0 to any point in S0.
A finite stochastic matrix P=(p(x,y)) has a unique invariant probability measure if and only if dimker(P1)=1. Solution by T. Speckhofer
We will see (cf. theorem) that if for all x,yS: p(x,y)>0, then dimker(P1)=1 and the unique invariant probability measure μ is strictly positive, i.e. for all xS: μ(x)>0.
Suppose P:ℓ1n→ℓ1n is linear (and diagonalizable) such that all eigenvalues λ satisfy: |λ|≤1. 1. Prove that the sequence An:=1n(1+P+⋯+Pn−1) converges to the projection Q onto the kernel of P−1 (Q is defined by Q|ker(P−λ)=0 for all eigenvalues λ≠1 and Q is the identity on ker(P−1)). 2. If all eigenvalues λ satisfy: |λ|<1 or λ=1, then the sequence Pn converges to Q. 3. If there is some eigenvalue λ≠1 satisfying |λ|=1, then the sequence Pn doesn't converge.
Determine all invariant probability measures of the following stochastic matrices (rows separated by semicolons): (0, 1, 0; 0, .5, .5; .5, 0, .5), (.5, .3, .2; .2, .8, 0; .3, .3, .4), (.6, .1, .3; .3, .6, .1; .1, .3, .6),
Any two state (S={1,2}) stochastic matrix is given by P:=(1−a, a; b, 1−b), a,b∈[0,1] (rows separated by a semicolon). Suppose P≠1, i.e. a+b>0.
  1. The invariant probability measure is given by μ(1)=b/(a+b), μ(2)=a/(a+b).
  2. Compute all eigenvalues and eigenspaces of P.
  3. Prove that for all n∈N: Pn=1/(a+b)·(b, a; b, a)+(1−a−b)^n/(a+b)·(a, −a; −b, b)
  4. Give an example of a symmetric stochastic matrix P such that Pn does not converge!
Find the stochastic matrix describing the random walk on the vertices of the unit ball of ℓ31 (the octahedron). What about ℓn1?

Reversible Markov chains

Xn (or μ) is said to be reversible
if for all A,BB(S): (MAR5)Pμ(XnA,Xn+1B)=Pμ(Xn+1A,XnB) A reversible measure is invariant, for Pμ(XnA)=Pμ(XnA,Xn+1S)=Pμ(Xn+1A,XnS)=Pμ(Xn+1A) . Moreover, for all f,gB(S): Eμ(f(Xn)g(Xn+1))=Eμ(f(Xn)Eμ(g(Xn+1)|Fn))=Eμ(f(Xn)Pg(Xn))=f(x)Pg(x)μ(dx) and analogously Eμ(f(Xn+1)g(Xn))=Pf(x)g(x)μ(dx). Thus for all f,gB(S): fPgdμ=Pfgdμ
If μ is reversible, then P:L2(μ)L2(μ) is self-adjoint.
Proof: This follows immediately from the relation f.Pgdμ=Pfgdμ for all f,gB(S), and the fact that B(S) is dense in L2(μ). qed
Conversely, if the Markov operator P:L2(μ)L2(μ) is self-adjoint then the corresponding Markov chain is reversible.
A Markov operator P:L2(μ)L2(μ) is self-adjoint iff for all disjoint subsets A,BB(S): AP(x,B)μ(dx)=BP(x,A)μ(dx) .
Proof: For arbitrary A,B∈B(S) put D:=A∩B. Since A∖D and B are disjoint, as are D and B∖D, we conclude by the additivity of C↦P(x,C): ∫AP(x,B)μ(dx)=∫DP(x,B)μ(dx)+∫A∖DP(x,B)μ(dx)=∫DP(x,D)μ(dx)+∫DP(x,B∖D)μ(dx)+∫A∖DP(x,B)μ(dx)=∫DP(x,D)μ(dx)+∫B∖DP(x,D)μ(dx)+∫BP(x,A∖D)μ(dx)=∫BP(x,D)μ(dx)+∫BP(x,A∖D)μ(dx)=∫BP(x,A)μ(dx) . qed
Put D=xd(x)=2|E|. Then the probability measure μ(x):=d(x)/D is reversible for the Markov chain in exam. In particular the normalized counting measure is a reversible probability measure for the random walk on a regular undirected graph. Suggested solution.
If μ=PY1 is the distribution of Y1, then the Markov chain in exam is reversible with respect to μ.
Let B be a convex and symmetric body in Rd (i.e. B is compact, convex, symmetric and has non-empty interior), D a bounded domain in Rd and Zn an i.i.d. sequence uniformly distributed on B. Put Xn+1=Xn+Zn+1 if Xn+Zn+1∈D and Xn+1=Xn otherwise. Then Xn is a Markov chain and the Markov operator is given by: Pf(x)=1λ(B)(f(x)λ(Dc∩(B+x))+∫D∩(B+x)f(y)dy) . Moreover the normalized Lebesgue measure on D is a reversible probability measure.
Let λB(A):=λ(B∩A)/λ(B) be the normalized Lebesgue measure on B. For x∈D the point x+Z is not in D iff Z∈Dc−x, and thus for a measurable subset A of D the transition function P(x,A) is given, by definition, by: P(x,A)=Px(X1∈A)=δx(A)Px(Z1∈Dc−x)+Px(x+Z1∈A,Z1∈D−x)=δx(A)λB(Dc−x)+λB(A−x)=1λ(B)(δx(A)λ(Dc∩(B+x))+λ(A∩(B+x))) . Hence we get for the associated Markov operator: Pf(x)=∫f(z)P(x,dz)=1λ(B)(f(x)λ(Dc∩(B+x))+∫D∩(B+x)f(y)dy) . Finally we obtain for disjoint (cf. lemma) sets E,F⊆D by symmetry of B: ∫EP(x,F)dx=1λ(B)∫Eλ(F∩(B+x))dx=1λ(B)∫∫IB(y−x)IF(y)IE(x)dydx=1λ(B)∫∫IB(x−y)IF(x)IE(y)dydx=1λ(B)∫∫IB(y−x)IF(x)IE(y)dydx=∫FP(x,E)dx . The Markov operator is indeed ergodic, i.e. the only functions satisfying Pf=f are constants, but yet that's not so obvious (cf. theorem). It's way simpler (cf. exam or its discrete version exam) to prove ergodicity for another Markov operator which we are going to describe now:
Perform a random walk Xn on a bounded convex domain D with boundary D: given Xn=x in D we choose a random one dimensional subspace [Z] and compute the length L of the segment of the line tx+tZ lying in D. This line intersects the boundary D in exactly two points x+t1Z, t1=t1(x,Z)<0 and x+t2Z, t2=t2(x,Z)>0, which terminate the segment. Finally we choose a 'random' point Xn+1=y on the segment and consider L as a function of x and y, i.e. L=L(x,y). This gives a Markov chain Xn. The Markov operator is given by Pf(x)=2vol(Sd1)Df(y)xyd1L(x,y)dy and since (x,y)xyd+1L(x,y)1 is symmetric Lebesgue measure on D is reversible.
We first have to clarify what is meant by a random one dimensional subspace. How do we choose such a subspace? We start with a random unit vector Z on Sd−1, i.e. the distribution of Z is the normalized surface measure on Sd−1. Then we take the uniform distribution on the segment [x+t1Z,x+t2Z] of length L and choose a point on this segment randomly.
(figure: the random-chord walk on a convex domain)
This will give us the Markov operator Pf(x)=Sd1t1t2f(x+tz)L(x,x+tz)dtσ(dz)=Sd1(0t2f(x+tz)+0t1f(xtz))1L(x,x+tz)dtσ(dz)=Sd1(0t2(x,z)f(x+tz)L(x,x+tz)+0t2(x,z)f(xtz)L(x,xtz))dtσ(dz)=2Sd1(0t2(x,z)f(x+tz)L(x,x+tz)dtσ(dz)=2vol(Sd1)Df(y)L(x,y)xyd1dy . Here we used two simple facts: 1. for all t[t1,t2]: L(x,x+tz)=L(x,y), hence it doesn't depend on t, 2. t2(x,z)=t1(x,z) and finally integration in 'polar coordinates': if xD and t2 is the (euclidean) distance of x to the intersection point of the half line originating in x and pointing in direction z and the boundary of D, then Dg(y)dy=vol(Sd1)Sd10t2g(x+tz)td1dtσ(dz) applied to g(y)=f(y)xyd+1L(x,y)1.
Suppose 0 is an interior point of the convex domain D⊆Rd. Show that vol(D)/vol(B2d)=∫Sd−1pD(z)−dσ(dz) , where B2d denotes the euclidean unit ball and pD(z):=inf{r>0:z∈rD} is the so called Minkowski functional of D, i.e. D=[pD<1] and ∂D=[pD=1]. Suggested solution.
Write a subroutine which determines the numbers t1 and t2 given the Minkowski functional of D and two points x,yD.
Of course we may take any open and bounded set D with sufficiently 'nice' boundary. The problem is not the Markov operator - this doesn't change anyway - the problem is a possibly huge number of intersection points!
Suppose w1,…,wn:S→S are continuous maps on the Polish space S and p1,…,pn:S→[0,1] are continuous such that for all x∈S: ∑jpj(x)=1. Then Pf(x):=∑jpj(x)f(wj(x)) is a Markov operator on Cb(S). The corresponding Markov chain Xn is called a random iterated function system (IFS for short): we have P(Xn+1=wj(x)|Xn=x)=pj(x). If P:C0(S)→C0(S) is Feller then the adjoint P∗:M(S)→M(S) is given by P∗μ(A)=∑j∫wj−1(A)pjdμ and if all pj are constant: P∗μ=∑jpj·μwj−1.
Given a stochastic matrix p(x,y) on a finite or countable set S with invariant probability measure μ. Verify that the adjoint of the Markov operator P:L2(μ)→L2(μ) is given by P∗f(x)=∑yf(y)q(x,y), where q(x,y)=p(y,x)μ(y)/μ(x), and conclude that μ is reversible iff ∀x,y∈S: μ(x)p(x,y)=μ(y)p(y,x) . Give an example of a stochastic matrix with multiple reversible strictly positive measures. Suggested solution.
Suppose μ and ν, respectively, are reversible measures for the stochastic matrices P and Q, respectively. Then μ⊗ν is a reversible measure for the Kronecker product P⊗Q. If in addition both P and Q are ergodic, then P⊗Q need not be ergodic (cf. exam).
Suppose S is finite and P:L2(μ)→L2(μ) is a self-adjoint Markov operator, i.e. Pf(x)=∑yp(x,y)f(y), ∑yp(x,y)=1, p(x,y)≥0 and μ(x)p(x,y)=μ(y)p(y,x). 1. Then for all f∈L2(μ): ⟨(1−P)f,f⟩=12∑x,yp(x,y)(f(x)−f(y))2μ(x) 2. If for all x: μ(x)>0 and for x≠y: p(x,y)>0, then P is ergodic.
Put a(x,y)=p(x,y)δx(y), then ya(x,y)=0 and since μ is reversible: μ(x)a(x,y)=μ(y)a(y,x). Thus we get x,ya(x,y)f(x)2μ(x)=0andx,ya(x,y)f(y)2μ(x)=x,ya(y,x)f(y)2μ(y)=0 . Therefore (1P)f,f=x,ya(x,y)f(y)f(x)μ(x)=x,ya(x,y)(f(y)f(x)+12f(x)2+12f(y)2)μ(x)=12x,ya(x,y)(f(x)f(y))2μ(x)=12x,yp(x,y)(f(x)f(y))2μ(x) . 2. Suppose f(x1)f(x2), then (1P)f,f(f(x1)f(x2))2p(x1,x2)μ(x1)>0 .
S=ZN, p(x,x+1)=q(x+1), p(x,x)=r(x) and p(x,x−1)=p(x−1). In this case we have Pf(x)=p(x−1)f(x−1)+r(x)f(x)+q(x+1)f(x+1). A measure μ is reversible iff ∏xq(x)/p(x)=1 and for all x∈ZN: μ(x)=c∏y=0x−1q(y+1)/p(y),
S=Z, U:S→R0+, p(x)=q(x)=e−U(x)/2 and Z:=∑y∈Ze−U(y). A reversible probability measure is given by μ(x)=e−U(x)/Z
We have a collection of N molecules in two boxes. Choose one of these molecules randomly and put it into the other box. We say that the system is in state x∈S:={0,…,N} if there are exactly x molecules in the first box. For x<N we have: p(x,x+1)=(N−x)/N and for x>0: p(x,x−1)=x/N; otherwise p(x,y)=0. Verify that the binomial distribution μ(x)=(N choose x)·2−N is reversible. The corresponding Markov chain is called the Ehrenfest chain.
We have got two boxes A and B, each of which contains exactly N balls; n of these balls are black and 2N−n are white. We say that this collection of balls is in state x, x=0,…,n, if box A contains x black balls. Now we choose 'randomly' one ball from each box and put the ball from box A into box B and the ball from box B into box A. This gives us a Markov chain on S={0,…,n} - that's called the Bernoulli-Laplace diffusion model. 1. Check that we have the following transition function for x,y∈S: p(x,y)=(N−x)(n−x)/N2 if y=x+1 and y≤n, p(x,y)=(x(n−x)+(N−x)(N−(n−x)))/N2 if y=x, p(x,y)=x(N−(n−x))/N2 if y=x−1 and y≥0, and p(x,y)=0 otherwise. 2. Verify that the hypergeometric distribution μ(x)=(n choose x)(2N−n choose N−x)/(2N choose N) is reversible.

Poissonization

Up until now we have been investigating a linear operator P and the semigroup Pn, n∈N0. Our next focus of interest is on so called one parameter semigroups Pt, which depend on a continuous parameter t∈R0+. There is an easy procedure to 'imbed' a discrete semigroup in a continuous semigroup: Poissonization: Suppose E is a Banach space and P:E→E a bounded linear operator; let Nt, t≥0, be a family of Poisson variables with parameter λt, i.e. P(Nt=n)=(λt)ne−λt/n!; then putting L:=P−1 and Qt:=EPNt=∑n=0∞(λt)ne−λt/n!·Pn=eλtL we get a continuous contraction semigroup Qt, t≥0, with generator λL. Actually, for all x∈E the mapping t↦Qtx is analytic. We remark that in general we do not have Qn=Pn for n∈N, hence the quotation marks around 'imbed'! Strict imbedding is not always possible.
If P is a bounded, strictly positive (definite) linear operator on a Hilbert space E (i.e. for some c>0 and all xE: Px,xcx2), then there is a bounded, positive (definite) linear operator L such that P=eL. 2. Find a symmetric stochastic matrix PM(2,R) for which there is no matrix LM(2,R) such that P=eL.
Qt is the family of Markov operators of a Markov process (Yt)t≥0, where the number of steps of the original Markov chain Xn (with Markov operator P) up to time t is Nt, i.e. Yt=XNt.
Suppose L=(ljk)M(n,R) has the following properties:
  1. For all j: kljk=0.
  2. For all jk: ljk0
Then the matrix etL is stochastic.
Proof: For sufficiently large a>0 the entries of the matrix L+a are non-negative. Hence all entries of etL=et(L+a)e−at are non-negative. Moreover, for x=(1,…,1)t we get Lx=0 and thus etLx=x, i.e. for all j: ∑kpjk=1, where (pjk):=etL. qed
Let P be the Markov operator in exam. Put λ=1 and describe the Markov process with Markov operators Qt.
If P:B(S)B(S) has invariant measure μ, then for all t0 μ is invariant for Qt. 2. If P:L2(μ)L2(μ) is self-adjoint, then so is Qt:L2(μ)L2(μ) for all t0. 3. If Qt is ergodic then P:L2(μ)L2(μ) is ergodic.
If a finite stochastic matrix (p(x,y)), x,yS has the property that for each pair (x,y)S×S there is some nN0 such that p(n)(x,y)>0, then P has a unique invariant probability μ - hence it's ergodic with respect to μ. Here p(n)(x,y) are the matrix entries of the n-th power of the stochastic matrix. Hint: Use the remark following exam.
Verify that the Markov chains in exam, exam, exam and exam are ergodic.
For λ>0 let γn be an i.i.d. sequence of non-negative random variables with density λe−λt. Put Γ0:=0 and Γn=∑j=1nγj. Then Γn has density λ(λt)n−1e−λt/(n−1)! . 2. Define Nt:=max{n≥0:Γn≤t}=inf{n≥1:Γn>t}−1, then Nt is an increasing N0 valued process satisfying: [Nt≤n]=[Γn+1>t], [Nt≥n]=[Γn≤t] and for all n∈N0: P(Nt=n)=(λt)ne−λt/n! . Usually γn is interpreted as a waiting time for the arrival of a bus. In this case Γn is the waiting time for the arrival of the n-th bus and Nt is the number of buses that have arrived up to time t.

Pinsker and Poincaré inequalities

Proving ergodicity of a particular Markov operator might be quite intricate. One method of proof is based on Pinsker's inequality:
Suppose μ is a probability measure on S and f:S→R0+ is such that ∫fdμ=1. Then ∫|f−1|dμ≤√(2Ent(f)), where Ent(f):=∫flogfdμ. This is called Pinsker's inequality. Hint: Use the elementary inequality 3(x−1)2≤(4+2x)(xlogx−x+1), which holds for all x≥0, together with the Cauchy-Schwarz inequality. Suggested solution.
Now assume Pf=f, f≥0 and ∫fdμ=1. If Ent(Pnf) converges to 0, then ∫|f−1|dμ=0, i.e. f is μ a.e. constant. Another related method is based on so called Poincaré inequalities: there is some constant C∈R+ such that for all f with ∫fdμ=0: ‖f‖22≤1C∫f(1−P)fdμ . This inequality immediately implies that if Pf=f then f is μ a.e. constant, i.e. f=∫fdμ. The constant C is called the spectral gap of the operator 1−P, because in case 1−P is self-adjoint, it's the distance of 0 to the rest of the spectrum of 1−P - which is a subset of R+. The following generalizes exam:
Let Pf(x)=∫f(y)P(x,dy) be the Markov operator of a reversible Markov chain with reversible probability measure μ. 1. Verify that for all f∈L2(μ): Q(f):=∫f(1−P)fdμ=12∫∫(f(x)−f(y))2P(x,dy)μ(dx) . If for all A with μ(A)>0 and μ almost all x: P(x,A)>0, then P is ergodic on L2(μ). Q is called the Dirichlet-form of P. 2. Verify that the functional f↦Q(f) is convex, i.e. Q((1−t)f+tg)≤(1−t)Q(f)+tQ(g).
Let Q be the Dirichlet-form of a self-adjoint Markov operator with reversible probability measure μ. Prove that Q(|f|)≤Q(f), Q(f+)≤Q(f) and for all s∈R: Q(f−s)=Q(f). Conclude that Q(f)≥supsQ((f−s)+).

Continuous Dynamical Systems

The spaces Cb(Rd), C0(Rd) and Cc(Rd) are called the space of all bounded and continuous functions, the space of all continuous functions vanishing at infinity and the space of all continuous functions with compact support. The first two spaces are Banach spaces with the norm given by: f:=sup{|f(x)|:xRd} . and Cc(Rd) is a dense subspace of C0(Rd). By C(Rd) and Cc(Rd) we denote the space of all smooth functions and the space of all smooth functions with compact support. Both are dense subspaces of C0(Rd).
Suppose ζ:RdRd is smooth, then by the existence theorem of ordinary differential equations there is an open subset D of R×Rd such that {0}×RdD and a smooth mapping θ:DRd such that (CDS1)tθ(t,x)=ζ(θ(t,x))andθ(0,x)=x . By the uniqueness theorem of ordinary differential equations this map has the additional property that for all xRd and all t,s>0: θ(t+s,x)=θ(t,θ(s,x)), whenever θ(s,x) and θ(t,θ(s,x)) are defined. Putting Ptf(x):=f(θ(t,x)) we get a local group on C0(Rd) with the property that for all fC(Rd) and all (t,x)D: (CDS2)tPtf(x)=j=1dtθj(t,x)jf(θ(t,x))=j=1dζj(θ(t,x))jPtf(x) . For a manifold M the spaces C0(M), C(M) and Cc(M) are defined analogously.

Vector fields

A vector field X on a manifold M is a linear mapping X:C(M)C(M) such that for all f,gC(M): X(fg)=fXg+gXf
Of course, the space Γ(M) of all vector fields on M is a vector space ((X+Y)f:=Xf+Yf), but it's also a C∞(M)-module: for all g∈C∞(M) and all vector fields X there is a new vector field gX: (gX)f(x):=g(x)Xf(x). Moreover, given two vector fields X and Y the commutator [X,Y]:=XY−YX is also a vector field.
Prove that the commutator is indeed a vector field.
Straightforward computation shows that the commutator satisfies Jacobi's identity [X,[Y,Z]]+[Y,[Z,X]]+[Z,[X,Y]]=0 and is antisymmetric: [X,Y]=−[Y,X]. Thus Γ(M) is a Lie-algebra (cf. e.g. wikipedia).
Prove that the vector space R3 with the vector product (x,y)↦x×y is a Lie-algebra, i.e. x×y=−y×x and x×(y×z)+y×(z×x)+z×(x×y)=0.
Vector fields and smooth mappings ζ:MRd on open subsets M of Rd are in one to one correspondence: for any such mapping ζ (CDS3)Xf(x):=j=1dζj(x)jf(x) is a vector field. The mapping θ:DRd defined in (CDS1) is called the flow of the vector field X and the curve c(t):=θ(t,x) is called the integral curve of X through x. Instead of Xf(x) we will also write Xxf. Conversely, for a vector field X on M there is exactly one smooth mapping ζ:MRd, such that (CDS3) holds - this is usually one of the first results in any course on manifolds. If a smooth mapping θ:DM satisfies (CDS1), then: (CDS4)fC(M):t(fθ)(t,x)=Xf(θ(t,x))=PtXf(x) . On the other hand this relation implies (CDS1): just take for f the projections (x1,,xd)xj onto the components. In case M is an arbitrary manifold we take (CDS4) just as definition of the flow θ of a vector field X.
A constant vector field E=∑jej∂j on Rd commutes with any other constant vector field and the flow of E is given by σ(t,x)=x+te.
If D(X):=D=R×M, the vector field X is said to be complete. Completeness of X ensures that Ptf(x):=f(θ(t,x)) is in fact a group on C(M), Cb(M), C0(M), etc.; moreover, the mappings θt:MM are diffeomorphisms for all tR with inverse θt.
The vector field Xx=x2x on R is not complete.
Find a solution of the first order PDE tu=yxuxy on M=R2 satisfying u(0,x,y)=x2y2. Suggested solution.

From vector fields to one parameter groups

Putting t=0 in (CDS4) we see that the group Pt, t∈R, also determines X: Xf(x)=∂t|t=0Ptf(x)=limt→01t(Ptf(x)−f(x)) . The notion of a complete vector field allows us to write (CDS1) as (CDS4). By the group property we get: XPtf(x)=lims→01s(Pt+sf(x)−Ptf(x))=lims→01s(Psf(θt(x))−f(θt(x)))=Xf(θt(x))=PtXf(x), i.e. PtX=XPt, and thus (CDS1) is equivalent to: (CDS5)∀t∈R ∀f∈C∞(M): ∂tPtf(x)=XPtf(x) .
Suppose X is a vector field on M and u:MR+ is smooth such that
  1. For all r>0 the set [ur] is compact.
  2. There exists a constant C>0, such that XuCu.
Then X is complete.
Proof: For any x∈M put c(t):=θ(t,x). Then it follows by 2. and (CDS5) that ddt(e−Ctu(c(t)))=e−Ct(Xu(c(t))−Cu(c(t)))≤0, i.e. u(c(t))≤u(x)eCt, which shows that c(t) stays in the compact set [u≤u(x)eCt]. qed
If ζ is L-Lipschitz on Rd, then X=ζjj is complete on Rd.
Proof: Choose u(x)=x2+1 then Xu=jζjju=2j(ζjζj(0))xj+2jζj(0)xj2Lx2+2KxCu(x) . qed
Next we are going to verify that Ptf=fθt is a continuous group on C0(M), meaning that for all f∈C0(M) the map t↦Ptf is a continuous curve in C0(M). Indeed, for f∈C0(M) and any ϵ>0 there is some δ>0 and a compact subset K of M, such that for all d(x,y)<δ: |f(x)−f(y)|<ϵ and |f|Kc|<ϵ. Moreover, we can find some r>0, such that for all |t|<r and all x∈K: d(x,θt(x))<δ and |f|θt(Kc)|<ϵ. It follows that ‖Ptf−f‖≤supx∈K|f(θt(x))−f(x)|+supx∉K|f(θt(x))|+supx∉K|f(x)|≤ϵ+ϵ+ϵ . In general the group Pt is not continuous on Cb(M): take for example M=R, Ptf(x):=f(x+t) and f(x)=sin(x2); for all t>0: supx|sin((x+t)2)−sinx2|=2. However for f∈Cc∞(M) we conclude by the mean value theorem of calculus that: ‖1t(Ptf−f)−Xf‖≤sup{|∂sf(θ(s,x))−∂sf(θ(0,x))|:0<s<t,x∈M} . As f has compact support this converges to 0 as t converges to 0 and hence Xf is the derivative of the curve t↦Ptf in C0(M) at t=0. In fact equation
(CDS5) is somewhat weaker than the linear ordinary differential equation in C0(M): (CDS6)ddtPtf=XPtf . Beware, on the left hand side we mean the derivative of a Banach space valued curve! What's the asset? (CDS6) is linear! The price for this 'linearization' of the non-linear differential equation x(t)=ζ(x(t)) is the substitution of the infinite dimensional space C0(M) for the finite dimensional space M. On the other hand we may and will examine (CDS6) in different Banach spaces such as Lp(μ) for some measure μ on M and in these cases the evaluation of (CDS6) at a particular point doesn't make sense in general, though it makes sense in e.g. C0(M). As a final remark we notice that for all fC(M), all xM and all nN we have: dndtn|t=0Ptf(x)=Xnf(x) Thus in case tPtf is analytic we get the Taylor expansion: Ptf=n=0tnn!Xnf which we simply write as etXf. Henceforth we will use etX just as a synonym for Pt. Take for example M=Rd, h=(h1,,hd)Rd and put X=hjj, then: θ(t,x)=x+th and etXf(x)=f(x+th). On the other hand we have in case tf(x+th) is analytic (which is obviously weaker than analyticity of tPtf): etXf(x)=n=0tnn!(hjj)nf(x) and thus we are back at the classical Taylor formula: f(x+th)=etXf(x).
Let θ be the flow of the complete vector field X and Ptf=fθt. A Borel measure μ on M is said to be invariant under θ if tRfCc(M):Ptfdμ=fdμ .
Of course this implies that Xfdμ=0. Conversely, since Pt maps Cc(M) into itself, we get for every fCc(M) by (CDS6): ddtPtfdμ=XPtfdμ=0 and thus μ is invariant iff for all fCc(M): Xfdμ=0.
Suppose X is a complete vector field on a Riemannian manifold M, ρC(M) the density of a Borel measure μ with respect to the Riemannian volume v. μ is invariant under the flow of X if and only if div(ρX)=0.
Proof: The divergence divY of a vector field Y on M may be defined by (CDS7)∀g∈Cc∞(M): ∫Ygdv=−∫gdiv(Y)dv . Now take any f∈Cc∞(M), then ∫Xfdμ=∫(ρX)fdv=−∫div(ρX)fdv . Since f∈Cc∞(M) is arbitrary, the conclusion follows. qed
Verify that in the euclidean case (CDS7) amounts to div(ζjj)=jζj .
Provided the conditions of the above theorem are satisfied the group Ptf:=fθt will turn out to be a continuous group on both C0(M) and Lp(μ) (for all 1p<). As for ergodicity of the flow we have a negative result: If H:MR is a first integral of the vector field X, i.e. H is not constant and XH=0, then θt is not ergodic. First integrals are an important means in ODE, for they are obviously constant along integral curves, i.e. for all xM the function tH(θt(x)) is constant! Anyway, if Ptf:=fθt is ergodic, then there is no first integral.

Integrating factor

Let X=Q(x,y)∂x+P(x,y)∂y be a vector field on an open subset M of R2, then div(ρX)=∂x(ρQ)+∂y(ρP) and thus div(ρX)=0 if and only if ρ is an integrating factor of the differential equation: c1′(t)=Q(c1(t),c2(t)) and c2′(t)=P(c1(t),c2(t)), i.e. c(t)=(c1(t),c2(t)) is an integral curve of X. If ρ is an integrating factor and M is simply connected, then there is a first integral H given by ∂xH=ρP and ∂yH=−ρQ .
The predator-prey model is the flow of the vector field X=(ax−bxy)∂x+(cxy−dy)∂y on M=R+×R+ for a,b,c,d>0. 1. Verify that the following measure is invariant: μ(K)=∫K1/(xy)dxdy . Here x and y model the 'densities' (the number of individuals per unit area) of the prey and the predators, respectively. 2. Verify that H(x,y)=cx+by−dlogx−alogy is a first integral and that H is strictly convex. If a,b>0 and c,d<0 this is called the competitor model.
Some trajectories for the predator-prey and the competitor model:
(figure)

Geodesic flow

Let M be a Riemannian manifold and G the so called geodesic vector field, i.e. for all Xm∈TmM the tangent vector GXm∈TXmTM is horizontal and π∗(GXm)=Xm. The flow G(t,Xm) of this field is the parallel transport of Xm along the geodesic t↦expm(tXm). The Riemannian volume on TM is invariant under Gt. If M is an open subset of Rd we have TM=M×Rd and for the canonical Riemannian metric on M we get G(x,v)=∑jvj∂xj and its flow: G(t,x,v)=(x+tv,v) Every measure of the form dμ(x,v)=ρ(v)dxdv is invariant under the flow G, indeed: div(ρG)=∑j(∂xj(ρ(v)vj)+∂vj0)=0 . A typical choice is ρ(v)=c·exp(−‖v‖2/2). If the geodesic flow is complete, the Riemannian manifold M is said to be complete
. Quite miraculously (cf. Hopf-Rinow Theorem) this happens if and only if the manifold M furnished with the geodesic metric (cf. e.g. (DIF4)) is a complete metric space.

Hamiltonian flow

Let (M,⟨.,.⟩) be a Riemannian manifold and π:TM→M the canonical projection, i.e. for all m∈M and all Xm∈TmM: π(Xm)=m. Suppose ω1 is the canonical 1-form on TM, i.e. ω1(XXm)=⟨π∗(XXm),Xm⟩, and H:TM→R is a smooth function, a so called Hamiltonian. The Hamiltonian vector field XH on TM is defined by dH=iXHdω1. Then the Riemannian volume on TM is invariant under the flow of the Hamiltonian vector field XH. On TRd=Rd×Rd this field is given by X(q,p)H:=∑j=1d(∂pjH∂qj−∂qjH∂pj) . The flow of this vector field is a collection of curves t↦(q1(t),…,qn(t),p1(t),…,pn(t))=(q(t),p(t)) (the integral curves), which satisfy the so called Hamilton equations: qj′(t)=∂pjH(q(t),p(t)), pj′(t)=−∂qjH(q(t),p(t)), simply written as qj′=∂pjH and pj′=−∂qjH.
Verify that H is a first integral of the Hamiltonian flow.
Hence ergodicity of the Hamiltonian flow is usually investigated on the submanifolds [H=const]; the density of an invariant measure on such a manifold with respect to the Riemannian measure on [H=const] is given by 1/‖∇H‖.
Determine the Hamilton equations for H(q,p)=‖p‖2/2m+U(q) for some smooth function U:M(⊆Rd)→R. Check that these are the Newtonian equations of a particle of mass m moving in a potential U. 2. Verify for m=1, E,α>0 and U(q)=‖q‖α/α that the set [H=E] is a compact submanifold of Rd×Rd and compute ‖∇H‖ for (q,p)∈[H=E].
The motion of a particle of mass m and charge q in a stationary electromagnetic field is described by the Lorentz equation: mddtv=q(E+v×B), ddtx=v, where E and B are vector fields on M=R3 (the electric and the magnetic field, respectively). Put X:=∑j=13vj∂xj+(q/m)∑j=13(Ej+(v×B)j)∂vj, then the Lebesgue measure on R6 is invariant under the flow of X.

Diffusions

In what follows we construct a model describing the distribution of temperature in an infinite rod. The basic assumptions are as follows:
  1. There is no heat transfer into its environment.
  2. The change of temperature in time at any point is proportional to the differences in temperature to its neighbours.
Fix some numbers Δx,Δt>0 and take x∈{nΔx:n∈Z} and t∈{nΔt:n∈N}. We assume the rod is made of a series of molecules in fixed positions x. By u(x,t) we denote the temperature (i.e. the kinetic energy) at time t of the molecule in position x. This molecule has two relevant neighbours located in positions x−Δx and x+Δx, respectively. The second assumption implies that u(x,t+Δt)−u(x,t)=a(x)((u(x−Δx,t)−u(x,t))+(u(x+Δx,t)−u(x,t))) . The condition a(x)>0 says that the temperature in x can only increase if u(x,t)<12(u(x−Δx,t)+u(x+Δx,t)), meaning that the temperature in x is smaller than the mean temperature of its neighbours. In the continuous case (i.e. x∈R, t∈R+) there is only one reasonable analogue: ∂tu(x,t)=a(x)∂x2u(x,t) If a(x)>0 this equation is called a diffusion equation. The general form of which is given by ∂tu(x,t)=a(x)∂x2u(x,t)+b(x,t)∂xu(x,t) . The factor b is called the drift of the diffusion. For a=1 and b=0 this is known as the heat equation on the real line.

A probabilistic interpretation

We construct very simple Markov chains approximating a Markov process known as Brownian motion: For x{nΔx:nZ} let us call the interval (xΔx/2,x+Δx/2) the box about x. A particle moving on the real line R obeys the following rules:
  1. At time t=0 the particle is located in the box about y.
  2. If at time t=0,Δt,2Δt,… the particle is located in the box about x, then the probability of finding the particle at time t+Δt in each of the boxes about x−Δx and x+Δx is 1/2.
Let us denote by p(x,t)Δx the probability of finding the particle at time t in the box about x. By assumption the probability of finding the particle at time t+Δt in the box about x is given by p(x,t+Δt)Δx=12p(x−Δx,t)Δx+12p(x+Δx,t)Δx; this is because with probability 1/2 the particle has moved from the box about x−Δx or from the box about x+Δx to the box about x. Thus we obtain p(x,t+Δt)−p(x,t)=12(p(x−Δx,t)−p(x,t)+p(x+Δx,t)−p(x,t))=12(p(x−Δx,t)−2p(x,t)+p(x+Δx,t)) . Putting (Δx)2=Δt and letting Δt→0 we get (DIF1)∂tp(x,t)=12∂x2p(x,t) . For any y∈R the function py(x,t):=(2πt)−1/2e−(x−y)2/2t solves the PDE
(DIF1); moreover it has the following properties: py(x,t)≥0, ∫Rpy(x,t)dx=1 and for all r>0: limt→0∫y−ry+rpy(x,t)dx=1 . The first two properties express the fact that x↦py(x,t) is the density of a probability measure: the probability of finding the particle at time t in the interval (α,β) is given by ∫αβpy(x,t)dx. The third property reflects the initial condition: at time t=0 the particle is located in position y, i.e. as t→0 the probability of finding the particle in any interval (y−r,y+r) about y converges to 1.

Heat semigroup on Rd

Let Δ:=∑∂j2 be the Laplace operator on Rd. The heat equation on Rd is ∂tu=Δu . For any f∈C0(Rd) there is a unique bounded solution u(t,x) satisfying u(0,x)=f(x): Ptf(x):=u(t,x)=∫f(x−y)pt(y)dy=∫f(y)pt(x−y)dy=f∗pt(x) . Here pt(y) denotes the heat kernel of Rd: pt(y):=(4πt)−d/2e−‖y‖2/4t . The family of operators Pt forms a so called continuous contraction semigroup on the space C0(Rd) with generator Δ, i.e. ∀f∈Cc∞(Rd): limt→0(Ptf−f)/t=Δf and the limit is in C0(Rd). Moreover, it's also a continuous contraction semigroup on the spaces Lp(Rd) for 1≤p<∞: for f∈Lp(Rd), g∈Lq(Rd), 1/p+1/q=1, we conclude by Hölder's inequality: |∫Ptf(x)g(x)dx|≤∫∫|f(x−y)g(x)pt(y)|dxdy≤‖f‖p‖g‖q i.e. ‖Pt:Lp(Rd)→Lp(Rd)‖≤1. Usually it's not that difficult to verify continuity of these semigroups (cf. lemma). However the following provides an alternative way for p=2:
The Fourier transform of the heat kernel pt is given by p^t(y):=cdpt(x)eix,ydx=cdety2wherecd=(2π)d/2 .
  1. Conclude from this fact that pspt=ps+t, i.e. Pt is a semigroup.
  2. For all fL2(Rd) the map fPtf is continuous.
  3. For all f∈H2(Rd): ‖(Ptf−f)/t−Δf‖2≤t‖f‖H2/√2, where ‖f‖Hs2:=∫|f̂(y)|2(1+‖y‖2)sdy . Thus Δ is the generator of the heat semigroup on L2(Rd) and dom(Δ)⊇H2(Rd).
1. The formula can be checked by straightforward calculation. However another way to derive the formula for the Fourier transform of Pt is as follows: the Fourier transform of Δf is yy2f^(y) and thus the Fourier transform of etΔf is yety2f^(y). Now, since cdp^s+t=p^s.p^t it follows that pspt=ps+t and thus Pt is a semigroup.
2. We simply utilize the fact that f↦f̂ is an isometry on L2(Rd) mapping the Laplacian to multiplication by −‖.‖2! The Fourier transform of Ptf is y↦e−t‖y‖2f̂(y) and thus: ‖f−Ptf‖22=∫|1−e−t‖y‖2|2|f̂(y)|2dy and by dominated convergence this converges to 0 as t converges to 0. 3. Similarly ‖(Ptf−f)/t−Δf‖22=t2∫((e−t‖y‖2−1+t‖y‖2)/(t2(1+‖y‖2)))2(1+‖y‖2)2|f̂(y)|2dy≤t2supr>0|(e−r−1+r)/r2|·‖f‖H22=12t2‖f‖H22 .
For f∈L1(Rd) and g∈L1(Rd) we have: |∫Ptf(x)g(x)dx|≤‖f‖1‖g‖1‖pt‖∞ and thus: ‖Pt:L1(Rd)→L∞(Rd)‖≤(4πt)−d/2 - Pt is said to be ultracontractive. Solution by T. Speckhofer.
Suppose fL2(Rd) satisfies Ptf=f for some t>0. Prove that f=0. Hint: Use the Fourier transform.

The heat semigroup on Td

As the heat semigroup on Rd doesn't admit an invariant probability measure, we give another example based on it which does: the heat semigroup on Td. This is exactly the sort of example we have in mind when talking about ergodic semigroups!
Verify that the heat kernel of the d-dimensional torus Td is given for t>0 and x∈Td by qt(x):=∑n∈Zdpt(x+2πn) . Just check that qt is the density of a probability measure on Td and ∂tqt(x)=Δqt(x).
The associated semigroup on L2(Td) is self-adjoint and ergodic, i.e. if Ptf=f for all t>0, then f is constant. Indeed, the semigroup Pt on L2(Td) is Ptf(x)=Tdqt(y)f(xy)dy=Tdqt(xy)f(y)dy which is self-adjoint, for (x,y)qt(xy) is symmetric. For fC(Td) let FCb(Rd) denote the periodic extension of f, then Ptf(x)=TdnZdpt(xy2πn)f(y)dy=Rdpt(xy)F(y)dy . Thus the heat semigroup on Td applied to fC(Td) is just the heat semigroup on Rd applied to the periodic extension F of f. Now em(x):=eix,m, mZd, is an orthonormal basis for the Hilbert space L2(Td); by the preceding formula and exam we get: Ptem(x)=Rdpt(y)eixy,mdy=etm2eix,m . Assume Ptf=f for f=cmemL2(Td); then: cmem=cmetm2em, which can only hold if for all m0: cm=0, i.e. f is constant: The heat semigroup on L2(Td) is ergodic!
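Ergodicity is also visible numerically: in Fourier space Pt multiplies the m-th coefficient by etm2, so every non-constant mode dies out and Ptf tends to the mean of f. A sketch for d=1 in Python (numpy assumed; grid size and test function are our choices):

```python
import numpy as np

# Heat semigroup on T^1 via Fourier multipliers: the m-th Fourier mode of f
# is multiplied by exp(-t*m^2); as t -> infinity only the constant mode
# m = 0 survives, i.e. P_t f converges to the mean of f.
N = 256
x = 2 * np.pi * np.arange(N) / N
f = 2.0 + 0.5 * np.cos(x) + np.sin(3 * x)      # mean value 2.0

m = np.fft.fftfreq(N, d=1.0 / N)               # integer frequencies
def heat_on_torus(g, t):
    return np.real(np.fft.ifft(np.fft.fft(g) * np.exp(-t * m**2)))

mean_f = f.mean()
dist_to_mean = np.max(np.abs(heat_on_torus(f, 20.0) - mean_f))

# P_t is also a contraction on L^2(T^1): the L^2 norm cannot increase
def l2(g):
    return np.sqrt(np.mean(g**2))
norm_decreases = l2(heat_on_torus(f, 1.0)) <= l2(f)
```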
For fL2(Rd) and xTd put F(x):=nZdf(x+2πn); then the Fourier transform of F in Td at mZd coincides with the Fourier transform of f in Rd at mZd up to a constant factor: F^(m)=cdf^(m).
Prove that the Fourier transform of qt (on Td) is q^t(n)=etn2 (nZd) and verify that for all fL2(Td): Ptf^(n)=etn2f^(n).

Diffusion semigroups

Let (M,.,.) be a Riemannian manifold with Riemannian metric .,. and Riemannian volume v. Suppose a,ρ:MR+ are strictly positive and smooth - actually, by adjusting the Riemannian metric, we may assume that a=1 (cf. subsection). Put μ(dx)=ρ(x)v(dx) and let X be a vector field such that (aρ)+ρX=0 - in this case the measure μ is called the speed measure of H. The operator Hf=aΔf+XfwhereΔf:=div(f), is a densely defined, symmetric and positive (definite) operator on L2(μ), i.e. Hf is defined for all f in the dense subspace Cc(M) of L2(μ) and f,gCc(M):Mf.Hgdμ=MHf.gdμandMf.Hfdμ0 . All of this follows from
Under the above conditions for a,ρ and X we have for all f,gCc(M) the following relation (DIF2)MHf.gdμ=Mag,fdμ .
Proof: By definition of the divergence (cf. theorem) we have: agρΔfdv=(agρ),fdv and by assumption we infer that: Hf.gdμ=agρΔf+gρXfdv=(agρ),f+gρXfdv=(aρ),fg+g,faρ+gρX,fdv=g,fadμ . qed
If a=1 and X=U, then ρ=eU. In the particular case M=Rd, U(x)=x2/2 we get the diffusion operator Hf(x)=j2f(x)+xjjf(x) . Its speed measure is the standard Gaussian measure γ on Rd, i.e. the density of γ is given by (2π)d/2ex2/2 .
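For d=1 this is the Ornstein-Uhlenbeck operator Hf=f+xf, whose eigenfunctions are the probabilists' Hermite polynomials Hen with HHen=nHen. This can be verified exactly on the coefficient level; a sketch in Python using numpy's hermite_e module (not part of the original text):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Ornstein-Uhlenbeck operator in d = 1: Hf = -f'' + x*f'.  The probabilists'
# Hermite polynomials He_n are eigenfunctions, H He_n = n He_n; we check
# this exactly on the level of coefficients in the Hermite_e basis.
def ou_apply(c):
    d1 = He.hermeder(c, 1)                     # f'
    d2 = He.hermeder(c, 2)                     # f''
    return He.hermesub(He.hermemulx(d1), d2)   # x*f' - f''

all_eigen = True
for n in range(1, 8):
    e_n = np.zeros(n + 1)
    e_n[n] = 1.0                               # coefficient vector of He_n
    lhs = ou_apply(e_n)
    rhs = n * e_n
    L = max(len(lhs), len(rhs))
    lhs = np.pad(lhs, (0, L - len(lhs)))
    rhs = np.pad(rhs, (0, L - len(rhs)))
    all_eigen = all_eigen and np.allclose(lhs, rhs)
```

Compare the Hermite exercise further down, where the physicists' normalization (b(x) containing the factor 2) yields the eigenvalues 2n instead.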
If a,ρ and X satisfy the conditions of lemma and if μ is a probability measure, then the operator H (or H) is called a diffusion operator on M and it generates a diffusion semigroup Pt=etH. Actually any densely defined, positive, symmetric operator H0 on a Hilbert space E generates a continuous contraction semigroup Pt. This is because any such operator can be extended to a self-adjoint (unbounded) linear operator H (the so-called Friedrichs extension). The semigroup Pt:=etH is a continuous contraction semigroup with generator H - this is more or less a special case of the Hille-Yosida Theorem (cf. theorem). If H is a diffusion operator, then the relation (DIF2) also holds for f=g in the domain of the extension! In finite dimensions all of this is pretty obvious, nonetheless worth mentioning because it already comprises all the essential results (cf. e.g. theorem):
1. If H is a positive, self-adjoint operator on a finite dimensional Hilbert space (i.e. a euclidean space), then Pt:=etH is a self-adjoint, continuous contraction semigroup. If H is a self-adjoint operator on a finite dimensional Hilbert space, then Ut:=eitH is a continuous group of isometries. 2. Prove that (suggested solution) limT1T0TPtxdt=limT1T0TUtxdt=PrkerHx is the orthogonal projection onto the kernel of H (compare exam). This easily extends to the infinite dimensional case provided we have a spectral decomposition of H:dom(H)(E)E of the form (cf. exam): Hx=j=1λjx,xjxj,dom(H):={xE:λj2x,xj2<},λj0 .
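The finite dimensional statement in 2. can be illustrated numerically (Python with numpy; the matrix and the large time horizon are arbitrary choices of ours): the time average of Ptx converges to the orthogonal projection of x onto kerH.

```python
import numpy as np

# For a positive semidefinite symmetric H put P_t = e^{-tH}.  The time
# average (1/T) int_0^T P_t x dt converges, as T -> oo, to the orthogonal
# projection of x onto ker(H): on an eigenvector with eigenvalue lam > 0
# the average is (1 - e^{-T*lam})/(T*lam) -> 0, on ker(H) it is 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A.T @ A                        # symmetric, positive semidefinite
H[0, :] = 0.0
H[:, 0] = 0.0                      # force e_0 into the kernel

lam, U = np.linalg.eigh(H)         # H = U diag(lam) U^T, lam >= 0
x = rng.standard_normal(4)

def time_average(T):
    w = np.where(lam > 1e-12,
                 -np.expm1(-T * lam) / (T * np.maximum(lam, 1e-300)),
                 1.0)
    return U @ (w * (U.T @ x))

kervecs = U[:, lam < 1e-10]                   # eigenvectors spanning ker(H)
proj = kervecs @ (kervecs.T @ x)
ker_dim = kervecs.shape[1]
err = np.linalg.norm(time_average(1e10) - proj)
```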
That's the case for a diffusion on a compact Riemannian manifold. In general the diffusion semigroup Pt generated by H sends any fL1(μ) onto a smooth function. If the Riemannian manifold is complete and satisfies a certain geometric condition then Pt is Feller. The associated Markov process Bt, t0, is reversible with respect to the speed measure μ - in case a=12 the Markov process Bt is called Brownian motion on M with drift X.
If H is a diffusion operator on a complete Riemannian manifold with Ricci curvature bounded from below, then the associated diffusion semigroup is ergodic.
Proof: Once we know that the corresponding diffusion semigroup Pt, t>0, maps any fL1(μ) onto a smooth function, a solution of Ptf=f must be smooth and thus Hf=0. By lemma - and its extension to functions in the domain of H - we infer that 0=f.Hfdμ=af2dμ and since a,ρ are strictly positive, the gradient of f vanishes and thus f must be constant. qed
This in particular applies to compact Riemannian manifolds, to all complete manifolds of constant curvature, e.g. Rd or the hyperbolic spaces Hd.
Suppose M is an open interval, possibly unbounded, a,ρ:MR+ strictly positive and smooth, b:MR smooth and μ(dx)=ρ(x)dx such that (aρ)+ρb=0, i.e. for some x0M and cR+: ρ(x)=cexp(x0xa(y)+b(y)a(y)dy) Then Hu:=au+bu is a densely defined positive (definite) symmetric operator on L2(μ).
In any case ergodicity of the following examples can be checked directly; as in exam you just need to know all eigenfunctions of the diffusion operator:
The following are classical diffusion operators on open intervals. Prove that all of them generate ergodic semigroups (cf. e.g. exam). Suggested solution.
  1. M=(1,1), a(x)=1x2, b(x)=2x for some α,β>1. Compute the density ρ. The Jacobi polynomials Pn(α,β) are eigenfunctions of H for the eigenvalues λn=n(n+α+β+1) and they form a complete orthogonal set. For α=β these polynomials are called ultraspherical or Gegenbauer polynomials.
  2. M=R, a(x)=1, b(x)=2x. Compute the density ρ. The Hermite polynomials Hn are eigenfunctions of H for the eigenvalues λn=2n and they form a complete orthogonal set.
  3. M=R+, a(x)=x, b(x)=xα1 for some α>1. Compute the density ρ. The Laguerre polynomials Lnα are eigenfunctions of H for the eigenvalues λn=n and they form a complete orthogonal set.
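For the Laguerre case with α=0 the eigenvalue equation HLn=nLn is just the classical Laguerre ODE xy+(1x)y+ny=0, and it can be checked exactly on the coefficient level; a sketch in Python using numpy's laguerre module (which covers α=0 only):

```python
import numpy as np
from numpy.polynomial import laguerre as La

# Item 3 with alpha = 0: Hu = -x*u'' + (x - 1)*u'.  The Laguerre ODE
# x*y'' + (1 - x)*y' + n*y = 0 says precisely H L_n = n L_n; we check
# this exactly on the level of coefficients in the Laguerre basis.
def lag_apply(c):
    d1 = La.lagder(c, 1)                   # u'
    d2 = La.lagder(c, 2)                   # u''
    # -x*u'' + x*u' - u'
    return La.lagsub(La.lagsub(La.lagmulx(d1), La.lagmulx(d2)), d1)

all_eigen = True
for n in range(1, 8):
    e_n = np.zeros(n + 1)
    e_n[n] = 1.0                           # coefficient vector of L_n
    lhs = lag_apply(e_n)
    rhs = n * e_n
    L = max(len(lhs), len(rhs))
    lhs = np.pad(lhs, (0, L - len(lhs)))
    rhs = np.pad(rhs, (0, L - len(rhs)))
    all_eigen = all_eigen and np.allclose(lhs, rhs)
```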

Adjusting geometry

On e.g. a domain M of Rd a diffusion operator H is commonly defined as a second order partial differential operator: (DIF3)Hf:=j,kajkjkf+bjjf where ajk, bj are smooth functions such that A(x):=(ajk(x)) is symmetric and strictly positive definite. However any diffusion operator on a domain can be written as the sum of the Laplacian and a vector field; we just need to define a suitable Riemannian metric:
Put (gjk):=A1, then we get for the Laplace operator on the Riemannian manifold M:=(M,(gjk)): Δf=j,kajkjkf+Xf , where the vector field is given by X=jζj(x)j with ζj:=kk(ajkG)GandG:=det(gjk)=1detA .
The Riemannian metric (gjk) induces the geodesic distance: (DIF4)dg(x,y):=inf{L(c):c:[0,1]M smooth, c(0)=x,c(1)=y} where L(c):=01gjk(c(t))cj(t)ck(t)dt is called the length of the curve c in (M,(gjk)). This is particularly easy to compute in case the dimension equals 1, for there is only one curve connecting two points (modulo reparametrization): c(t)=x+t(yx).
The interval M:=R+ with the Riemannian metric g(x)=1/x produces the geodesic distance x,yM:dg(x,y)=01|c(t)|c(t)dt=2|yx| (M,dg) is not complete!
The interval M:=(1,1) with the Riemannian metric g(x)=1/(1x2) produces the geodesic distance dg(x,y)=2|arcsinyarcsinx|. Check that (M,dg) is not complete!
Thus neither the first nor the last example in exam is covered by proposition!
The interval M:=R+ with the Riemannian metric g(x)=1/x2 produces the geodesic distance dg(x,y)=|log(x/y)|. Verify that (M,dg) is complete.
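In dimension 1 the geodesic distance is just a one-dimensional integral over the straight path, so the distance formulas of these examples can be checked numerically (Python with numpy; the trapezoid quadrature is our choice):

```python
import math
import numpy as np

# d_g(x,y) = int_0^1 sqrt(g(c(t))) * |c'(t)| dt along c(t) = x + t*(y - x)
def dist(g, x, y, n=200001):
    t = np.linspace(0.0, 1.0, n)
    c = x + t * (y - x)
    vals = np.sqrt(g(c)) * abs(y - x)
    dt = t[1] - t[0]
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt   # trapezoid rule

# g(x) = 1/x on R+:   d_g(1,4) = 2*(sqrt(4) - sqrt(1)) = 2   (not complete)
d1 = dist(lambda u: 1.0 / u, 1.0, 4.0)
# g(x) = 1/x^2 on R+: d_g(1,e) = |log(1/e)| = 1              (complete)
d2 = dist(lambda u: 1.0 / u**2, 1.0, math.e)
```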
Find a Riemannian metric on M:=(1,1) such that (M,dg) is complete.
Consider the diffusion operator (DIF3) on a bounded domain M in Rd with smooth boundary under Neumann boundary conditions (i.e. for all xM: f(x),N(x)=0). 1. If for some smooth function U - here a smooth function will always be defined in a neighborhood of M: j:kajkkU=bj+kkajk . Then ρ:=eU is the smooth density of a speed measure μ, i.e. for all smooth f,g satisfying the boundary conditions: Mf(x)Hg(x)μ(dx)=Majk(x)jf(x)kg(x)μ(dx) . 2. Conclude that if μ is a probability measure then any smooth function f satisfying both the boundary conditions and Hf=0 must be constant. 3. Alternatively, use Zaremba's principle (or Hopf's lemma) to prove 2. - this way you see that the speed measure doesn't matter at all!

Convolution semigroups

We just talk about convolution semigroups on R+: suppose μt, t>0, is a family of probability measures on R+ such that for all s,t>0: μsμt=μs+t, where μsμt((0,x]):=(0,x]μs((0,xy])μt(dy) is the convolution of μs and μt. The 1/2-stable distribution defined in exam is a prominent example. Here are a few more:
For t>0 let μt be the measure on R+ with density xexxt1Γ(t) . 1. Show that the Laplace transform of μt is given by ωt(y)=(1+y)t. 2. Conclude that μsμt=μs+t. The measure μt is called the Γ-distribution with parameter t. 3. Compute xμt(dx). Suggested solution.
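Parts 1.-3. can be sanity-checked numerically (Python with numpy; the parameter values and quadrature are arbitrary choices of ours): the Laplace transform of the Γ(t) density is (1+y)^(-t), the product of transforms corresponds to the convolution, and the mean is t.

```python
import math
import numpy as np

# Gamma(t) density on R+: x^(t-1) * e^(-x) / Gamma(t)
def laplace_gamma(t, y, n=400001, xmax=200.0):
    x = np.linspace(0.0, xmax, n)
    vals = np.exp(-y * x) * x ** (t - 1) * np.exp(-x) / math.gamma(t)
    dx = x[1] - x[0]
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx   # trapezoid rule

s, t, y = 1.5, 2.5, 2.0
err_s = abs(laplace_gamma(s, y) - (1 + y) ** -s)
err_t = abs(laplace_gamma(t, y) - (1 + y) ** -t)
# transform of mu_s * mu_t = product of transforms = (1+y)^-(s+t)
err_conv = abs(laplace_gamma(s, y) * laplace_gamma(t, y) - (1 + y) ** -(s + t))

# mean of mu_t: int x mu_t(dx) = t
x = np.linspace(0.0, 200.0, 400001)
vals = x * x ** (t - 1) * np.exp(-x) / math.gamma(t)
mean_t = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * (x[1] - x[0])
```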
For t>0 let μt be the measure on N0 with density ntnetn! . 1. Show that the Laplace transform of μt is given by ωt(y)=exp(t(1ey)). 2. Conclude that μsμt=μs+t. The measure μt is called the Poisson distribution with parameter t. 3. Compute xμt(dx).
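The Poisson case is a finite sum and checks in a few lines (Python; parameter values are our choices, the series is truncated where the terms are negligible):

```python
import math

# Poisson(t) on N_0: weights t^n * e^{-t} / n!.  Laplace transform:
# omega_t(y) = sum_n e^{-y*n} * t^n * e^{-t}/n! = exp(-t*(1 - e^{-y})),
# so omega_s * omega_t = omega_{s+t}, i.e. mu_s * mu_t = mu_{s+t}.
def laplace_poisson(t, y, nmax=100):
    return sum(math.exp(-y * n) * t**n * math.exp(-t) / math.factorial(n)
               for n in range(nmax))

s, t, y = 1.3, 2.2, 0.7
err_lt = abs(laplace_poisson(t, y) - math.exp(-t * (1.0 - math.exp(-y))))
err_conv = abs(laplace_poisson(s, y) * laplace_poisson(t, y)
               - math.exp(-(s + t) * (1.0 - math.exp(-y))))
# mean: int x mu_t(dx) = t
mean_t = sum(n * t**n * math.exp(-t) / math.factorial(n) for n in range(100))
```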
Suppose μt, t>0, is a family of probability measures on R+. Assume that its Laplace transform φt is given by φt(y)=exp(t01exyxν(dx)) For 0<p<1 let ν be the measure on R+ with density xpΓ(1p)xp . Show that for all y>0: 01exyxν(dx)=yp .
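The stated identity can also be confirmed numerically (Python with numpy; the log substitution and the cutoffs are our choices):

```python
import math
import numpy as np

# nu has density x -> p/Gamma(1-p) * x^{-p} on R+, and the claim is
#   int_0^oo (1 - e^{-x*y}) / x  nu(dx) = y^p .
# Substituting x = e^s gives an integrand decaying exponentially both ways.
def stable_exponent(p, y, n=400001):
    s = np.linspace(-60.0, 60.0, n)
    x = np.exp(s)
    vals = -np.expm1(-x * y) * x ** (-p)   # (1 - e^{-xy}) * x^{-p-1} * dx/ds
    ds = s[1] - s[0]
    integral = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * ds
    return p / math.gamma(1 - p) * integral

err1 = abs(stable_exponent(0.5, 2.0) - 2.0 ** 0.5)
err2 = abs(stable_exponent(0.3, 1.5) - 1.5 ** 0.3)
```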
All these are particular cases of so-called infinitely divisible distributions, cf. e.g. wikipedia.

Last modified: Mon Aug 26 13:00:51 CEST 2024