The Markov matrix and the exponential convergence

In this first tutorial, we study a core tool of Markov Chain Monte Carlo algorithms: the Markov matrix. It will allow you to

describe in a compact way the evolution of the probability configuration,

depict basic features of the probability (time evolution, conservation of probability),

understand why convergence to the steady state is in general exponential, and evaluate the typical convergence time.

The Markov-matrix approach is very general: not only does it allow you to build the dynamical rules which sample a given probability distribution πC (as you have seen, this question was one of the main topics of the first lecture), but it also allows you to infer fundamental properties of a physical system from its dynamical rules alone.

A toy example: the pebble game

We consider here a simple system: the 3x3 pebble game, where one particle moves on a 3x3 chessboard without periodic boundary conditions. The positions on the chessboard are denoted 0, 1, 2, ..., 8; the four neighbors of the first site, 0, are (1, 3, 0, 0), and similarly for the other sites (a repeated entry encodes a move that is rejected at the boundary, leaving the pebble in place).
For this simple toy model we can explicitly write down the Markov matrix, which exactly solves the dynamics of the system. The conclusions we can draw from this trivial example are however general, and they extend to more complex, not exactly solvable cases.

Markov matrix

If we start at time t=0 with the pebble at site 0, we know that at t=1 the pebble will be at sites 1 and 3 with probability 1/4 each (what is the probability for the pebble to remain at site 0 at t=1?). In order to compute the exact probabilities at all time steps, it is convenient to introduce a vector

π(t) = (π0(t), π1(t), ..., π8(t))

containing the probabilities of finding the pebble at time t at sites 0, 1, 2, ..., 8. The Monte Carlo algorithm is fully encoded in a Markov transfer matrix T, whose element Tba gives the probability p(a→b) for the pebble to move from site a to site b.

What is the dimension of the matrix T in the 3x3 pebble game?

The following short code (similar to the one seen during Lecture 1) writes the Markov transfer matrix T for the 3x3 pebble game. Notice that we have made use of the Python package numpy, which (you will soon realize) is very convenient and efficient for dealing with matrices, vectors, and linear algebra in general.

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))  # creates a 9x9 matrix full of zeroes
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
print(transfer)

Let's study it in detail and observe the matrix elements.

Compute the sum of every column of T. What do you observe? What is your interpretation?
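A quick way to compute these sums is numpy's axis argument; a minimal sketch, rebuilding the same transfer matrix as above:

```python
import numpy

# Same transfer-matrix construction as above.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

# Sum each column of T (axis=0 sums over the row index).
column_sums = transfer.sum(axis=0)
print(column_sums)
```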

How, then, are the probability configurations of the pebble system at time t=0, π(0), and at time t=1, π(1), related? Write π(t) as a function of the matrix T and of π(0) only.
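Once you have answered, the relation can be verified numerically with numpy.linalg.matrix_power; a sketch (the matrix construction is the same as above):

```python
import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

pi_0 = numpy.zeros(9)
pi_0[0] = 1.0  # pebble at site 0 at t=0

# Applying T once per time step...
pi_t = pi_0.copy()
for t in range(5):
    pi_t = numpy.dot(transfer, pi_t)

# ...gives the same result as applying the matrix power T^t in one shot.
pi_t_direct = numpy.dot(numpy.linalg.matrix_power(transfer, 5), pi_0)
print(numpy.allclose(pi_t, pi_t_direct))  # True
```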

The steady state of the pebble game: Eigenvalues of the Markov matrix

Let's apply the Markov matrix that we constructed above recursively (100 times) to the initial state where the pebble is at site 0, employing the following algorithm (pebble_transfer.py):

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0  # the pebble starts at site 0
for t in range(100):
    print(t, ' ', ["%0.5f" % i for i in position])
    position = numpy.dot(transfer, position)

What do you observe about the final state? What does it mean in relation to the Markov matrix? Run the following modified program (pebble_transfer_eigen_pr.py)...

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
eigenvalues, eigenvectors = numpy.linalg.eig(transfer)
for i in range(9):
    print("eigenvalue")
    print(eigenvalues[i])
    print("eigenvector")
    print(numpy.transpose(eigenvectors)[i])
    print("==================")

Observe the eigenvalues and the eigenvectors in detail. The maximum eigenvalue is clearly 1. Give a physical interpretation of the corresponding eigenvector in terms of the Markov dynamics. Do you think there may exist a situation for which the maximum eigenvalue of the Markov matrix is strictly larger than 1? Why?
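As a numerical companion to these questions, the eigenvector of eigenvalue 1 can be extracted and normalized like a probability vector; a sketch:

```python
import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

eigenvalues, eigenvectors = numpy.linalg.eig(transfer)

# Pick the eigenvector whose eigenvalue is (numerically) the largest,
# and rescale it so its components sum to 1, like a probability vector.
i_max = numpy.argmax(eigenvalues.real)
pi_1 = eigenvectors[:, i_max].real
pi_1 = pi_1 / pi_1.sum()
print(pi_1)
```

For the pebble game all nine components come out equal, consistent with the final state observed when running pebble_transfer.py.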

Probability conservation and global/detailed balance

Consider a system described by a set of configurations {C}. C represents, for example, the position of a particle on a chessboard (as in the pebble game), or the full set of positions of every particle in a multi-particle system, or any possible state of an Ising model. We would like to sample a configuration C with a given probability πC. A solution to this problem is the Markov Chain Monte Carlo method: we define a dynamics over the ensemble of configurations C such that, in the long-time limit, each configuration is visited with the correct probability πC.

As discussed in the lecture, a Markov chain describes an evolution which is fully specified by the transition probabilities pC1→C between configurations C1 and C. The πC(t) are the entries of the vector π(t), the pC1→C are the entries of the Markov matrix T, and π(t+1) = T.π(t). The probability πC(t) of being at time t in configuration C thus evolves according to the following recursion relation:

πC(t+1) = ∑C1 πC1(t) pC1→C

After some time, πC(t) finally converges towards a steady probability distribution πst, the steady state, which should verify

πstC = ∑C1 πstC1 pC1→C   (for all C)

but thanks to the property of probability conservation (remember the sum of the column elements of the T matrix in the pebble game: ∑C1 pC→C1 = 1), this also rewrites as

∑C1 πstC pC→C1 = ∑C1 πstC1 pC1→C   (for all C)

This is the global balance condition.

The detailed balance condition is more restrictive:

πstC pC→C1 = πstC1 pC1→C   (for all C and C1)

and is a sufficient (but not necessary) condition for the global balance condition to be verified.
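For the pebble game, detailed balance can be checked numerically: writing T[b, a] = p(a→b), the condition reads πst(a) T[b, a] = πst(b) T[a, b] for all a, b. A sketch, assuming the uniform steady state found for the pebble game:

```python
import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

pi_st = numpy.ones(9) / 9.0  # uniform steady state of the pebble game

# flux[b, a] = pi_st(a) * p(a -> b); detailed balance says this
# array equals its own transpose.
flux = transfer * pi_st[numpy.newaxis, :]
print(numpy.allclose(flux, flux.T))  # True: detailed balance holds here
```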

Probability conservation and eigenvalues of the Markov matrix

Optional

Rewrite the stochastic property of the matrix T (concerning probability conservation and the sum of columns), which we have tested for the 3x3 pebble game, in vector form: show that it is equivalent to 1.T = 1, where 1 is the vector full of ones.
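A one-line numerical check of this identity (a sketch, with 1 represented by numpy.ones(9)):

```python
import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

ones = numpy.ones(9)
# Left-multiplying by the all-ones vector sums each column of T,
# so 1.T = 1 is exactly the column-sum (probability conservation) property.
print(numpy.allclose(numpy.dot(ones, transfer), ones))  # True
```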

Is the matrix T symmetric, in general? Why? This implies that, for an eigenvalue λ, one must distinguish the left eigenvectors (such that V.T = λV ) and the right eigenvectors (such that T.V = λV ).

What does the equality 1.T = 1 mean for the vector 1 and the matrix T? What does it imply for the right eigenvectors of T? Use the mathematical fact that to any left eigenvector there corresponds a right eigenvector with the same eigenvalue. (To justify this: note that the spectrum of T and the spectrum of its transpose are the same, and that a left eigenvector of T is a right eigenvector of the transpose of T.)

Why, then, according to the conservation of probability, can no eigenvalue of T have modulus strictly larger than 1?

Let's now focus on the right eigenvector of T with eigenvalue λ1 = 1, and let's call it π1.

Write the eigenvalue equation explicitly, for the components of π1.

Interpret it in terms of the global balance condition: what is the link between π1(C) and π st(C)?

To summarize: the global balance condition is a necessary condition for the probability vector π(t) to converge to a steady distribution (i.e. for the Markov Chain Monte Carlo algorithm to converge), but it is not sufficient (as we shall see at the end of this tutorial).

Exponential convergence

The Markov matrix provides not only the steady state but also complete information on the dynamics of the system. In particular, it can tell us how much time the Markov algorithm typically needs to reach its steady state. This information is essential if we want to extract genuine information about observables of the system from the Monte Carlo simulation. We once again consider the pebble game as a clear example. Apply the Markov matrix iteratively to the difference between the initial and steady vectors, π(t=0) − πst, using the following code (pebble_transfer_sub_2.py):

import numpy, pylab

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0
list_T = []
list_av = []
list_fit = []
for t in range(100):
    list_T.append(t)
    list_av.append(abs(position[1] - 1.0 / 9.0))
    list_fit.append(numpy.power(0.75, t))
    print(t, ' ', ["%0.5f" % abs(i - 1.0 / 9.0) for i in position])
    position = numpy.dot(transfer, position)
pylab.semilogy(list_T, list_av, 'bo-', clip_on=False)
pylab.semilogy(list_T, list_fit, 'ro', clip_on=False)
pylab.annotate('(0.75)$^t$', xy=(10, 0.1), xytext=(10, 0.1))
pylab.title('dynamic evolution of the probability vector component')
pylab.xlabel('$t$', fontsize=16)
pylab.ylabel(r'$\pi_i-\pi^{st}$', fontsize=16)
pylab.show()
# pylab.savefig('exp-decay.png')
# pylab.clf()

You can observe that indeed all the components of the resulting vector rapidly go to zero, and that if we plot these values as a function of time on semi-log axes, the decay is exponential (i.e. it is linear on the semi-log scale). Notice that we have made use of the Python package pylab, which makes it easy to draw and display graphics. We can in fact show that the decay rate is well portrayed by the second-largest eigenvalue: the difference decays as (0.75)^t, i.e. exponentially as e^(−t/τ), with τ = −1/ln 0.75 ≈ 3.48.
In order to understand why the second-largest eigenvalue determines the decay rate, let's sort the other eigenvalues λ2, λ3, ... of T by decreasing modulus, 1 > |λ2| ≥ |λ3| ≥ ..., and let's call π2, π3, ... the corresponding eigenvectors.

Decompose the initial condition π(0) on the basis of eigenvectors.

From this, determine π(t) as a function of the λ2, λ3, ... and the π1, π2, π3, ...

Determine the relation between λ2 and the convergence time to the steady state.
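These steps can be checked numerically: sort the eigenvalue moduli, read off |λ2|, and compute the convergence time τ = −1/ln|λ2|. A sketch:

```python
import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

# Eigenvalue moduli, sorted in decreasing order.
moduli = numpy.sort(numpy.abs(numpy.linalg.eigvals(transfer)))[::-1]
lambda_2 = moduli[1]                # second-largest modulus
tau = -1.0 / numpy.log(lambda_2)    # |lambda_2|^t = exp(-t / tau)
print(lambda_2, tau)
```

For the pebble game this gives λ2 = 0.75, matching the fit plotted in pebble_transfer_sub_2.py, and τ = −1/ln 0.75 ≈ 3.48.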

Perron-Frobenius theorem

We have seen above that the global balance condition is a necessary condition for the probability vector π(t) to converge to the desired distribution (i.e. for the Markov Chain Monte Carlo algorithm to converge), but it is not a sufficient condition. The Perron-Frobenius theorems give the conditions for the uniqueness of the steady state and for the convergence of π(t) to it.

Uniqueness

The dynamics is said to be irreducible if, equivalently:

no change of basis based on permutations can transform T into a block-diagonal matrix,

or, more physically: every configuration can be reached from any other one in a finite number of steps.

Then, if the matrix T is stochastic (i.e. it has entries ≥ 0 and the entries of every column sum to 1), there exists a unique right eigenvector π1 of eigenvalue 1.

Is the dynamics of the pebble game irreducible?

Give a simple example of a Markov dynamics which is reducible (i.e. not irreducible), for example two disconnected pebble boards, and explain why a Markov Chain Monte Carlo simulation would not always give the same result, depending on the initial state.
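Here is a minimal sketch of such a reducible dynamics (a hypothetical 4-state toy model, smaller than the two-board example): two pairs of sites with no transitions between the pairs, giving a block-diagonal Markov matrix:

```python
import numpy

# From each site: stay with probability 1/2 or hop to the other site
# of the SAME pair. Sites {0, 1} and {2, 3} are disconnected.
T = numpy.array([[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5],
                 [0.0, 0.0, 0.5, 0.5]])

eigenvalues = numpy.sort(numpy.linalg.eigvals(T).real)[::-1]
print(eigenvalues)  # eigenvalue 1 is degenerate: [1. 1. 0. 0.]
```

Each block carries its own eigenvector of eigenvalue 1, so the long-time distribution depends on which pair the walker started in.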

Discuss the uniqueness of the maximum eigenvalue.

Aperiodicity

A second condition is required to ensure that the dynamics converges: aperiodicity. The dynamics is called periodic if there exists an initial state π(0) such that, after a finite number of time steps strictly larger than 1, one obtains the same state again. The minimal number of such time steps is called the period of the state. The second part of the Perron-Frobenius theorem states that if the Markov matrix is aperiodic (i.e. if it has no periodic states), then the dynamics converges to the steady state πst.

Consider a Markov chain with a periodic state π(0) of period p. What does this imply for the spectrum of the matrix T^p?

What does this imply for the spectrum of the matrix T?

Show that, in general, π(t) cannot converge to πst .

Example of periodic Markov chains

Consider the example of a particle jumping between two sites, moving with probability 1 to the other site at each time step. Explain why the dynamics is periodic, and find the eigenvectors and eigenvalues of the Markov matrix. Comment.
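A sketch of this two-site chain: the Markov matrix simply swaps the two sites at every step.

```python
import numpy

# Hop to the other site with probability 1: the period is 2.
T = numpy.array([[0.0, 1.0],
                 [1.0, 0.0]])

eigenvalues, eigenvectors = numpy.linalg.eig(T)
print(numpy.sort(eigenvalues.real))  # [-1.  1.]: a second eigenvalue of modulus 1

# Starting from site 0, the distribution oscillates forever, never converging.
pi = numpy.array([1.0, 0.0])
for t in range(4):
    print(t, pi)
    pi = numpy.dot(T, pi)
```

The eigenvalue −1 has modulus 1, so the component of π(0) along its eigenvector never decays, which is exactly why convergence fails here.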
