The Markov matrix and exponential convergence

In this first tutorial, we study a core tool of Markov-chain Monte Carlo algorithms: the Markov matrix. It will allow you to
  • describe in a compact way the evolution of the configuration probabilities,
  • capture basic features of these probabilities (time evolution, conservation of probability),
  • understand why the convergence to the steady state is in general exponential, and evaluate the typical convergence time.

The Markov-matrix approach is very general: not only does it allow you to build the dynamical rules that sample a given probability distribution πC (as you have seen, this question was one of the main topics of the first lecture), but it also allows you to infer fundamental properties of a physical system from the knowledge of its dynamical rules alone.

A toy example: the pebble game


We consider here a simple system: the 3x3 pebble game, in which one particle moves on a 3x3 chessboard without periodic boundary conditions. The positions on the chessboard are denoted 0, 1, 2, ..., 8; the four neighbors of the first site 0 are (1, 3, 0, 0) (a move off the board leaves the pebble in place), and similarly for the other sites.
[Figure: the 3x3 pebble game board (peppe_fig.png)]
For this simple toy model we can explicitly write down the Markov matrix, which exactly solves the dynamics of the system. The conclusions that we draw from this trivial example are, however, general, and they can be extended to more complex cases that are not exactly solvable.

Markov matrix


If we start at time t=0 with the pebble at site 0, we know that at t=1 the pebble will be at sites 1 and 3, each with probability 1/4 (what is the probability for the pebble to remain at site 0 at t=1?). In order to compute the exact probabilities at all time steps, it is convenient to introduce a vector

π(t) = (π0(t), π1(t), ..., π8(t))
containing the probabilities of finding the pebble at time t at sites 0, 1, 2, ..., 8. The Monte Carlo algorithm is fully encoded in a
Markov transfer matrix T, whose element Tba gives the probability p(a→b) for the pebble to move from site a to site b.
  • What is the dimension of the matrix T in the 3x3 pebble game?

The following short code (similar to the one seen during Lecture 1) builds the Markov transfer matrix T for the 3x3 pebble game. Notice that we make use of the Python package numpy, which (as you will soon realize) is very convenient and efficient for dealing with matrices, vectors, and linear algebra in general.

import numpy

# neighbor[k] lists the four neighbor sites of site k; a move off the
# board is encoded as the site itself, so the pebble stays in place
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))  # creates 9x9 matrix full of zeroes
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
print(transfer)
Let's study this matrix in detail and observe its elements.
  1. Compute the sum of every column of T. What do you observe? What is your interpretation? (A one-line numpy check is sketched after this list.)
  2. How are the probability configurations of the pebble system at time t=0, π(0), and at time t=1, π(1), related? Write π(t) as a function of the matrix T and of π(0) only.
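For question 1, the column sums can be computed in one line (a minimal sketch, to be appended to the script above):

# transfer[b, a] = p(a->b); axis 0 runs over the arrival site b,
# so this prints the sum of every column of T
print(transfer.sum(axis=0))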

The steady state of the pebble game: Eigenvalues of the Markov matrix


Let's apply the Markov matrix that we constructed above recursively (100 times) to the initial state where the pebble is at site 0, employing the following algorithm (pebble_transfer.py):

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0  # the pebble starts at site 0 with probability 1
for t in range(100):
    print(t, ' ', ["%0.5f" % i for i in position])
    position = numpy.dot(transfer, position)  # one time step: pi(t+1) = T.pi(t)

  • What do you observe about the final state? What does it mean in relation to the Markov matrix? Run the following modified program (pebble_transfer_eigen_pr.py)...

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
# column i of eigenvectors is the right eigenvector for eigenvalues[i]
eigenvalues, eigenvectors = numpy.linalg.eig(transfer)
for i in range(9):
    print("eigenvalue")
    print(eigenvalues[i])
    print("eigenvector")
    print(numpy.transpose(eigenvectors)[i])
    print("==================")

  • Observe the eigenvalues and the eigenvectors in detail. The maximum eigenvalue is clearly 1. Give a physical interpretation of the corresponding eigenvector in terms of the Markov dynamics (a short numerical sketch follows below). Do you think that there may exist a situation in which the maximum eigenvalue of the Markov matrix is strictly larger than 1? Why?
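One way to extract this eigenvector explicitly is to select the eigenvalue closest to 1 and normalize the corresponding column of eigenvectors (a minimal sketch, to be appended to the script above):

i_max = numpy.argmax(eigenvalues.real)  # index of the eigenvalue closest to 1
pi_1 = eigenvectors[:, i_max].real
pi_1 = pi_1 / pi_1.sum()  # normalize the entries so that they sum to 1
print(pi_1)               # compare with the uniform vector (1/9, ..., 1/9)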

Probability conservation and global/detailed balance


Consider a system described by a set of configurations {C}. C represents, for example, the position of a particle on a chessboard (as in the pebble game), or the full set of positions of every particle in a multi-particle system, or one of the possible states of an Ising model. We would like to sample a configuration C with a given probability πC. A solution to this problem is the Markov-chain Monte Carlo method: we define a dynamics over the ensemble of configurations C such that, in the long-time limit, each configuration is visited with the correct probability πC. As discussed in the lecture, a Markov chain describes an evolution that is fully specified by the transition probabilities p(C→C1) between configurations C and C1.

The probability πC(t) of being at time t in configuration C evolves according to the following recursion relation:

πC(t+1) = Σ_C1 πC1(t) p(C1→C)

In vector form, the πC(t) are the entries of the vector π(t), the p(C1→C) are the entries of the Markov matrix T, and π(t+1) = T·π(t).

After some time, πC(t) finally converges towards a steady probability distribution π^st, the steady state, which should verify

π^st(C) = Σ_C1 π^st(C1) p(C1→C)   (for all C)

but thanks to the property of probability conservation (remember the sum of the column elements of the T matrix in the pebble game: Σ_C1 p(C→C1) = 1), this can also be rewritten as

Σ_C1 π^st(C) p(C→C1) = Σ_C1 π^st(C1) p(C1→C)   (for all C)

This is the global balance condition.

The detailed balance condition is more restrictive:

π^st(C) p(C→C1) = π^st(C1) p(C1→C)   (for all C and C1)

and is a sufficient (but not necessary) condition for the global balance condition to be verified.
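For the pebble game, both balance conditions can be checked numerically: the steady state there is the uniform distribution, and detailed balance amounts to the symmetry of the "flow" matrix π^st(a) p(a→b). A minimal sketch, rebuilding the transfer matrix as before:

import numpy

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
pi_st = numpy.ones(9) / 9.0  # uniform steady state of the pebble game
flow = transfer * pi_st      # flow[b, a] = p(a->b) * pi_st[a]
print(numpy.allclose(numpy.dot(transfer, pi_st), pi_st))  # global balance
print(numpy.allclose(flow, flow.T))                       # detailed balance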

Probability conservation and eigenvalues of the Markov matrix



Optional

  1. Rewrite the stochastic property of the matrix T (probability conservation: the sum of every column equals 1), which we have tested for the 3x3 pebble game, in vector form: show that it is equivalent to 1·T = 1, where 1 is the (row) vector full of ones. (A numerical check for the pebble game is sketched after this list.)
  2. Is the matrix T symmetric, in general? Why? This implies that, for an eigenvalue λ, one must distinguish the left eigenvectors (such that V·T = λV) from the right eigenvectors (such that T·V = λV).
  3. What does the equality 1·T = 1 mean for the vector 1 and the matrix T? What does it imply for the right eigenvectors of T? Use the mathematical fact that to any left eigenvector there corresponds a right eigenvector with the same eigenvalue. (To justify this: note that the spectrum of T and the spectrum of its transpose are the same, and that a left eigenvector of T is a right eigenvector of the transpose of T.)
  4. Why then, according to the conservation of probability, can no eigenvalue of T have modulus strictly larger than 1?
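A quick numerical check of item 1 for the 3x3 pebble game (a sketch, reusing the transfer matrix built earlier):

ones = numpy.ones(9)
# left-multiplying by the vector of ones gives the column sums of T
print(numpy.allclose(numpy.dot(ones, transfer), ones))  # True: 1.T = 1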

Let's now focus on the right eigenvector of T with eigenvalue λ1 = 1, and let's call it π1.
  1. Write the eigenvalue equation explicitly for the components of π1.
  2. Interpret it in terms of the global balance condition: what is the link between π1(C) and π^st(C)?

To summarize: the global balance condition is a necessary condition for the probability vector π(t) to converge to a steady distribution (i.e. for the Markov-chain Monte Carlo algorithm to converge), but it is not sufficient (as we shall see at the end of this tutorial).

Exponential convergence


The Markov matrix provides us not only with the steady state but with a complete set of information on the dynamics of the system. In particular, it can tell us how much time is typically needed for the Markov algorithm to reach its steady state. This information is essential if we want to extract from the Monte Carlo simulation genuine information about the observables of the system. We once again consider the pebble game as a clear example. Apply the Markov matrix iteratively to the difference between the initial and the steady vectors, π(t=0) − π^st, using the following code (pebble_transfer_sub_2.py):
import numpy, pylab

neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0
list_T = []
list_av = []
list_fit = []
for t in range(100):
    list_T.append(t)
    # distance of component 1 from its steady-state value 1/9
    list_av.append(abs(position[1] - 1.0 / 9.0))
    list_fit.append(numpy.power(0.75, t))  # comparison curve (0.75)^t
    print(t, ' ', ["%0.5f" % abs(i - 1.0 / 9.0) for i in position])
    position = numpy.dot(transfer, position)
pylab.semilogy(list_T, list_av, 'bo-', clip_on=False)
pylab.semilogy(list_T, list_fit, 'ro', clip_on=False)
pylab.annotate('(0.75)$^t$', xy=(10, 0.1), xytext=(10, 0.1))
pylab.title('dynamic evolution of the probability vector component')
pylab.xlabel('$t$', fontsize=16)
pylab.ylabel(r'$|\pi_i - \pi^{st}_i|$', fontsize=16)
pylab.show()
#pylab.savefig('exp-decay.png')
#pylab.clf()

You can observe that all the components of the resulting vector indeed go rapidly to zero, and that, if we plot these values as a function of time on semi-log axes (displayed here by pylab.show()), the decay is exponential (i.e. it is linear on the semi-log scale). Notice that we have made use of the Python package pylab, which allows one to easily draw and display graphics. We can in fact show that the decay rate is governed by the second-largest eigenvalue: the difference decays as (0.75)^t = e^(−t/τ), with τ = −1/ln 0.75 ≈ 3.5.
In order to understand why the second-largest eigenvalue determines the decay rate, let's sort the remaining eigenvalues λ2, λ3, ... of T by decreasing modulus, 1 > |λ2| ≥ |λ3| ≥ ..., and let's call π2, π3, ... the corresponding eigenvectors.

  1. Decompose the initial condition π(0) on the basis of eigenvectors.
  2. From this, determine π(t) as a function of the λ2, λ3, ... and the π1, π2, π3, ...
  3. Determine the relation between λ2 and the convergence time to the steady state. (A numerical check is sketched below.)
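As a numerical check of this analysis for the pebble game, one can list the eigenvalue moduli directly (a sketch, reusing the transfer matrix of pebble_transfer_sub_2.py):

moduli = numpy.sort(numpy.abs(numpy.linalg.eigvals(transfer)))[::-1]
print(moduli[:3])                    # 1.0 first, then |lambda_2| = 0.75
print(-1.0 / numpy.log(moduli[1]))   # tau = -1/ln(0.75), approximately 3.5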

Perron-Frobenius theorem


We have seen above that the global balance condition is a necessary condition for the probability vector π(t) to converge to the desired distribution (i.e. for the Markov-chain Monte Carlo algorithm to converge), but it is not a sufficient condition. The Perron-Frobenius theorems give the conditions for the uniqueness of the steady state and for the convergence of π(t) towards it.

Uniqueness

The dynamics is said to be irreducible if, equivalently:
  • no change of basis based on permutations can transform T into a block-diagonal matrix, or,
  • more physically: every configuration can be reached from any other one in a finite number of steps.
Then, if the matrix T is stochastic (i.e. its entries are ≥ 0 and the sum of the entries of every column is equal to 1), there exists a unique right eigenvector π1 with eigenvalue equal to 1.

  1. Is the dynamics of the pebble game irreducible?
  2. Give a simple example of a Markov dynamics which is reducible (i.e. not irreducible), for example two disconnected pebble boards, and explain why a Markov-chain Monte Carlo would not always give the same result, depending on the initial state. (A sketch is given after this list.)
  3. Discuss the uniqueness of the maximum eigenvalue.
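As an illustration of question 2, here is a hypothetical minimal reducible dynamics: two disconnected two-site "boards", where the pebble stays or hops within its own pair with probability 1/2. The eigenvalue 1 is then degenerate (one steady state per block):

import numpy

block = numpy.array([[0.5, 0.5],
                     [0.5, 0.5]])  # 2-site board: stay or hop with prob 1/2
T = numpy.zeros((4, 4))
T[:2, :2] = block                  # first board (sites 0, 1)
T[2:, 2:] = block                  # second board (sites 2, 3), disconnected
print(numpy.linalg.eigvals(T))     # the eigenvalue 1 appears twice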

Aperiodicity

A second condition is required to ensure that the dynamics converges: aperiodicity. The dynamics is called periodic if there exists an initial state π(0) such that, after a finite number of time steps strictly larger than 1, one obtains the same state again. The minimal number of such time steps is called the period of the state. The second part of the Perron-Frobenius theorem states that if the Markov matrix is aperiodic (i.e. if it has no periodic states), then the dynamics converges to the steady state π^st.
  1. Consider a Markov chain with a periodic state π(0) of period p. What does this imply for the spectrum of the matrix T^p?
  2. What does this imply for the spectrum of the matrix T?
  3. Show that, in general, π(t) cannot converge to π^st.

Example of periodic Markov chains

Consider the example of a particle jumping between two sites, moving to the other site with probability 1 at each time step. Explain why the dynamics is periodic, and find the eigenvectors and eigenvalues of the Markov matrix (a sketch is given below). Comment.
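A minimal numerical sketch of this two-site chain:

import numpy

T = numpy.array([[0.0, 1.0],
                 [1.0, 0.0]])   # at each step the particle hops to the other site
print(numpy.linalg.eigvals(T))  # eigenvalues +1 and -1: |lambda_2| = 1
pi = numpy.array([1.0, 0.0])    # start on site 0
for t in range(4):
    print(t, pi)                # pi oscillates with period 2 and never converges
    pi = numpy.dot(T, pi)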