The Stationary Distribution and Long-Run Behavior

A distribution that the chain leaves unchanged

By now we can push a chain forward one step at a time. If today's state is random with row vector mu (so mu_i = P(today in state i)), then tomorrow's distribution is mu P, where P is the transition matrix from the first guide in this rung. Multiply by P again and you get the day after, and so on. Most starting distributions keep shuffling: the probabilities slosh between states as the days pass. The question of this guide is whether there is a special distribution that does *not* slosh — one that the chain leaves exactly where it is.
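As a minimal sketch of pushing a distribution forward, the snippet below multiplies a row vector by a transition matrix a few times. The matrix `P` here is a made-up two-state example chosen only for illustration, and `step` is a hypothetical helper name, not something defined in the earlier guide.

```python
from fractions import Fraction as F

def step(mu, P):
    """Return the next-day distribution mu P (row vector times matrix)."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

# Illustrative transition matrix (an assumption, not taken from the text).
P = [[F(9, 10), F(1, 10)],
     [F(1, 2),  F(1, 2)]]

mu = [F(1), F(0)]          # start in state 0 with certainty
history = [mu]
for _ in range(3):
    mu = step(mu, P)       # one more day
    history.append(mu)

for day, dist in enumerate(history):
    print(day, [float(x) for x in dist])
```

Each printed row is still a probability vector, but the entries keep moving from one day to the next, which is the sloshing described above.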

Such a distribution is called a stationary distribution, written pi. It is a probability row vector (nonnegative entries summing to 1) that satisfies the balance equation pi P = pi. Read it slowly: if today's distribution over states is pi, then tomorrow's distribution mu P = pi P = pi is the *same* pi. Start the chain off in pi and, in the sense of its distribution, it never ages — every day looks statistically identical to the last, even though the actual state keeps jumping around. The equation pi P = pi says pi is a left eigenvector of P with eigenvalue 1.

Solving pi P = pi: a tiny worked example

Let us actually find pi for a two-state weather chain. States are Sunny (S) and Rainy (R). From a sunny day, tomorrow is sunny with probability 0.8 and rainy with probability 0.2; from a rainy day, tomorrow is sunny with probability 0.4 and rainy with probability 0.6. Write pi = (pi_S, pi_R). The balance equation pi P = pi gives two equations, one per column of P, but because each row of P sums to 1 the two equations add up to the identity pi_S + pi_R = pi_S + pi_R, so either one implies the other. That is a general feature: pi P = pi is unchanged when pi is multiplied by a constant, so it never fixes the scale on its own. The normalising condition pi_S + pi_R = 1 is what pins pi down.

P =  [ 0.8  0.2 ]      pi P = pi  with  pi = (pi_S, pi_R)
     [ 0.4  0.6 ]

Column S:  0.8*pi_S + 0.4*pi_R = pi_S   ->   0.4*pi_R = 0.2*pi_S
                                        ->   pi_S = 2 * pi_R
Normalise: pi_S + pi_R = 1
           2*pi_R + pi_R = 1   ->   pi_R = 1/3,  pi_S = 2/3

   pi = (2/3, 1/3)

Check:  pi P = (2/3*0.8 + 1/3*0.4 ,  2/3*0.2 + 1/3*0.6)
             = (1.6/3 + 0.4/3 ,  0.4/3 + 0.6/3)
             = (2/3, 1/3)  = pi   OK

Solving the balance equation plus normalisation for the two-state weather chain: pi = (2/3, 1/3).
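The same algebra can be replayed in exact arithmetic. This is only a sketch that mirrors the hand calculation for this particular two-state chain; the variable names (`ratio`, `pi_next`) are illustrative choices.

```python
from fractions import Fraction as F

P = [[F(4, 5), F(1, 5)],   # from Sunny: to Sunny, to Rainy
     [F(2, 5), F(3, 5)]]   # from Rainy: to Sunny, to Rainy

# Column-S balance: 0.8*pi_S + 0.4*pi_R = pi_S  =>  pi_S = ratio * pi_R
ratio = P[1][0] / (1 - P[0][0])      # 0.4 / 0.2
pi_R = 1 / (ratio + 1)               # from pi_S + pi_R = 1
pi_S = ratio * pi_R
pi = [pi_S, pi_R]

# Check pi P = pi entry by entry.
pi_next = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(pi, pi_next == pi)
```

Using `Fraction` rather than floats keeps the check exact, so the comparison `pi_next == pi` is not affected by rounding.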

So in the long run the chain spends two-thirds of its days sunny and one-third rainy, regardless of whether we started the calendar on a sunny or a rainy day (the convergence results later in this guide justify that reading). There is a vivid way to read pi P = pi for each state, called the balance picture: in equilibrium the probability flowing *out* of a state each step equals the probability flowing *in*. Out of S flows 0.2*pi_S; into S flows 0.4*pi_R; setting them equal gives 0.2*(2/3) = 0.4*(1/3), and indeed both equal 2/15. Stationarity is the bookkeeping condition that every state's inflow balances its outflow.
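The balance picture can be checked state by state. The sketch below, for the weather chain with pi = (2/3, 1/3), computes the probability mass leaving and entering each state in one step; `outflow` and `inflow` are hypothetical helper names introduced here for illustration.

```python
from fractions import Fraction as F

P = [[F(4, 5), F(1, 5)],   # Sunny row
     [F(2, 5), F(3, 5)]]   # Rainy row
pi = [F(2, 3), F(1, 3)]

def outflow(i):
    """Probability mass leaving state i for other states in one step."""
    return sum(pi[i] * P[i][j] for j in range(len(P)) if j != i)

def inflow(i):
    """Probability mass entering state i from other states in one step."""
    return sum(pi[j] * P[j][i] for j in range(len(P)) if j != i)

for i, name in enumerate(["S", "R"]):
    print(name, "out:", outflow(i), "in:", inflow(i))
```

Mass that stays put (the diagonal of P) is left out of both sums, since it appears on both sides of the balance equation and cancels.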

When does the chain actually converge to pi?

Having a stationary pi is one thing; *converging* to it from an arbitrary start is another, stronger thing. The hoped-for statement is that mu P^n -> pi as n grows, for every starting distribution mu — the chain forgets where it began. The distribution that the n-step transitions approach is called the limiting distribution. When it exists it must equal pi, but it does not always exist. Two failures from the previous guide on classifying states can block it: reducibility and periodicity.

First, the chain must be irreducible — every state reachable from every other — so there is a *single* equilibrium rather than separate equilibria trapped in different communicating classes. Second, it must be aperiodic: no rigid drumbeat that returns to a state only on multiples of some period d > 1. The classic culprit is the deterministic 2-cycle P = [[0,1],[1,0]]: it has the perfectly good stationary pi = (1/2, 1/2), yet starting at state 1 the distribution flips 1, 0, 1, 0, ... forever and never settles. Its time-average is 1/2 each, but the snapshot distribution never converges. Periodicity breaks limits, not stationarity.
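A short sketch of the 2-cycle makes both halves of that claim concrete: pi = (1/2, 1/2) passes the stationarity check, while the distribution started at state 1 keeps flipping. The helper `step` is an illustrative name.

```python
from fractions import Fraction as F

def step(mu, P):
    """One step of the chain: the row vector mu times the matrix P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0, 1],
     [1, 0]]                       # deterministic 2-cycle

pi = [F(1, 2), F(1, 2)]
is_stationary = step(pi, P) == pi  # pi P = pi holds

mu = [1, 0]                        # start in state 1 with certainty
snapshots = []
for _ in range(6):
    snapshots.append(mu)
    mu = step(mu, P)

print(is_stationary)
print(snapshots)
```

The first entry of each snapshot is the probability of being in state 1, which is the alternating sequence 1, 0, 1, 0, ... described above.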

Put the good cases together and you get the convergence theorem for finite chains: if a chain is irreducible and aperiodic (such a chain is often called ergodic), then it has a unique stationary distribution pi, and from *any* start mu P^n -> pi. The same holds for infinite chains with the extra requirement that the chain be positive recurrent — that expected return times are finite — which on a finite irreducible chain is automatic. In that good case pi_i is also exactly 1/(expected return time to i), tying the equilibrium fraction of time in i to how long, on average, the chain takes to come home to i.
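Both parts of the theorem can be illustrated on the weather chain from earlier in this guide, which is irreducible and aperiodic. The sketch below iterates mu P^n from two opposite starts, then computes expected return times with a first-step argument that is written out only for the two-state case. All helper and variable names are illustrative.

```python
from fractions import Fraction as F

P = [[F(4, 5), F(1, 5)],   # Sunny row
     [F(2, 5), F(3, 5)]]   # Rainy row
pi = [F(2, 3), F(1, 3)]

def step(mu, P):
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def after(mu, n):
    """Distribution after n steps, i.e. mu P^n."""
    for _ in range(n):
        mu = step(mu, P)
    return mu

gaps = []
for start in ([F(1), F(0)], [F(0), F(1)]):
    mu = after(start, 30)
    gaps.append(max(abs(m - p) for m, p in zip(mu, pi)))
    print([float(x) for x in mu], float(gaps[-1]))

# Two-state return times: from R the wait to hit S is geometric with
# success probability P[R][S], so its mean is 1 / P[R][S]; similarly for R.
hit_S_from_R = 1 / P[1][0]
return_S = 1 + P[0][1] * hit_S_from_R   # one step, plus a detour via R
hit_R_from_S = 1 / P[0][1]
return_R = 1 + P[1][0] * hit_R_from_S
print(return_S, 1 / pi[0], return_R, 1 / pi[1])
```

For larger chains the return times would need a linear system rather than this two-state shortcut; the point here is only that the two printed pairs agree.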

Two long-run statements that are easy to confuse

There are two genuinely different long-run claims, and keeping them apart is the single most useful habit in this subject. The first is the limiting statement: P(X_n = i) -> pi_i, i.e. the *snapshot* probability of being in state i settles down. This one needs aperiodicity, as the flip-flopping 2-cycle showed. The second is the time-average statement: the *fraction of the first n steps* spent in state i converges to pi_i. This is the ergodic theorem for Markov chains, and remarkably it only needs irreducibility (plus positive recurrence) — periodicity does not hurt it.

The 2-cycle makes the difference unforgettable. Its snapshot distribution never converges (it forever cycles 1,0,1,0,...), yet the fraction of time spent in each state marches steadily to 1/2 — count the visits and divide by n. So you can have a perfectly good long-run *average* even when the moment-by-moment *distribution* refuses to settle. This is the Markov-chain cousin of a theme from the limit-theorem rung: a time average can behave beautifully even when individual snapshots do not, just as the law of large numbers is about the average rather than any single term.
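Counting visits along the 2-cycle's path shows the time average settling even though the state itself never does. This is a minimal sketch; the path is deterministic, so no random numbers are needed, and the number of steps is an arbitrary choice.

```python
n_steps = 1001
state = 0                    # index 0 stands for the first state
visits = [0, 0]
fraction_in_first = []       # fraction of the first n steps spent in the first state

for n in range(1, n_steps + 1):
    visits[state] += 1
    fraction_in_first.append(visits[0] / n)
    state = 1 - state        # the 2-cycle: always jump to the other state

print(fraction_in_first[:4])
print(fraction_in_first[-1])
```

The early fractions bounce around, but the running fraction is squeezed toward 1/2 as n grows, while the state itself keeps alternating forever.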

How to find and use the long-run behavior

In practice you almost always want pi, and the recipe is short. The work is solving a linear system, with one honest catch: for an irreducible chain the balance equations are rank-deficient by exactly one, so the normalising row is not optional decoration. It is what makes the answer unique. For a discrete-time chain on a handful of states this is a few lines of algebra; on many states it is a small computation that any linear-algebra routine handles.

Check the chain is irreducible (every state reaches every other). If not, the long-run behavior may depend on the start, and there can be several stationary distributions — one per closed communicating class.
Write the balance equations pi P = pi (one per state, saying inflow = outflow), drop one redundant equation, and add the normaliser sum of pi_i = 1.
Solve the linear system for pi. When the chain is finite and irreducible the solution is unique and every entry is strictly positive; the normalisation is what fixes the scale.
Check aperiodicity before claiming the snapshot limit P(X_n = i) -> pi_i. Without it, you may only claim the time-average reading: the long-run fraction of steps in state i equals pi_i.
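The four steps above can be sketched in plain Python. This is one possible implementation under stated assumptions, not a reference routine: it uses exact `Fraction` arithmetic and Gauss-Jordan elimination, it assumes a finite stochastic matrix, and the function names are hypothetical. The aperiodicity test relies on the standard fact that a finite chain is irreducible and aperiodic exactly when some power of P is strictly positive, and that power (n-1)^2 + 1 suffices for n states.

```python
from fractions import Fraction as F

def reachable(P, start):
    """States reachable from `start` along positive-probability transitions."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

def stationary(P):
    """Solve pi P = pi with sum(pi) = 1; assumes P is finite and irreducible."""
    n = len(P)
    # Row j holds the balance equation for state j: sum_i pi_i P[i][j] - pi_j = 0.
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n)]
    # Drop the last (redundant) balance equation and use the normaliser instead.
    A[n - 1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

def is_irreducible_and_aperiodic(P):
    """True if some power of P, up to (n-1)^2 + 1, has all entries positive."""
    n = len(P)
    B = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    M = B
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in M):
            return True
        M = [[any(M[i][k] and B[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return False

weather = [[F(4, 5), F(1, 5)], [F(2, 5), F(3, 5)]]
two_cycle = [[F(0), F(1)], [F(1), F(0)]]
for name, P in [("weather", weather), ("2-cycle", two_cycle)]:
    print(name, is_irreducible(P), stationary(P), is_irreducible_and_aperiodic(P))
```

On the two chains from this guide, both pass the irreducibility check and both have a stationary pi, but only the weather chain passes the aperiodicity check, so only for it may the snapshot limit be claimed.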

Why does any of this matter? Because so much applied probability is really a quiet question about equilibrium. The fraction of time a server is busy, the long-run share of customers in each tier, the eventual distribution of a shuffled deck, even the famous ranking of web pages, are all stationary distributions of a suitable chain. Once you can write down P and solve pi P = pi, you can answer "what happens in the long run?" without simulating a single step. The next guide sharpens the picture from the other end — asking not where the chain ends up, but how long it takes to get somewhere (hitting times), when balance becomes the stronger condition of reversibility, and how all of this carries over to chains that move in continuous time.