When a matrix is a table of probabilities
A stochastic (Markov) matrix P has nonnegative entries and columns that sum to 1: entry P[i,j] is the probability of moving from state j to state i in one step. If p is the current distribution over states, then P p is the distribution after one step, and P^t p after t steps. This is just matrix multiplication doing the bookkeeping of a random walk.
# Weather chain: states = {Sunny, Rainy}, columns sum to 1
# from Sunny from Rainy
P = [ [ 0.9 , 0.5 ], # to Sunny
[ 0.1 , 0.5 ] ] # to Rainy
# start certain it is Sunny: p0 = (1, 0)
# p1 = P p0 = (0.9, 0.1)
# p2 = P p1 = (0.86, 0.14)
# p_inf = (0.833..., 0.166...) <- it settles downThe steady state is an eigenvector
A distribution pi that does not change under one more step satisfies P pi = pi — it is an eigenvector of P with eigenvalue 1. This special eigenvector is the stationary distribution. Every stochastic matrix has eigenvalue 1 (the all-ones row vector is a left eigenvector), so a steady state always exists; the question is whether it is unique and whether you actually converge to it.
The Perron–Frobenius theorem settles it: if P has strictly positive entries (or more generally is irreducible and aperiodic), then eigenvalue 1 is simple and strictly dominates all others in magnitude, the stationary distribution is unique and strictly positive, and P^t p converges to it from ANY start.
PageRank: a random surfer on the web graph
Model the web as a graph: a surfer at a page picks an outgoing link uniformly at random. That defines a stochastic matrix from the adjacency matrix. Its stationary distribution ranks pages by how often the surfer lands on them — that is PageRank. Two fixes make Perron–Frobenius apply cleanly.
- Dangling-node fix: a page with no outgoing links teleports to a uniform random page.
- Damping: with probability 1 - d (usually d = 0.85) the surfer teleports anywhere, guaranteeing strictly positive entries.
- The result is a positive stochastic matrix G = d P + (1 - d)(1/n) 11^T with a unique stationary distribution.
Computing it: the power method
With billions of pages you never form G explicitly or diagonalize it. You just iterate: start from uniform, apply the map, repeat until it stops changing. Because eigenvalue 1 strictly dominates, this power method converges to the leading eigenvector — the stationary distribution — at a rate set by the damping factor.
# PageRank via the power method (no dense matrix needed)
r = ones(n) / n # uniform start
repeat:
r_new = d * (P @ r) + (1 - d) / n # apply G implicitly
if ||r_new - r||_1 < tol: break
r = r_new
return r # r is the stationary distribution