Hitting Times, Reversibility, and Continuous Time

How long until it arrives? Hitting times

Guide 4 answered "where does the chain spend its time in the long run?" with the stationary distribution. This guide answers a different and very human question: when does something happen? Fix a target set of states A — maybe a single absorbing state, maybe "the queue is empty". The hitting time T_A is the first time the chain lands in A: T_A = min{ n >= 0 : X_n is in A }. It is a random number, and the most useful summary of it is its mean, the [[mean-hitting-times|mean hitting time]] h_i = E[T_A given X_0 = i], the expected number of steps to reach A starting from state i.

The trick for computing h_i is the same one-step conditioning that powered the whole rung. If you are already in A, you have arrived, so h_i = 0. Otherwise you must take one step (costing 1) and then face the same problem from wherever you land. Condition on that first step using the transition matrix: h_i = 1 + sum over j of p_ij times h_j. This is a small linear system — one equation per state outside A — and solving it gives every mean hitting time at once. The matching idea for absorption probabilities (which target you hit, not how long it takes) sets up an almost identical system with the leading 1 removed.

Birth-death walk on {0,1,2,3,4}, 0 and 4 absorbing,
p = step right, q = 1-p step left (gambler's ruin).

  h_i = expected steps to hit {0 or 4} from i

  h_0 = 0
  h_1 = 1 + q*h_0 + p*h_2
  h_2 = 1 + q*h_1 + p*h_3
  h_3 = 1 + q*h_2 + p*h_4
  h_4 = 0

With p = q = 1/2:  h_1 = 3,  h_2 = 4,  h_3 = 3

One-step conditioning turns mean hitting times into a tiny linear system. The fair (p = 1/2) walk takes longest from the middle, exactly as your gut expects.

States like 0 and 4 above — once entered, never left, because p_ii = 1 — are [[absorbing-markov-chain|absorbing states]], and the chain is an absorbing chain when every state can eventually reach one. This is exactly the structure behind the [[gamblers-ruin|gambler's ruin]] problem you met in the random-walk rung: a gambler with i dollars walks toward 0 (ruin) or N (the bank), and the same equations give both the chance of each ending and the expected number of bets. Hitting times and absorption probabilities are how Markov chains answer questions about *finite* journeys, not just eternal averages.

Reversibility: when a chain looks the same backwards

Now a beautiful piece of structure that some chains have and some do not. Run a chain forward in its stationary distribution pi, film it, then play the film backwards. The reversed sequence is itself a Markov chain — but with different transition rules in general. A chain is reversible when the backwards film is statistically indistinguishable from the forwards one: the reversed chain has the very same transition probabilities. Reversibility is not automatic; it is a special symmetry, and it turns out to be both common in nature and enormously convenient.

The clean test for reversibility is the [[detailed-balance|detailed balance]] equations: a distribution pi and the matrix P are in detailed balance if pi(i) p_ij = pi(j) p_ji for every pair of states i, j. Read it as a flow statement: the long-run rate of probability flowing directly from i to j equals the rate flowing directly from j back to i — every individual two-way pipe is balanced. Contrast this with the stationarity condition pi P = pi, which only demands that the *total* inflow to each state equals its total outflow. Detailed balance is strictly stronger: it balances every pair, not just every node's books.

Why care? Two reasons make reversibility one of the most exploited ideas in applied probability. First, random walks on a graph (hop to a uniformly random neighbour) are always reversible, with pi(i) proportional to the degree of node i — a fact you can read straight off detailed balance. Second, and more spectacularly, the entire field of Markov chain Monte Carlo runs in reverse: the Metropolis-Hastings algorithm *engineers* a chain to satisfy detailed balance with respect to a target distribution you want to sample from, so that the chain's stationary distribution is exactly that target. Reversibility is not a curiosity here; it is the design principle.

Letting the clock run continuously

Everything so far assumed a clock that ticks in whole steps: n = 0, 1, 2, .... But many real systems do not wait for a tick. A radioactive atom decays at some random instant; a customer arrives whenever they arrive; a phone call ends after an unpredictable duration. To model these we let the chain jump at random *real-valued* times, giving a [[continuous-time-markov-chain|continuous-time Markov chain]]. The state space is still discrete (still a list of states), but time is now a continuous line.

Here is the one new ingredient, and it is forced on us by the Markov property itself. How long does the chain wait in a state before jumping? The waiting time must be memoryless — if the future depends only on the present state, it cannot also depend on how long you have already been sitting there. There is exactly one continuous distribution with that property: the exponential distribution (its memorylessness is the continuous twin of the geometric's). So each state i holds the chain for an Exponential waiting time, then it jumps elsewhere. Each state has a [[rate-intensity-parameter|rate]] parameter; a higher rate means a shorter expected stay before the next jump.

So a continuous-time chain decomposes into two pieces you already understand: an exponential clock that decides *when* to jump, and an ordinary discrete jump rule that decides *where* to go. The bookkeeping device that replaces the transition matrix is the generator matrix Q: its off-diagonal entry q_ij (>= 0) is the rate of jumping directly from i to j, and each diagonal entry is set to minus the row sum so that every row of Q sums to 0. Where discrete chains used powers P^n, continuous chains use the matrix exponential: the distribution at time t evolves as exp(Qt), the smooth continuous-time analogue of repeated multiplication.

Familiar friends in continuous time

The simplest continuous-time chain only ever counts upward: 0 -> 1 -> 2 -> ..., each jump waiting an independent Exponential(lambda) time. That is the [[poisson-process|Poisson process]] you may have glimpsed earlier — the canonical model of "events arriving at random at a steady average rate lambda", from photons hitting a detector to emails landing in an inbox. Its interarrival times are exactly those memoryless exponential waits, which is why the Poisson process and the exponential distribution are inseparable.

Allow the count to go down as well as up — births at one rate, deaths at another — and you get a [[birth-death-process|birth-death process]], the continuous-time cousin of the random walk on the integers. This single template models population sizes, the number of molecules in a reaction, and most famously the length of a waiting line. A queue with one server, Poisson arrivals, and exponential service times is the M/M/1 queue, whose stationary distribution and average wait fall straight out of the birth-death equations. Notice the payoff of the previous section: birth-death chains satisfy detailed balance automatically (you can only get from state n to n+2 by passing through n+1), so their stationary distribution drops out of pairwise balance with no heavy algebra.

Step back and see the arc of the whole rung. You began with one assumption — the future depends only on the present — and built the transition matrix; you multiplied it to predict (Chapman-Kolmogorov), dissected its structure to classify states, raised it to a power to find the long-run stationary distribution, and now you can time the journey (hitting times), spot a hidden symmetry (reversibility), and let the clock run smooth (continuous time). The same skeleton extends far beyond this rung: into queueing theory, population genetics, the algorithms that train modern statistical models, and the continuous-state Brownian motion waiting at the top of this ladder.