An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains
Even linked, dependent events settle into a stable long-run pattern — the first Markov chain.
To win an argument about probability, a Russian mathematician counted the vowels and consonants in a beloved poem — and invented one of the most useful ideas in modern science.
The idea, unpacked
Classical probability was built for independent events — coin flips, where each toss forgets the last. Markov asked what happens when events are linked, so that what comes next leans on what just happened. He called such a sequence a chain, with one rule: the future depends only on the present state, not on the whole history.
His surprising result: even when the steps are tangled together this way, the long run is still orderly. As long as every state can eventually be reached from every other, the fraction of time the chain spends in each state settles down to a fixed set of numbers, the stationary distribution, no matter where you start. Dependence does not mean chaos.
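The simplest case, a chain with only two states, can be worked out by hand. In this sketch, $a$ and $b$ are labels introduced for illustration: $a$ is the chance of moving from state 1 to state 2 in one step, and $b$ the chance of moving back.

```latex
\[
P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix},
\qquad a = \Pr(1 \to 2), \quad b = \Pr(2 \to 1).
\]
% Stationary distribution: \pi P = \pi with \pi_1 + \pi_2 = 1.
\[
\pi_1 = (1-a)\,\pi_1 + b\,\pi_2
\;\Longrightarrow\; a\,\pi_1 = b\,\pi_2
\;\Longrightarrow\; \pi_1 = \frac{b}{a+b}, \qquad \pi_2 = \frac{a}{a+b}.
\]
% Forgetting the start: let p_n be the probability of state 1 after n steps.
\[
p_{n+1} = (1-a)\,p_n + b\,(1-p_n)
\;\Longrightarrow\;
p_n - \pi_1 = (1-a-b)^n \,(p_0 - \pi_1).
\]
```

The factor $(1-a-b)^n$ is why the starting point stops mattering: whenever $0 < a+b < 2$ it shrinks toward zero, so every starting distribution is pulled to the same $\pi$. In the edge case $a = b = 1$ the chain simply alternates and its day-to-day distribution never settles, although the long-run fraction of time in each state still comes out to $1/2$.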
Where it came from
In the early 1900s Markov was locked in a feud with a fellow mathematician, Pavel Nekrasov, who insisted that the famous law of large numbers — the reason averages are reliable — only worked when events were independent, and even hinted this had something to do with free will. Markov set out to prove him wrong.
He needed a real example of dependent events, and he reached for literature: the first 20,000 letters of Pushkin's Eugene Onegin. He sorted every letter into just two kinds, vowel or consonant, and counted how often each kind followed the other. Vowels rarely follow vowels; consonants are usually followed by vowels. The letters were clearly dependent, and yet the overall fraction of vowels held steady, just as his theory predicted.
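Markov's procedure amounts to labelling each letter and tallying adjacent pairs. Here is a minimal Python sketch of that count. The function name `vowel_consonant_chain` is hypothetical, and the default vowel set is English `aeiou` purely for illustration: Markov worked on the Russian original, and his exact treatment of letters such as the hard and soft signs is not reproduced here.

```python
from collections import Counter


def vowel_consonant_chain(text, vowels="aeiou"):
    """Estimate a vowel/consonant chain from text, in the spirit of Markov's count.

    Returns (vowel share, P(vowel | previous vowel), P(vowel | previous consonant)).
    Assumption: `vowels` is a hypothetical classification set (English by default);
    anything that is not a letter, such as spaces and punctuation, is dropped.
    """
    # Keep letters only and label each one 'V' (vowel) or 'C' (consonant).
    states = ["V" if ch in vowels else "C" for ch in text.lower() if ch.isalpha()]
    if len(states) < 2:
        raise ValueError("need at least two letters to count transitions")
    # Tally every adjacent pair (this letter's class, next letter's class).
    pairs = Counter(zip(states, states[1:]))
    # How often each class appears as the first member of a pair.
    firsts = Counter(states[:-1])
    nan = float("nan")
    p_v_after_v = pairs[("V", "V")] / firsts["V"] if firsts["V"] else nan
    p_v_after_c = pairs[("C", "V")] / firsts["C"] if firsts["C"] else nan
    vowel_share = states.count("V") / len(states)
    return vowel_share, p_v_after_v, p_v_after_c


print(vowel_consonant_chain("the cat sat on a mat"))  # (0.4, 0.0, 0.75)
# For Russian text, pass the ten Cyrillic vowel letters instead.
print(vowel_consonant_chain("мама", vowels="аеёиоуыэюя"))  # (0.5, 0.0, 1.0)
```

The two conditional probabilities describe the chain, and the vowel share is its long-run pattern. The figures usually reported from Markov's count are roughly 0.128 for a vowel after a vowel and 0.663 for a vowel after a consonant. For a two-state chain the long-run vowel share is P(V|C) / (P(V|C) + P(C|V)) = 0.663 / (0.663 + 0.872) ≈ 0.432, in line with the roughly 43 percent of vowels he counted.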
Why it mattered
Markov had shown that statistical regularity does not need independence. That freed probability to describe the real world, where almost nothing is truly independent — weather follows weather, words follow words, today's stock price leans on yesterday's. The chain he built to win an argument turned out to be the right tool for an astonishing range of problems, and it carries his name to this day.
A way to picture it
Think of weather as two states, sunny and rainy, where tomorrow leans on today: a sunny day is usually followed by another sunny day, although rain breaks the streak from time to time, and a rainy day is often followed by sun. Track it for a year and a strange order appears: the share of sunny days settles close to a fixed percentage that depends only on those follow-on odds, not on whether the year happened to start sunny. That settled percentage is the stationary distribution.
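The weather picture can be checked with a short simulation. This is a minimal Python sketch with hypothetical odds (sun follows sun 90% of the time, sun follows rain 50% of the time) and hypothetical function names. It compares the simulated share of sunny days with the two-state formula b / (a + b), where a is the chance a sunny day turns rainy and b the chance a rainy day turns sunny.

```python
import random


def simulate_weather(days, p_sun_after_sun, p_sun_after_rain, start="sunny", seed=None):
    """Simulate a two-state sunny/rainy chain; return the fraction of sunny days.

    The transition probabilities are hypothetical inputs, not measured weather data.
    """
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    state = start
    sunny_days = 0
    for _ in range(days):
        if state == "sunny":
            sunny_days += 1
        # Markov property: tomorrow's odds depend only on today's state.
        p_sun = p_sun_after_sun if state == "sunny" else p_sun_after_rain
        state = "sunny" if rng.random() < p_sun else "rainy"
    return sunny_days / days


def stationary_sunny_share(p_sun_after_sun, p_sun_after_rain):
    """Long-run sunny share of a two-state chain, b / (a + b)."""
    a = 1 - p_sun_after_sun  # chance a sunny day is followed by rain
    b = p_sun_after_rain     # chance a rainy day is followed by sun
    if a + b == 0:
        raise ValueError("weather never changes, so the long run depends on the start")
    return b / (a + b)


# Hypothetical odds: sun follows sun 90% of the time, sun follows rain 50% of the time.
print(stationary_sunny_share(0.9, 0.5))  # roughly 0.833
for start in ("sunny", "rainy"):
    # One simulated year from each start; expect values scattered around the same share.
    print(start, simulate_weather(365, 0.9, 0.5, start=start, seed=1))
```

With these odds the long-run share is 0.5 / 0.6, about 83 percent sunny. Over a single 365-day year the simulated share wobbles around that value; the wobble shrinks as the number of days grows, and the starting state matters less and less.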
Where it sits
Markov chains gave probability a way to handle linked events, picking up where the law of large numbers (the legacy of Bayes and the Bernoullis, see bayes-1763) left off. A generation later they became the backbone of information theory: Shannon modelled language itself as a Markov chain (see shannon-1948). Today they live inside everything from Google's original PageRank, which ranks web pages by the stationary distribution of a random surfer clicking from link to link, to the AI that predicts your next word (see transformer-2017).