JOVANA
Mathematics 1913

An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains

Andrey Andreyevich Markov

Even linked, dependent events can settle into a stable long-run pattern: the lesson of the first Markov chain.


To win an argument about probability, a Russian mathematician counted the vowels and consonants in a beloved poem — and invented one of the most useful ideas in modern science.

The idea, unpacked

Classical probability was built for independent events — coin flips, where each toss forgets the last. Markov asked what happens when events are linked, so that what comes next leans on what just happened. He called such a sequence a chain, with one rule: the future depends only on the present state, not on the whole history.
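The one rule can be made concrete with a small transition table. In this minimal Python sketch, `TRANSITIONS`, `next_state` and `run_chain` are hypothetical names chosen for illustration. The rates are the rounded vowel/consonant figures Markov reported (0.13 and 0.66) together with their complements. The point to notice is that `next_state` looks only at the current state and never at the earlier history.

```python
import random

# Hypothetical transition table: for each current kind of letter, the chance of each next kind.
# Rates are Markov's rounded figures (0.13 and 0.66) plus their complements.
TRANSITIONS = {
    "vowel": {"vowel": 0.13, "consonant": 0.87},
    "consonant": {"vowel": 0.66, "consonant": 0.34},
}


def next_state(current, transitions, rng=random):
    """Draw the next state using only the current state (the Markov property)."""
    row = transitions[current]
    states = list(row)
    weights = [row[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def run_chain(start, length, transitions, rng=random):
    """Return a sequence of `length` states (length >= 1) beginning at `start`."""
    sequence = [start]
    for _ in range(length - 1):
        # Each new state is drawn from the row of the most recent state only.
        sequence.append(next_state(sequence[-1], transitions, rng))
    return sequence


print(run_chain("vowel", 10, TRANSITIONS, random.Random(0)))
```

A coin-flip model would use the same row for every state. Letting the row depend on the current state is the whole difference.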

His surprising result: even when the steps are tangled together this way, the long run is still orderly. As long as every state can eventually be reached from every other, the fraction of time the chain spends in each state settles down to a fixed set of numbers, the stationary distribution, no matter where it starts. Dependence does not mean chaos.
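A quick simulation makes the claim checkable. This sketch uses a hypothetical helper, `long_run_vowel_share`, written for illustration. It runs a two-state vowel/consonant chain twice, once starting from a vowel and once from a consonant, and reports the share of vowels in each run. It assumes the rounded rates reported in the data section of this page (0.13 and 0.66). With those rates, both runs should land close to the same value, about 0.43.

```python
import random


def long_run_vowel_share(start, length, p_vv, p_cv, seed=0):
    """Hypothetical helper: simulate `length` letters of a two-state chain, return the vowel share.

    start : "V" (vowel) or "C" (consonant), the kind of the first letter
    p_vv  : chance the next letter is a vowel when the current one is a vowel
    p_cv  : chance the next letter is a vowel when the current one is a consonant
    """
    rng = random.Random(seed)
    state, vowels = start, 0
    for _ in range(length):
        if state == "V":
            vowels += 1
        # The draw depends only on the current state, never on earlier letters.
        p_next_vowel = p_vv if state == "V" else p_cv
        state = "V" if rng.random() < p_next_vowel else "C"
    return vowels / length


for start in ("V", "C"):
    share = long_run_vowel_share(start, 200_000, 0.13, 0.66, seed=1)
    print(start, round(share, 3))
```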

Where it came from

In the early 1900s Markov was locked in a feud with a fellow mathematician, Pavel Nekrasov, who insisted that the famous law of large numbers — the reason averages are reliable — only worked when events were independent, and even hinted this had something to do with free will. Markov set out to prove him wrong.

He needed a real example of dependent events, and he reached for literature: the first 20,000 letters of Pushkin's Eugene Onegin. He sorted every letter into just two kinds, vowel or consonant, and counted how often each kind followed the other. Vowels rarely follow vowels; consonants are often followed by vowels. The letters were clearly dependent, and yet the overall fraction of vowels held steady at about 0.43, just as his theory predicted.
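The tally itself can be sketched in a few lines. This is an illustration under stated assumptions, not Markov's procedure. The vowel set `VOWELS` is our choice for modern Russian spelling, and the helpers `classify`, `transition_counts` and `follow_rates` are hypothetical. The 1913 text used pre-reform spelling, and this sketch does not reproduce how Markov classified letters such as ъ, ь or ѣ.

```python
from collections import Counter

# Assumed vowel set for modern Russian spelling (illustrative, not Markov's exact rules).
VOWELS = set("аеёиоуыэюя")


def classify(text):
    """Hypothetical helper: keep letters only and map each to 'V' (vowel) or 'C' (consonant)."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]


def transition_counts(kinds):
    """Count adjacent pairs, e.g. ('C', 'V') means a consonant followed by a vowel."""
    return Counter(zip(kinds, kinds[1:]))


def follow_rates(kinds):
    """For each kind of letter, the share of times the next letter is a vowel."""
    pairs = transition_counts(kinds)
    rates = {}
    for prev in ("V", "C"):
        total = pairs[(prev, "V")] + pairs[(prev, "C")]
        rates[prev] = pairs[(prev, "V")] / total if total else float("nan")
    return rates


line = "Мой дядя самых честных правил"  # opening line of Chapter One
print(follow_rates(classify(line)))
```

A single line is far too short to reproduce Markov's 0.13 and 0.66. Those figures came from counting across all 20,000 letters.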

Why it mattered

Markov had shown that statistical regularity does not need independence. That freed probability to describe the real world, where almost nothing is truly independent — weather follows weather, words follow words, today's stock price leans on yesterday's. The chain he built to win an argument turned out to be the right tool for an astonishing range of problems, and it carries his name to this day.

A way to picture it

Think of weather as two states, sunny and rainy, where tomorrow leans on today: a sunny day is usually followed by another sunny day, though sooner or later rain breaks the streak. Track it for a year and a strange order appears. The share of sunny days settles to a fixed percentage that depends only on those follow-on odds, not on whether the year happened to start sunny. That settled percentage is the stationary distribution. Try it with vowels and consonants below.

An interactive plot showing the chance the current letter is a vowel as you step along the text. Two curves begin at opposite extremes — one from all-vowels, one from all-consonants — and both settle to the same dashed line near 0.43. Two sliders let you change how often a vowel follows a vowel, and how often a vowel follows a consonant; the settling line moves as you do.
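The plot described above can be reproduced with a one-line update rule. This is a hedged sketch of the calculation, not the page's actual widget code. The names `vowel_chance_curve` and `settling_line` are hypothetical, and the arguments `p_vv` and `p_cv` play the roles of the two sliders.

```python
def vowel_chance_curve(p0, p_vv, p_cv, steps):
    """Hypothetical helper: chance that the letter at each step is a vowel, starting from p0.

    p_vv : how often a vowel follows a vowel (first slider)
    p_cv : how often a vowel follows a consonant (second slider)
    """
    curve = [p0]
    for _ in range(steps):
        p = curve[-1]
        # The next letter is a vowel either after a vowel or after a consonant.
        curve.append(p * p_vv + (1 - p) * p_cv)
    return curve


def settling_line(p_vv, p_cv):
    """The value both curves approach: the stationary vowel share (the dashed line)."""
    denominator = 1 - p_vv + p_cv
    if denominator == 0:
        # p_vv = 1 and p_cv = 0: the chain never switches, so the long run depends on the start.
        raise ValueError("chain never leaves its starting state")
    return p_cv / denominator


P_VV, P_CV = 0.13, 0.66  # Markov's rounded rates
from_all_vowels = vowel_chance_curve(1.0, P_VV, P_CV, 20)
from_all_consonants = vowel_chance_curve(0.0, P_VV, P_CV, 20)
print(settling_line(P_VV, P_CV))
print(from_all_vowels[:5])
print(from_all_consonants[:5])
```

The weather picture works the same way. With hypothetical odds of 0.8 for sunny after sunny and 0.4 for sunny after rainy, `settling_line(0.8, 0.4)` is 2/3, whatever the first day was. Each step multiplies the gap to the line by `p_vv - p_cv`. With Markov's rates that factor is negative (-0.53), so the curves zig-zag across the line while closing in. Only at the extremes, where the factor is exactly 1 or -1, does the gap never shrink.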

Where it sits

Markov chains gave probability a way to handle linked events, picking up where Jakob Bernoulli's law of large numbers, proved for independent trials, left off (for the earlier history of probability, see bayes-1763). A generation later they became a backbone of information theory: Shannon approximated English text with Markov chains of letters and words (see shannon-1948). Today they sit inside Google's original PageRank search ranking, and the same idea of predicting what comes next from what came before runs all the way to the AI that predicts your next word (see transformer-2017).

The original document
A. A. Markov · read to the Imperial Academy of Sciences, St. Petersburg, 23 January 1913
The data
Markov took the first 20,000 letters of Pushkin's novel-in-verse Eugene Onegin — the whole of Chapter One and sixteen stanzas of Chapter Two — and classified every letter into just two states: vowel or consonant. He counted 8,638 vowels and 11,362 consonants, so the chance that a letter is a vowel is about 0.43, and a consonant about 0.57.
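The proportions follow directly from Markov's counts:

```latex
8638 + 11362 = 20000, \qquad
\frac{8638}{20000} = 0.4319 \approx 0.43, \qquad
\frac{11362}{20000} = 0.5681 \approx 0.57
```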
The dependence
He then asked a new question: does the kind of letter depend on the one before it? Tallying the pairs, he found that a vowel is followed by a vowel only about 0.13 of the time, but a consonant is followed by a vowel about 0.66 of the time. The letters are clearly not independent: vowels and consonants tend to alternate.
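The two reported rates can be written as a transition matrix. Rows are the current letter and columns are the next letter. The remaining entries follow because each row must sum to 1. Under independence both rows would be identical, roughly (0.43, 0.57). Here they are far apart.

```latex
P =
\begin{pmatrix}
P(V \mid V) & P(C \mid V) \\
P(V \mid C) & P(C \mid C)
\end{pmatrix}
\approx
\begin{pmatrix}
0.13 & 0.87 \\
0.66 & 0.34
\end{pmatrix}
```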
Yet, Markov showed, such a dependent sequence still obeys a law of large numbers: the long-run frequency settles down to a definite value (here, the 0.43 vowel rate), and that value is fixed by the transition probabilities. It is the stationary distribution of the chain.
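The 0.43 can be recovered from the two transition rates alone. Write $\pi_V$ for the long-run vowel share (our notation, not necessarily Markov's). Stationarity means one step of the chain leaves that share unchanged:

```latex
\begin{aligned}
\pi_V &= \pi_V \, P(V \mid V) + (1 - \pi_V) \, P(V \mid C) \\
\pi_V &= \frac{P(V \mid C)}{1 - P(V \mid V) + P(V \mid C)} \\
      &= \frac{0.66}{1 - 0.13 + 0.66} = \frac{0.66}{1.53} \approx 0.431
\end{aligned}
```

This is close to the observed $8638 / 20000 \approx 0.432$. The small gap is consistent with rounding the rates to two digits.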
[ … ]
The paper develops this with the second-chapter counts as a check and derives the variance of the vowel frequency for dependent samples, thereby extending to chains of linked events a classical theory that had been built for independent trials.
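The variance result can be illustrated with the standard formula for a stationary two-state chain. It is stated here in modern form, not in Markov's notation, and does not reproduce his figures. Neighbouring letters have correlation δ = P(V|V) - P(V|C), and letters k apart have covariance pq·δ^k. The vowel count over n letters therefore has variance npq + 2pq Σ (n - k) δ^k, which for long samples is about npq(1 + δ)/(1 - δ). The function names below are hypothetical.

```python
def vowel_count_variance(n, p_vv, p_cv):
    """Hypothetical helper: exact variance of the number of vowels in n consecutive letters,
    assuming the two-state chain has already reached its stationary distribution."""
    p = p_cv / (1 - p_vv + p_cv)  # stationary vowel share
    q = 1 - p
    delta = p_vv - p_cv  # correlation between neighbouring letters
    # Letters k apart have covariance p*q*delta**k, and there are n - k such pairs.
    cov_sum = sum((n - k) * delta ** k for k in range(1, n))
    return n * p * q + 2 * p * q * cov_sum


def dispersion_ratio(p_vv, p_cv):
    """Long-sample ratio of the chain's variance to the independent-trials value n*p*q."""
    delta = p_vv - p_cv
    return (1 + delta) / (1 - delta)


n = 10_000
p = 0.66 / (1 - 0.13 + 0.66)
print(vowel_count_variance(n, 0.13, 0.66) / (n * p * (1 - p)))
print(dispersion_ratio(0.13, 0.66))
```

With Markov's rounded rates δ ≈ -0.53, so `dispersion_ratio(0.13, 0.66)` is about 0.31. Because vowels and consonants alternate, vowel counts in long blocks should scatter noticeably less than independent trials with the same 0.43 rate would. When the two rates are equal, δ = 0 and the formula falls back to the classical npq.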