Three Models for the Description of Language
A finite set of rules, an infinite set of sentences — and a ladder of grammars to reach them.
In a dozen pages, a 27-year-old argued that no machine reading word by word could ever capture a human sentence — and rebuilt the study of language around rules that make the infinite out of the finite.
The idea
A grammar, Chomsky said, is a finite set of rules able to build an unlimited number of sentences — and the test of a grammar is whether it produces all the sentences of a language and only those. He lined up three kinds of grammar and asked how much each could do.
The weakest reads strictly left to right, choosing each next word from the words just before it — the way a phone's autocomplete works. He proved it can never be a grammar of English, because English folds clauses inside clauses without limit, and a left-to-right reader loses track of how many it has left open. A richer grammar builds sentences out of nested phrases; the richest adds 'transformations' that relate whole sentences to one another — turning an active sentence into its passive by a single rule.
How it came about
The setting was MIT in the mid-1950s, where Chomsky worked in the Research Laboratory of Electronics among information theorists who modelled language as streams of symbols. 1956 was a remarkable year for the mind sciences: the same September symposium that carried this paper also carried George Miller's 'magical number seven' and an early reasoning program by Newell and Simon — many historians date the birth of cognitive science to it. Chomsky took the tools of logic and the new theory of computation and turned them on language itself.
Why it mattered
The paper split into two enormous streams. For linguists it founded generative grammar — the view of language as a rule engine in the mind that produces an endless variety of sentences. For computer scientists it founded formal language theory — the sorting of languages by the kind of machine needed to recognize them, the framework underneath compilers, search, and every parser. Few single papers seed two whole fields.
Boxes inside boxes
Nesting is like Russian dolls, or matching brackets: ( ( ( ) ) ). To check that the brackets balance you must remember every '(' until its ')' arrives. A left-to-right reader is like someone allowed to remember only two open brackets at a time — open a third and they are lost. Now give them a stack of plates: set one down for every '(', take one back for every ')'. They can handle any depth at all. That stack is exactly the extra power a phrase-structure grammar has over the simplest machine — and you can feel the difference in the widget below.
Before and after
Behind this paper stand Shannon's information theory — the word-statistics models Chomsky was arguing against — and Turing's theory of computation, which supplies the machines that recognize each class of language. Ahead of it stand Backus–Naur Form and the parsing of programming languages, and, with an irony Chomsky never accepted, the statistical and neural language models — the Transformer among them — that now do, by counting, the very thing he insisted counting could never do.
We investigate several conceptions of linguistic structure to determine whether or not they can provide simple and 'revealing' grammars that generate all of the sentences of English and only these.
…no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar.