Linguistics 1956

Three Models for the Description of Language

Noam Chomsky

A finite set of rules, an infinite set of sentences — and a ladder of grammars to reach them.

Choose your version

In depth · the introduction

In a dozen pages, a 27-year-old argued that no machine reading word by word could ever capture a human sentence — and rebuilt the study of language around rules that make the infinite out of the finite.

The idea

A grammar, Chomsky said, is a finite set of rules able to build an unlimited number of sentences — and the test of a grammar is whether it produces all the sentences of a language and only those. He lined up three kinds of grammar and asked how much each could do.

The weakest reads strictly left to right, choosing each next word from the words just before it — the way a phone's autocomplete works. He proved it can never be a grammar of English, because English folds clauses inside clauses without limit, and a left-to-right reader loses track of how many it has left open. A richer grammar builds sentences out of nested phrases; the richest adds 'transformations' that relate whole sentences to one another — turning an active sentence into its passive by a single rule.

How it came about

The setting was MIT in the mid-1950s, where Chomsky worked in the Research Laboratory of Electronics among information theorists who modelled language as streams of symbols. 1956 was a remarkable year for the mind sciences: the same September symposium that carried this paper also carried George Miller's 'magical number seven' and an early reasoning program by Newell and Simon — many historians date the birth of cognitive science to it. Chomsky took the tools of logic and the new theory of computation and turned them on language itself.

Why it mattered

The paper split into two enormous streams. For linguists it founded generative grammar — the view of language as a rule engine in the mind that produces an endless variety of sentences. For computer scientists it founded formal language theory — the sorting of languages by the kind of machine needed to recognize them, the framework underneath compilers, search, and every parser. Few single papers seed two whole fields.

Boxes inside boxes

Nesting is like Russian dolls, or matching brackets: ( ( ( ) ) ). To check that the brackets balance you must remember every '(' until its ')' arrives. A left-to-right reader is like someone allowed to remember only two open brackets at a time — open a third and they are lost. Now give them a stack of plates: set one down for every '(', take one back for every ')'. They can handle any depth at all. That stack is exactly the extra power a phrase-structure grammar has over the simplest machine — and you can feel the difference in the widget below.

Before and after

Behind this paper stand Shannon's information theory — the word-statistics models Chomsky was arguing against — and Turing's theory of computation, which supplies the machines that recognize each class of language. Ahead of it stand Backus–Naur Form and the parsing of programming languages, and, with an irony Chomsky never accepted, the statistical and neural language models — the Transformer among them — that now do, by counting, the very thing he insisted counting could never do.

The original document

Original source text

Noam Chomsky · IRE Transactions on Information Theory, vol. IT-2, no. 3, pp. 113–124 · September 1956

Abstract

We investigate several conceptions of linguistic structure to determine whether or not they can provide simple and 'revealing' grammars that generate all of the sentences of English and only these.

The paper then weighs three kinds of grammar against that standard, each more powerful than the last — finite-state, phrase-structure, and transformational.

Model 1 — finite-state grammar

A finite-state (Markov) source generates a sentence by stepping through a finite set of states, emitting a word at each transition. Chomsky shows English is not a finite-state language: in constructions like 'If S1, then S2' and 'Either S1, or S2', and in nested relative clauses, dependencies between distant words nest one inside another to arbitrary depth, and no device with a fixed finite memory can keep count of how many remain open.

…no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar.

He gives the abstract languages that make the point — the mirror-image strings (aa, bb, abba, baab, aabbaa, …) and a^n b^n — and argues that raising the order of an n-gram statistical approximation to English never converges on the set of grammatical sentences.

Model 2 — phrase-structure grammar

A phrase-structure grammar is a finite vocabulary plus rewrite rules of the form X → Y (for example S → NP VP, VP → Verb NP, NP → Det N). A derivation expands the symbols step by step into a labelled bracketing — the constituent structure of the sentence. This captures how words group into phrases, but Chomsky notes it describes some regularities — the active–passive relation, conjunction, discontinuous elements — only awkwardly, rule by rule.

Model 3 — transformational grammar

The most powerful model keeps a phrase-structure base that generates a kernel of simple sentences, then adds transformations: rules that map a whole structure to another structure. A single passive transformation relates 'the man hit the ball' to 'the ball was hit by the man'. Chomsky argues this third grammar yields the simplest and most revealing description of English.

[ … ]

The full article develops each model formally, with the proofs, the example grammars, and the comparison of their generative capacity, across twelve pages available at the source below.

M.I.T. · 1956