Linguistics 1956

Three Models for the Description of Language

Noam Chomsky

Finite rules, infinite sentences, and a ladder of grammars for reaching them.

In depth · the introduction

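In a dozen or so pages, a 27-year-old argues that no machine reading one word at a time can capture a human sentence, and in doing so rebuilds the study of language on rules that make the infinite out of the finite.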

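The idea

A grammar, Chomsky says, is a finite set of rules that can nonetheless produce infinitely many sentences, and the test of a grammar is whether it generates all of the sentences of a language and only those. He sets three kinds of grammar side by side and asks of each one: how much can it do?

The weakest kind reads strictly left to right and picks each next word based on the words that have just appeared, much like autocomplete on a phone. He proves that it cannot be the grammar of English: English embeds clauses inside clauses with no upper bound on depth, and a left-to-right reader with a fixed, finite memory eventually loses track of how many are still open. A richer grammar builds sentences out of nested phrases. The richest adds "transformations" that relate whole sentences to one another, so that a single rule turns an active sentence into its passive.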

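How it took shape

The setting is MIT in the mid-1950s. Chomsky worked at the Research Laboratory of Electronics, surrounded by information theorists who saw language as a stream of symbols. 1956 was an unusual year for the science of mind: the same September symposium that carried this paper also featured George Miller's "magical number seven" and an early reasoning program by Newell and Simon, and many historians date the birth of cognitive science to that year. Chomsky took up the tools of logic and the newly born theory of computation and turned them on language itself.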

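Why it matters

This paper split into two great streams. For linguists, it founded generative grammar: language seen as a rule engine in the mind that produces an endless variety of sentences. For computer scientists, it founded formal language theory, which classifies languages by the kind of machine needed to recognize them, the framework underneath compilers, search, and every parser. Very few single papers have seeded two whole fields at once.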

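Boxes within boxes

Nesting is like Russian dolls, or matched parentheses: ( ( ( ) ) ). To check that the parentheses balance, you must remember every "(" until its ")" arrives. A left-to-right reader is like someone allowed to keep only two unclosed parentheses in mind at once: when the third one opens, he loses his place. Now give him a stack of plates: he puts one down for every "(" and takes one back for every ")". That way he can handle any depth. The stack of plates is exactly the extra power that a phrase-structure grammar has over the simplest machine.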
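The plate analogy can be made concrete with a small sketch. The two function names and the `limit` parameter below are illustrative assumptions, not anything from Chomsky's paper: the first reader can hold at most `limit` open parentheses in mind, the second keeps a stack with no fixed bound.

```python
def balanced_with_bounded_memory(text: str, limit: int = 2) -> bool:
    """Reader that can remember at most `limit` unclosed parentheses."""
    open_count = 0
    for ch in text:
        if ch == "(":
            if open_count == limit:
                return False  # memory is full: the reader loses its place
            open_count += 1
        elif ch == ")":
            if open_count == 0:
                return False  # a ")" with nothing left to close
            open_count -= 1
    return open_count == 0


def balanced_with_stack(text: str) -> bool:
    """Reader with a stack of plates: one plate down per "(", one back per ")"."""
    plates = []
    for ch in text:
        if ch == "(":
            plates.append(ch)
        elif ch == ")":
            if not plates:
                return False  # a ")" with no plate to take back
            plates.pop()
    return not plates  # balanced only if every plate was taken back
```

On "(())" both readers return True. On "((()))" the bounded reader returns False even though the string is balanced, while the stack reader returns True. Raising `limit` only moves the failure to a deeper string; it never removes it.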

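Before and after

Behind this paper stand Shannon's information theory, the source of the word-frequency models Chomsky argued against, and Turing's theory of computation, which supplies the machines that recognize each class of language. Ahead of it stand Backus-Naur form and the parsing of programming languages, and also, with an irony Chomsky has never accepted, the statistical and neural language models, Transformers among them, that now use "counting" to do the very thing he insisted counting could never do.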

The original document

Noam Chomsky · IRE Transactions on Information Theory, vol. IT-2, no. 3, pp. 113–124 · September 1956

Abstract

We investigate several conceptions of linguistic structure to determine whether or not they can provide simple and 'revealing' grammars that generate all of the sentences of English and only these.

The paper then weighs three kinds of grammar against that standard, each more powerful than the last — finite-state, phrase-structure, and transformational.

Model 1 — finite-state grammar

A finite-state (Markov) source generates a sentence by stepping through a finite set of states, emitting a word at each transition. Chomsky shows English is not a finite-state language: in constructions like 'If S1, then S2' and 'Either S1, or S2', and in nested relative clauses, dependencies between distant words nest one inside another to arbitrary depth, and no device with a fixed finite memory can keep count of how many remain open.

…no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar.
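As a rough illustration of what a finite-state source is, here is a minimal sketch. The states, the vocabulary, and the `sentences` helper are hypothetical choices made for this example and are not taken from the paper. The machine's only memory is the state it is currently in, and it emits one word on every transition.

```python
# Hypothetical toy machine: state -> list of (emitted word, next state).
TRANSITIONS = {
    "start": [("the", "det")],
    "det": [("old", "det"), ("man", "sing"), ("men", "plur")],
    "sing": [("comes", "end")],
    "plur": [("come", "end")],
}


def sentences(max_words: int) -> list[str]:
    """List every sentence of at most `max_words` words the machine can emit."""
    results = []

    def walk(state: str, words: list[str]) -> None:
        if state == "end":
            results.append(" ".join(words))
            return
        if len(words) == max_words:
            return  # length budget used up before reaching the end state
        for word, next_state in TRANSITIONS[state]:
            walk(next_state, words + [word])

    walk("start", [])
    return results
```

With `max_words=4` the machine yields "the man comes", "the men come", "the old man comes", and "the old men come". The loop on "old" already gives infinitely many sentences from four states, but nothing in the machine can count how many nested constructions are still open, which is the limitation the argument above turns on.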

He gives the abstract languages that make the point — the mirror-image strings (aa, bb, abba, baab, aabbaa, …) and a^n b^n — and argues that raising the order of an n-gram statistical approximation to English never converges on the set of grammatical sentences.
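The two abstract languages can be written down as membership checks. This is a sketch under stated assumptions: the function names are made up for illustration, and both checks take n ≥ 1 (no empty string), matching the examples listed above. Note that each check compares the second half of the string against the whole first half, which is exactly the unbounded memory a finite-state device lacks.

```python
def is_anbn(s: str) -> bool:
    """True for a^n b^n with n >= 1: a run of a's, then equally many b's."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def is_mirror(s: str) -> bool:
    """True for a nonempty string over {a, b} followed by its own reversal."""
    half = len(s) // 2
    return (
        half >= 1
        and len(s) % 2 == 0
        and set(s) <= {"a", "b"}
        and s[half:] == s[:half][::-1]
    )
```

For example, `is_anbn("aabb")` and `is_mirror("abba")` are True, while `is_anbn("aab")` and `is_mirror("abab")` are False.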

Model 2 — phrase-structure grammar

A phrase-structure grammar is a finite vocabulary plus rewrite rules of the form X → Y (for example S → NP VP, VP → Verb NP, NP → Det N). A derivation expands the symbols step by step into a labelled bracketing — the constituent structure of the sentence. This captures how words group into phrases, but Chomsky notes it describes some regularities — the active–passive relation, conjunction, discontinuous elements — only awkwardly, rule by rule.
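A derivation of this kind can be sketched in a few lines. The three rules are the ones quoted above; the `derive` function and the way words are supplied are assumptions made for illustration. In particular, this sketch does not check that a word belongs to its category, it simply places the next word under the next lexical symbol.

```python
# Rewrite rules from the text: S -> NP VP, VP -> Verb NP, NP -> Det N.
RULES = {
    "S": ["NP", "VP"],
    "VP": ["Verb", "NP"],
    "NP": ["Det", "N"],
}


def derive(symbol, words):
    """Expand `symbol` top-down into a labelled bracketing.

    `words` is an iterator that supplies the terminal words left to right.
    """
    if symbol in RULES:
        parts = " ".join(derive(child, words) for child in RULES[symbol])
        return f"[{symbol} {parts}]"
    # Det, N and Verb are not rewritten further here: attach the next word.
    return f"[{symbol} {next(words)}]"
```

Calling `derive("S", iter("the man hit the ball".split()))` returns `[S [NP [Det the] [N man]] [VP [Verb hit] [NP [Det the] [N ball]]]]`, the constituent structure of the sentence written as nested labelled brackets.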

Model 3 — transformational grammar

The most powerful model keeps a phrase-structure base that generates a kernel of simple sentences, then adds transformations: rules that map a whole structure to another structure. A single passive transformation relates 'the man hit the ball' to 'the ball was hit by the man'. Chomsky argues this third grammar yields the simplest and most revealing description of English.
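The idea that one rule maps a whole structure to another can be shown with a deliberately simplified sketch. The `Kernel` class and its fields are hypothetical, the auxiliary is fixed to "was", and the past participle is supplied by the caller; Chomsky's own formulation of the passive is more general than this toy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """A simple active sentence, stored as structure rather than as a string."""
    subject: str     # first noun phrase, e.g. "the man"
    verb: str        # active verb form, e.g. "hit"
    participle: str  # past participle, supplied by the caller, e.g. "hit"
    obj: str         # second noun phrase, e.g. "the ball"


def active(k: Kernel) -> str:
    """Spell out the kernel sentence as it stands."""
    return f"{k.subject} {k.verb} {k.obj}"


def passive(k: Kernel) -> str:
    """One rule for every kernel: swap the noun phrases, add 'was' and 'by'."""
    return f"{k.obj} was {k.participle} by {k.subject}"
```

With `k = Kernel("the man", "hit", "hit", "the ball")`, `active(k)` gives "the man hit the ball" and `passive(k)` gives "the ball was hit by the man". The point of the design is that `passive` is written once and applies to any kernel, instead of each passive sentence needing its own phrase-structure rules.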

[ … ]

The full article develops each model formally, with the proofs, the example grammars, and the comparison of their generative capacity, across twelve pages available at the source below.

M.I.T. · 1956