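Linguistics 1956

Three Models for the Description of Language

Noam Chomsky

Finite rules, infinitely many sentences, and a ladder of grammars for reaching them.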

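In about a dozen pages, a 27-year-old argued that no machine reading one word at a time could capture a human sentence, and in doing so rebuilt the study of language on rules that make the infinite out of the finite.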

The idea

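A grammar, Chomsky says, is a finite set of rules that can nonetheless produce infinitely many sentences, and the test of a grammar is whether it generates exactly the sentences of a language: all of them, and only them. He sets three kinds of grammar side by side and asks of each one: how much can it do?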

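The weakest kind reads strictly left to right, choosing the next word on the basis of the words that have just appeared, much like autocomplete on a phone. He proves that it can never be the grammar of English, because English stacks clauses inside clauses with no upper bound, and a left-to-right reader loses count of how many are still left open. A richer grammar builds sentences out of nested phrases; the richest adds "transformations", which relate whole sentences to one another, so that a single rule turns an active sentence into its passive.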

How it took shape

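The setting is the Massachusetts Institute of Technology in the mid-1950s. Chomsky worked at the Research Laboratory of Electronics, surrounded by information theorists who treated language as a stream of symbols. 1956 was an unusual year for the science of the mind: the same September symposium at which this paper appeared also featured George Miller's "magical number seven" and an early reasoning program by Newell and Simon, and many historians date the birth of cognitive science to that year. Chomsky took up the tools of logic and the newborn theory of computation and turned them on language itself.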

Why it matters

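The paper split into two great tributaries. For linguists, it founded generative grammar: language seen as a rule engine in the mind that turns out an endless variety of sentences. For computer scientists, it founded formal language theory, which classifies languages by the machine needed to recognize them and is the framework underneath compilers, search, and every parser. Very few single papers have seeded two entire fields at once.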

Boxes within boxes

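Nesting is like Russian dolls, or like matched brackets: ( ( ( ) ) ). To check that the brackets balance, you have to remember every "(" until its ")" arrives. A left-to-right reader is like someone allowed to keep only two unclosed brackets in mind at a time: open a third, and he loses track. Now give him a stack of plates: he puts one down for every "(" and takes one back for every ")". With that he can handle any depth at all. The stack of plates is exactly the extra power that a phrase-structure grammar has over the simplest machine.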
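The plate analogy can be sketched in a few lines of Python. This is only an illustration: the function names and the `limit` parameter are hypothetical, and the bounded reader is modelled as simply giving up once its memory is full, which is one simple way to picture a machine with fixed finite memory.

```python
def balanced_with_stack(text: str) -> bool:
    """Check bracket balance with an unbounded stack (the stack of plates)."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)          # put a plate down
        elif ch == ")":
            if not stack:
                return False          # a ')' with no plate to take back
            stack.pop()               # take a plate back
    return not stack                  # balanced only if no plates remain


def balanced_with_bounded_memory(text: str, limit: int = 2) -> bool:
    """Same check, but the reader can track at most `limit` open brackets."""
    open_count = 0
    for ch in text:
        if ch == "(":
            if open_count == limit:
                return False          # memory exhausted: the reader loses track
            open_count += 1
        elif ch == ")":
            if open_count == 0:
                return False
            open_count -= 1
    return open_count == 0
```

With the default limit of two, the bounded reader accepts `(())` but rejects the perfectly balanced `((()))`, while the stack version accepts both. Raising `limit` only moves the failure to a deeper string.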

Before and after

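Behind the paper stand Shannon's information theory (the word-frequency model that Chomsky argued against) and Turing's theory of computation, which supplied the machines that recognize each class of language. Ahead of it stand Backus–Naur form and the parsing of programming languages, and also, with an irony Chomsky never accepted, the statistical and neural language models, Transformers among them, which now use "counting" to do the very thing he insisted counting could never do.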

The original document

Original source text

Noam Chomsky · IRE Transactions on Information Theory, vol. IT-2, no. 3, pp. 113–124 · September 1956

Abstract

We investigate several conceptions of linguistic structure to determine whether or not they can provide simple and 'revealing' grammars that generate all of the sentences of English and only these.

The paper then weighs three kinds of grammar against that standard, each more powerful than the last — finite-state, phrase-structure, and transformational.

Model 1 — finite-state grammar

A finite-state (Markov) source generates a sentence by stepping through a finite set of states, emitting a word at each transition. Chomsky shows English is not a finite-state language: in constructions like 'If S1, then S2' and 'Either S1, or S2', and in nested relative clauses, dependencies between distant words nest one inside another to arbitrary depth, and no device with a fixed finite memory can keep count of how many remain open.
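A finite-state source of this kind can be sketched as a small transition table. The machine below is a hypothetical toy, not one taken from the paper: the state names, the vocabulary, and the `sentences` helper are all assumptions chosen for illustration. Note that the only thing the machine remembers at any point is which state it is in.

```python
# Hypothetical toy machine: state -> list of (emitted word, next state).
TRANSITIONS = {
    "start": [("the", "det")],
    "det": [("old", "det"), ("man", "sing"), ("men", "plur")],
    "sing": [("comes", "end")],
    "plur": [("come", "end")],
    "end": [],
}


def sentences(max_words: int, state: str = "start", prefix: tuple = ()):
    """Yield every word sequence that reaches 'end' within max_words emissions."""
    if state == "end":
        yield " ".join(prefix)
        return
    if len(prefix) == max_words:
        return                        # length budget used up before reaching 'end'
    for word, next_state in TRANSITIONS[state]:
        yield from sentences(max_words, next_state, prefix + (word,))
```

Calling `list(sentences(4))` enumerates the four sentences of up to four words that this toy machine can emit. The loop on `"old"` shows how a finite machine can still produce unboundedly long output; what it cannot do is count nested dependencies.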

…no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar.

He gives the abstract languages that make the point — the mirror-image strings (aa, bb, abba, baab, aabbaa, …) and a^n b^n — and argues that raising the order of an n-gram statistical approximation to English never converges on the set of grammatical sentences.
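The two abstract languages are easy to state as membership checks. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the shortest members are `aa` or `bb` for the mirror-image language and `ab` (that is, n ≥ 1) for a^n b^n. Both checks compare one half of the string against the other, so the memory they need grows with the input, which is exactly what a fixed finite-state device lacks.

```python
def is_mirror_image(s: str) -> bool:
    """True for strings of the form x + reversed(x) over {a, b}, x non-empty."""
    if not s or len(s) % 2 or set(s) - {"a", "b"}:
        return False                  # empty, odd length, or foreign symbols
    half = len(s) // 2
    return s[:half] == s[half:][::-1]


def is_an_bn(s: str) -> bool:
    """True for n a's followed by exactly n b's, with n >= 1 (an assumption)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n
```

For instance, `is_mirror_image("abba")` and `is_an_bn("aabb")` hold, while `is_mirror_image("abab")` and `is_an_bn("aab")` do not.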

Model 2 — phrase-structure grammar

A phrase-structure grammar is a finite vocabulary plus rewrite rules of the form X → Y (for example S → NP VP, VP → Verb NP, NP → Det N). A derivation expands the symbols step by step into a labelled bracketing — the constituent structure of the sentence. This captures how words group into phrases, but Chomsky notes it describes some regularities — the active–passive relation, conjunction, discontinuous elements — only awkwardly, rule by rule.
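The example rules can be turned into a small program that produces a labelled bracketing. This is a minimal sketch under stated assumptions: the `LEXICON` and the `bracket` function are hypothetical, and the approach works only because each non-terminal here has exactly one rewrite rule, so no search over alternatives is needed.

```python
# Rewrite rules from the example: each symbol expands to a fixed sequence.
RULES = {
    "S": ["NP", "VP"],
    "VP": ["Verb", "NP"],
    "NP": ["Det", "N"],
}

# Hypothetical mini-lexicon assigning each word a category.
LEXICON = {"the": "Det", "man": "N", "ball": "N", "hit": "Verb"}


def bracket(symbol, words):
    """Expand `symbol` over the front of `words`; return (bracketing, rest)."""
    if symbol not in RULES:           # lexical category: consume one word
        if not words or LEXICON.get(words[0]) != symbol:
            raise ValueError(f"expected {symbol} at {words[:1]}")
        return f"[{symbol} {words[0]}]", words[1:]
    parts = []
    for child in RULES[symbol]:       # expand each child left to right
        part, words = bracket(child, words)
        parts.append(part)
    return f"[{symbol} {' '.join(parts)}]", words
```

Running `bracket("S", "the man hit the ball".split())` returns the bracketing `[S [NP [Det the] [N man]] [VP [Verb hit] [NP [Det the] [N ball]]]]` together with an empty list of leftover words.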

Model 3 — transformational grammar

The most powerful model keeps a phrase-structure base that generates a kernel of simple sentences, then adds transformations: rules that map a whole structure to another structure. A single passive transformation relates 'the man hit the ball' to 'the ball was hit by the man'. Chomsky argues this third grammar yields the simplest and most revealing description of English.
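The idea of a rule that maps one whole structure to another can be illustrated with a deliberately simplified sketch. It is not Chomsky's formal statement of the passive: the triple representation, the `PARTICIPLES` table, and the fixed auxiliary `was` are all assumptions, and tense and agreement are ignored.

```python
# Hypothetical lookup from a verb to its past participle.
PARTICIPLES = {"hit": "hit", "took": "taken", "saw": "seen"}


def passive(structure):
    """Map (NP1, Verb, NP2) to (NP2, 'was' + participle, 'by' + NP1)."""
    np1, verb, np2 = structure
    participle = PARTICIPLES[verb]    # KeyError for verbs outside the toy table
    return (np2, f"was {participle}", f"by {np1}")


def render(structure):
    """Join the parts of a structure into a plain sentence string."""
    return " ".join(structure)
```

Applied to the kernel sentence, `render(passive(("the man", "hit", "the ball")))` gives `the ball was hit by the man`. The point of the sketch is that one rule operating on constituents covers every active sentence of this shape, where a phrase-structure grammar would need parallel rules for the passive.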

[ … ]

The full article develops each model formally, with the proofs, the example grammars, and the comparison of their generative capacity, across twelve pages available at the source below.

M.I.T. · 1956