Artificial Intelligence 1982

Neural Networks and Physical Systems with Emergent Collective Computational Abilities

John Hopfield

Wire neurons together symmetrically and memories become valleys; a fragment rolls downhill and becomes the whole.

In depth · the introduction

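Show this network a letter that has been smudged or half erased, and it hands the whole letter back to you, clean. It remembers the way memory ought to work: by content, not by address.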

Unpacking the idea

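Ordinary computer memory is like a wall of numbered lockers: to retrieve something, you have to know its address. Human memory works nothing like that. A few bars of a melody or a glimpse of a face, and the whole thing comes back. In 1982 the physicist John Hopfield built a small network of neurons that works the second way.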

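Picture a hilly landscape with several deep valleys, one for each thing the network has learned. Hand it a cue, a noisy or incomplete pattern, and you have placed a ball somewhere partway up a slope. The network's rule is simple: the ball always rolls downhill and comes to rest at the bottom of the nearest valley, which is the complete, remembered pattern. In this picture, remembering is nothing more than rolling downhill to the nearest stable state.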

Where it came from

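Hopfield was not trained as a neuroscientist. He was a physicist, already known for his work on how light propagates in crystals and for "kinetic proofreading" in biology. He came to the brain by way of the physics of spin glasses, a class of disordered magnets whose atoms behave like tiny compass needles settling into low-energy arrangements. He saw that a network of neurons with symmetric connections is the same kind of system, and that its settling can itself be the act of reading out a memory.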

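This short paper landed at the right moment. Since the 1969 critique of the perceptron, neural networks had been stuck in a long winter, and Hopfield's clean physical picture, memory as energy valleys, helped bring the field back to life in the 1980s. Forty-two years later, in 2024, that revival was recognized with the Nobel Prize in Physics, shared by Hopfield and Geoffrey Hinton, whose Boltzmann machine grew directly out of this model.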

Why it matters

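Hopfield showed that memory and computation need not be programmed in: they can emerge on their own from a crowd of simple parts that all follow the same local rule. That one idea, letting a physical system settle into a minimum of its energy and then reading the answer from where it comes to rest, underlies a surprising amount of modern AI, from the energy-based models of the 1980s to the attention mechanism inside today's language models. And because no single neuron is indispensable, the memory degrades gracefully when parts fail, much as ours does.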

Like a marble in an egg carton

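Imagine rolling a marble across an egg carton. Wherever you let go, it rolls into the nearest dimple and stops. Each dimple is a memory the network has stored; the spot where you release the marble is your cue. A cue close to the right dimple falls into it and restores the whole memory; a cue thrown too wildly may land in the wrong dimple instead.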

Before and after

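The network's all-or-none neurons descend from the McCulloch–Pitts model of 1943; its learning rule is Donald Hebb's 1949 principle that cells that fire together wire together. It stands in this gallery alongside the other founders of machine learning: the perceptron (Rosenblatt, 1958) and, downstream, AlexNet (2012) and the Transformer (2017). The line of descent is direct: Hopfield's energy idea became the Boltzmann machine, which helped revive deep learning, and the modern version of his recall rule turns out to have the same form as the attention operation on which the Transformer runs.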
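To make the link to attention concrete, here is a minimal sketch of the continuous-valued recall step used in later "modern" Hopfield networks (Ramsauer et al., 2020). It is not taken from the 1982 paper. The function name modern_hopfield_update, the inverse temperature beta, and the toy patterns are assumptions chosen for illustration. The update is a softmax-weighted average of the stored patterns, which matches attention when the stored patterns serve as both keys and values.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def modern_hopfield_update(patterns, query, beta=4.0):
    """One recall step: softmax(beta * X q) weighted sum of the rows of X.

    patterns: (n_patterns, d) stored patterns, acting as keys and values.
    query:    (d,) noisy cue, acting as the attention query.
    beta:     inverse temperature (hypothetical default); larger values
              make the result snap harder onto the best-matching pattern.
    """
    X = np.asarray(patterns, dtype=float)
    q = np.asarray(query, dtype=float)
    weights = softmax(beta * (X @ q))   # similarity of the cue to each pattern
    return weights @ X                  # weighted average of stored patterns

# Illustrative data: three mutually orthogonal patterns and a noisy cue.
stored = [[1, 1, -1, -1],
          [1, -1, 1, -1],
          [1, -1, -1, 1]]
cue = [0.9, 0.6, -1.1, -0.2]
retrieved = modern_hopfield_update(stored, cue)
```

With these toy values the cue is most similar to the first stored pattern, so the retrieved vector lands very close to it after a single step.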

The original document

Original source text

J. J. Hopfield · Divisions of Chemistry and Biology, California Institute of Technology, and Bell Laboratories · PNAS 79(8), 2554–2558 · April 1982 (Biophysics; contributed January 15, 1982)

The opening claim

Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons).

From this premise Hopfield builds a model of memory not as a set of addressed locations but as the collective behaviour of many identical two-state neurons — its computational power an emergent property of the whole.

[ … ]

The model

He defines N two-state neurons, fully interconnected by symmetric weights Tij = Tji (with Tii = 0) set by a Hebbian outer-product rule over the stored states. The state evolves as each neuron asynchronously compares its weighted input against a threshold and switches on or off accordingly.

The algorithm for the time evolution of the state of the system is based on asynchronous parallel processing.
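The storage rule and the asynchronous update described above can be sketched in a few lines. This is a minimal illustration, not the paper's own code. It assumes zero thresholds, 0/1 states, and random-order sweeps as a stand-in for neurons updating at random times. The names train_hebbian and recall are hypothetical.

```python
import numpy as np

def train_hebbian(patterns):
    """Build symmetric weights T from 0/1 patterns by the outer-product rule."""
    bipolar = 2 * np.asarray(patterns) - 1   # map {0, 1} to {-1, +1}
    T = bipolar.T @ bipolar                  # sum of outer products over patterns
    np.fill_diagonal(T, 0)                   # no self-connections: T_ii = 0
    return T

def recall(T, cue, sweeps=10, threshold=0.0, seed=0):
    """Update neurons one at a time, in random order, until nothing changes."""
    rng = np.random.default_rng(seed)
    V = np.array(cue, dtype=int)
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(V.size):
            # Neuron i switches on if its weighted input exceeds the threshold.
            new_state = 1 if T[i] @ V > threshold else 0
            if new_state != V[i]:
                V[i] = new_state
                changed = True
        if not changed:                      # reached a stable state
            break
    return V

# Illustrative usage: store one pattern, then recall it from a corrupted cue.
pattern = [1, 1, 1, 1, 0, 0, 0, 0]
T = train_hebbian([pattern])
cue = [1, 1, 1, 0, 0, 0, 0, 1]               # two bits flipped
print(recall(T, cue))                        # prints [1 1 1 1 0 0 0 0]
```

Updating one neuron at a time, rather than all at once, is what lets the convergence argument go through; a fully synchronous update can oscillate between states.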

The paper's pivotal step is to exhibit an energy function E = −½ Σ Tij Vi Vj (summed over i ≠ j) that the asynchronous dynamics, given symmetric weights, can never increase. The stored patterns are arranged to be its local minima, so the flow toward minimum energy reconstructs a complete memory from a partial cue: this is content-addressable memory. (The energy function, the convergence argument, the capacity simulations, and the integrated-circuit realization are given in full at the source.)
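Why a single update cannot raise the energy can be seen in one line. The sketch below assumes zero thresholds and 0/1 states, and writes ΔV_i for the change in neuron i during one asynchronous update.

```latex
\begin{align}
E &= -\tfrac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j \\
\Delta E &= -\,\Delta V_i \sum_{j \neq i} T_{ij} V_j
\end{align}
```

Because T_ij = T_ji, every term involving neuron i appears twice in the double sum, which cancels the factor ½. The update rule switches a neuron on (ΔV_i = +1) only when its weighted input is positive and off (ΔV_i = −1) only when it is negative, so ΔE ≤ 0 at every step. With finitely many states, the energy is bounded below, and the network must settle into a local minimum.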

[ … ]

What emerges

Additional emergent collective properties include some capacity for generalization, familiarity recognition, categorization, error correction, and time sequence retention.

Hopfield reports that the memory is robust: it is only weakly sensitive to the modeling details or to the failure of individual neurons, and his simulations indicate roughly 0.15N patterns can be stored before recall degrades — a number a later spin-glass analysis sharpened to about 0.138N.
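A rough feel for the capacity limit can be had from a small experiment. The sketch below is an illustration under stated assumptions, not a reproduction of Hopfield's simulations: it uses ±1 states (the convention of the later spin-glass analysis) rather than the paper's 0/1 states, random unbiased patterns, and zero thresholds, and it measures the fraction of bits that drift away when the network starts exactly on a stored pattern. The name recall_error is hypothetical, and the numbers depend on N and the random seed.

```python
import numpy as np

def recall_error(N, p, sweeps=20, seed=0):
    """Store p random +/-1 patterns in N neurons; return the mean fraction
    of bits that change when each stored pattern is allowed to relax."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1, 1], size=(p, N))
    T = patterns.T @ patterns                # Hebbian outer-product weights
    np.fill_diagonal(T, 0)                   # T_ii = 0
    wrong = 0
    for target in patterns:
        V = target.copy()                    # start exactly on the stored pattern
        for _ in range(sweeps):
            changed = False
            for i in rng.permutation(N):     # asynchronous, random order
                new_state = 1 if T[i] @ V >= 0 else -1
                if new_state != V[i]:
                    V[i] = new_state
                    changed = True
            if not changed:
                break
        wrong += np.sum(V != target)
    return wrong / (p * N)

if __name__ == "__main__":
    N = 100
    for p in (5, 10, 15, 20, 25):
        err = recall_error(N, p)
        print(f"p/N = {p / N:.2f}  mean bit error = {err:.3f}")
```

At light loads the stored patterns should stay put; as p/N climbs toward and past the range quoted above, more bits are expected to drift, which is the degradation the capacity figures describe.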

California Institute of Technology & Bell Laboratories · 1982