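Artificial Intelligence 1982

Neural networks and physical systems with emergent collective computational abilities

John Hopfield

Wire neurons together symmetrically and memories become valleys; a fragment rolls downhill and becomes whole.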

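Show this network a letter that has been smudged or half erased, and it hands back the whole letter, clean. It remembers the way memory ought to work: by content, not by address.

Unpacking the idea

Ordinary computer memory is like a wall of numbered lockers: to fetch something, you need to know its address. Human memory is nothing like that. A few bars of a tune or a glimpse of a face, and the whole thing comes back. In 1982 the physicist John Hopfield built a small network of neurons that works the second way.

Picture a hilly landscape with several deep valleys, one for each thing the network has learned. Hand it a cue, a noisy or incomplete pattern, and you have placed a ball somewhere on a hillside. The network's rule is simple: the ball always rolls downhill and comes to rest at the bottom of the nearest valley, which is the complete, remembered pattern. In this picture, remembering is nothing more than rolling downhill to the nearest stable state.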

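Where it came from

Hopfield was not trained as a neuroscientist. He was a physicist, already known for his work on how light propagates through crystals and for "kinetic proofreading" in biology. He came to the brain by way of the physics of spin glasses, disordered magnets whose atoms behave like tiny compass needles settling into low-energy arrangements. He saw that a network of neurons with symmetric connections is the same kind of system, and that its settling can itself be the act of reading out a memory.

The short paper landed at the right moment. Neural networks had been in a long winter since the 1969 critique of the perceptron, and Hopfield's clean physical picture, memory as energy valleys, helped bring the field back to life in the 1980s. Forty-two years later, in 2024, that revival was recognized with the Nobel Prize in Physics, shared by Hopfield and Geoffrey Hinton, whose Boltzmann machine grew directly out of this model.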

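Why it matters

Hopfield showed that memory and computation need not be programmed in: they can emerge on their own from a crowd of simple parts that all follow the same local rule. That one idea, letting a physical system seek its lowest energy and then reading the answer from where it comes to rest, underlies a surprising amount of modern AI, from the energy-based models of the 1980s to the attention mechanism inside today's language models. And because no single neuron is indispensable, memory degrades gracefully when parts fail, much as ours does.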

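Like a marble in an egg carton

Imagine rolling a marble across an egg carton. Wherever you let go, it rolls into the nearest dimple and stops. Each dimple is a memory the network has stored; the spot where you release the marble is your cue. A cue close to the right dimple falls into it and restores the whole memory; a cue thrown too wildly may land in the wrong one instead.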
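The egg-carton picture can be checked by brute force on a toy network. The sketch below is an illustration only, under stated assumptions: two made-up 8-neuron memories `A` and `B`, states written as +1/-1, a threshold of zero, and neurons updated one at a time in a fixed order. The names `settle` and `flip` are hypothetical helpers. It releases the "marble" from every possible starting state and counts where each one comes to rest.

```python
from itertools import product

# Two hypothetical stored memories ("dimples"), written as +1/-1 vectors.
A = (1, 1, 1, 1, -1, -1, -1, -1)
B = (1, -1, 1, -1, 1, -1, 1, -1)
N = len(A)

# Symmetric weights with a zero diagonal, built from the two memories.
T = [[0 if i == j else A[i] * A[j] + B[i] * B[j] for j in range(N)]
     for i in range(N)]

def settle(start, max_sweeps=100):
    """Update one neuron at a time until a full sweep changes nothing."""
    s = list(start)
    for _ in range(max_sweeps):
        changed = False
        for i in range(N):
            h = sum(T[i][j] * s[j] for j in range(N))  # weighted input to neuron i
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return tuple(s)

def flip(pattern, k):
    """Return a copy of pattern with bit k inverted (a slightly damaged cue)."""
    return tuple(-v if i == k else v for i, v in enumerate(pattern))

# Census: start from every one of the 2**N states and record the resting state.
census = {}
for start in product((-1, 1), repeat=N):
    end = settle(start)
    census[end] = census.get(end, 0) + 1

in_a = census.get(A, 0)
in_b = census.get(B, 0)
print("starts that settle into A:", in_a)
print("starts that settle into B:", in_b)
print("starts that settle elsewhere:", 2 ** N - in_a - in_b)
```

Every cue one flipped bit away from `A` or `B` rolls back into that memory. With +1/-1 states the inverted patterns are stable states too, so some of the wilder cues settle into a negative image rather than a stored memory, which is one form the "wrong dimple" can take.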

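Before and after

The network's all-or-none neurons descend from the McCulloch–Pitts model of 1943; its learning rule is Donald Hebb's 1949 principle, often summarized as "cells that fire together wire together." It stands alongside the other founders of machine learning in this museum: the perceptron (Rosenblatt, 1958) and, downstream, AlexNet (2012) and the Transformer (2017). The lineage is direct: Hopfield's energy idea became the Boltzmann machine, which helped revive deep learning, and a modern version of his recall rule turns out to be the very attention operation that Transformers run on.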

The original document

Original source text

J. J. Hopfield · Divisions of Chemistry and Biology, California Institute of Technology, and Bell Laboratories · PNAS 79(8), 2554–2558 · April 1982 (Biophysics; contributed January 15, 1982)

The opening claim

Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons).

From this premise Hopfield builds a model of memory not as a set of addressed locations but as the collective behaviour of many identical two-state neurons — its computational power an emergent property of the whole.

[ … ]

The model

He defines N two-state neurons, fully interconnected by symmetric weights Tij = Tji (with Tii = 0) set by a Hebbian outer-product rule over the stored states. The state evolves as each neuron asynchronously compares its weighted input against a threshold and switches on or off accordingly.
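The model described above fits in a few lines of code. What follows is a minimal sketch, not Hopfield's own implementation: it assumes +1/-1 states in place of the paper's on/off values, a threshold of zero, and hypothetical function names (`store`, `recall`). The two example patterns are invented for illustration.

```python
import random

def store(patterns):
    """Hebbian outer-product rule: T[i][j] sums p[i] * p[j] over the stored
    patterns, with T[i][i] = 0. The result is symmetric by construction."""
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += p[i] * p[j]
    return T

def recall(T, cue, max_sweeps=20, seed=0):
    """Asynchronous recall: neurons are updated one at a time, in a freshly
    shuffled order each sweep, until a whole sweep changes nothing."""
    rng = random.Random(seed)
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)
        changed = False
        for i in order:
            h = sum(T[i][j] * state[j] for j in range(n))  # weighted input
            new = 1 if h >= 0 else -1                      # compare with threshold 0
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, -1, 1, -1, 1, -1, 1, -1]
T = store([a, b])

cue = list(a)
cue[0] = -1                   # damage one element of pattern a
print(recall(T, cue) == a)    # True: the damaged cue settles back to a
```

Shuffling the update order each sweep is one simple way to imitate the paper's asynchronous updating; any order that eventually visits every neuron would serve for this small example.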

The algorithm for the time evolution of the state of the system is based on asynchronous parallel processing.

The paper's pivotal step is to exhibit an energy function E = −½ Σ Tij Vi Vj that the symmetric dynamics can only decrease. The stored patterns are arranged to be its local minima, so the flow toward minimum energy reconstructs a complete memory from a partial cue — content-addressable memory. (The energy function, the convergence argument, the capacity simulations, and the integrated-circuit realization are given in full at the source.)
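The claim that the dynamics cannot raise the energy follows from a short calculation. As a sketch, assume a threshold of zero and suppose a single neuron i changes its state by ΔV_i while all the others stay fixed:

```latex
E = -\tfrac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j
\quad\Longrightarrow\quad
\Delta E = -\,\Delta V_i \sum_{j \neq i} T_{ij} V_j
```

Because T_ij = T_ji, the two half-weighted terms that contain V_i merge into one, which absorbs the factor ½. The update rule switches neuron i on only when its weighted input is above threshold and off only when it is below, so ΔV_i and the sum never have opposite signs and ΔE ≤ 0. Since E is bounded for a finite network, the state must eventually stop changing.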

[ … ]

What emerges

Additional emergent collective properties include some capacity for generalization, familiarity recognition, categorization, error correction, and time sequence retention.

Hopfield reports that the memory is robust: it is only weakly sensitive to the modeling details or to the failure of individual neurons, and his simulations indicate roughly 0.15N patterns can be stored before recall degrades — a number a later spin-glass analysis sharpened to about 0.138N.
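The way recall degrades with load can be explored with a small experiment. The sketch below is a rough proxy rather than a reproduction of Hopfield's simulations: it assumes random +1/-1 patterns, a threshold of zero, and a hypothetical helper `stable_fraction` that reports what share of the stored bits a single update would leave unchanged. Exact figures depend on the random seed and on N.

```python
import random

def stable_fraction(n, p, rng):
    """Store p random +/-1 patterns in n neurons and return the fraction of
    stored bits that one update step would leave unchanged."""
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    stable = 0
    for pat in patterns:
        # Overlap of every stored pattern with the one being tested.
        overlaps = [sum(x * y for x, y in zip(q, pat)) for q in patterns]
        for i in range(n):
            # Input to neuron i under the outer-product weights; subtracting
            # p * pat[i] removes the excluded self-connection terms.
            h = sum(q[i] * m for q, m in zip(patterns, overlaps)) - p * pat[i]
            if (1 if h >= 0 else -1) == pat[i]:
                stable += 1
    return stable / (n * p)

rng = random.Random(0)
N = 100
for p in (5, 10, 15, 20, 30):
    share = stable_fraction(N, p, rng)
    print(f"load {p / N:.2f}: {share:.3f} of stored bits stable")
```

One-step stability is a weaker test than full recall from a damaged cue, so this only indicates the trend: as the number of patterns grows relative to N, crosstalk between memories makes a growing share of stored bits unstable.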
