JOVANA
Library Glossary Getting Started Three Levels Fields How it works Mission
Join the mission
Back to the library
信息 / 计算机科学 1978

时间、时钟与分布式系统中事件的排序

莱斯利·兰波特

没有共享时钟,顺序就是因果——一个计数器便能捕捉它。

Choose your version
In depth · the introduction

当两台计算机连「现在几点」都谈不拢时,你如何判定哪个事件先发生?兰波特的答案是:别管时钟了——跟着消息走。

核心想法

在一台计算机里,时间很简单:一个时钟,一条滴答的顺序。可一个分布式系统——许多台计算机通过网络对话——并没有一个共享的时钟,而它们之间的消息,所花的时间还无法预料。于是,你当真没法总是说出,两个遥远的事件谁先谁后。更糟的是,每台机器的时钟都略有偏差,所以你也不能就这么信任时间戳。

兰波特的高招,是干脆放弃测量真实时间,转而去捕捉因与果。一个事件「先发生于」另一个,是指它有可能影响了后者——要么它在同一台机器上更早发生,要么它发出了一条被后一个事件收到的消息。接着,他给每台机器一个简单的计数器——一个「逻辑时钟」——配上两条规则:每发生一个事件就加一;每收到一条消息,就把自己的计数器跳到发送方的前面。这个小小的把戏,便保证了:若 A 确有可能导致 B,那么 A 的数字就一定小于 B 的。

它是如何诞生的

在 1970 年代,计算机才刚刚开始被连成网络,工程师们一再被同一个困惑绊倒:不同机器上的事件,看上去像是「同时」发生,或者排出一个根本说不通的顺序。当时在马萨诸塞计算机联合公司的兰波特,正读着保罗·约翰逊与罗伯特·托马斯的一个数据库协议——它用时间戳为操作排序——他注意到,这协议可能会有奇怪的表现。

灵光一闪,来自物理学。兰波特研究过狭义相对论,那里同样没有一个普适的「此刻」——两个相距遥远的事件,谁先谁后可以因观察者而异,唯有因与果(什么能向什么发出信号)才是绝对的。他意识到,一个分布式计算机系统,正是同样的处境,而相对论的思路恰好能解开它。在论文的致谢中,他把最初的时间戳想法、以及「异常行为」这个概念,都归功于约翰逊与托马斯。

它为何重要

这篇八页的论文,成了整个计算机科学中被引用最多的论文之一。它给这个领域提供了基本词汇——「先发生于」「并发」「逻辑时钟」——以及一个简单到了极点的算法,至今仍运行在数据库、消息系统与协同应用之中。它还划下了一条线,供其后的分布式计算去探索:因果是真实而确定的,但一个统一的全局顺序,是我们为了方便而强加的,并非宇宙递到我们手上的。

一个可以想象的画面

想象一个群聊,每个人手机的时钟都不准,于是你没法信任时间戳。那你还能怎么知道顺序?靠回复。如果一条消息引用或回答了更早的一条,它就一定在那条之后——这才是真正的「先发生于」。但两个人在同一刻打字、谁也没看见对方的消息,他们就是「并发」的:彼此之间并没有真正的先后,你挑的任何顺序都只是个整齐的约定。兰波特的计数器,恰恰就是那个把这一切理清的「回复编号」。

一幅可交互的时空图:三台机器(P1、P2、P3)画成竖直的线,时间自上而下流动。滑块逐步推进事件;消息以箭头在各线之间穿行,每台机器的逻辑时钟计数器在自己的事件上加一,而每当收到一条消息,就跳到发送方数值的前面。

它的位置

它处在分布式计算的源头活水之上。在它之前,「网络中的时间」是一团乱麻;在它之后,这个领域有了严谨的根基。兰波特后来又走向拜占庭将军问题(当某些机器撒谎时,如何达成一致)与 Paxos 共识协议——这些工作,为他赢得了 2013 年的图灵奖。他的思路如今活在你银行背后的数据库里、活在谷歌横跨全球的 Spanner 里、也活在你和朋友同时编辑的共享文档里。在本馆中,它与图灵、香农并立,同为计算机科学的一块基石。

The original document
Original source text

摘要

Leslie Lamport · Commun. ACM 21, 7 (July 1978): 558–565
The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

引言

Introduction
The concept of time is fundamental to our way of thinking. It is derived from the more basic concept of the order in which events occur. We say that something happened at 3:15 if it occurred after our clock read 3:15 and before it read 3:16.
In a distributed system, it is sometimes impossible to say that one of two events occurred first. The relation “happened before” is therefore only a partial ordering of the events in the system. We have found that problems often arise because people are not fully aware of this fact and its implications.

偏序——「先发生于」

The Partial Ordering
Most people would probably say that an event a happened before an event b if a happened at an earlier time than b. … However, if a system is to meet a specification correctly, then that specification must be given in terms of events observable within the system. … We will therefore define the “happened before” relation without using physical clocks.
Definition. The relation “→” on the set of events of a system is the smallest relation satisfying the following three conditions: (1) If a and b are events in the same process, and a comes before b, then a → b. (2) If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b. (3) If a → b and b → c then a → c. Two distinct events a and b are said to be concurrent if a ↛ b and b ↛ a.
Another way of viewing the definition is to say that a → b means that it is possible for event a to causally affect event b. Two events are concurrent if neither can causally affect the other.
[ … ]
This definition will appear quite natural to the reader familiar with the invariant space-time formulation of special relativity … In relativity, the ordering of events is defined in terms of messages that could be sent. However, we have taken the more pragmatic approach of only considering messages that actually are sent.

逻辑时钟——时钟条件

Logical Clocks
We define a clock Ci for each process Pi to be a function which assigns a number Ci(a) to any event a in that process. … For now, we make no assumption about the relation of the numbers Ci(a) to physical time, so we can think of the clocks Ci as logical rather than physical clocks. They may be implemented by counters with no actual timing mechanism.
The strongest reasonable condition is that if an event a occurs before another event b, then a should happen at an earlier time than b. We state this condition more formally as follows.
Clock Condition. For any events a, b: if a → b then C(a) < C(b).
Note that we cannot expect the converse condition to hold as well, since that would imply that any two concurrent events must occur at the same time.
It is easy to see … that the Clock Condition is satisfied if the following two conditions hold. C1. If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b). C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).

实现规则 IR1、IR2

Implementation rules
IR1. Each process Pi increments Ci between any two successive events.
IR2. (a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.
Hence, the simple implementation rules IR1 and IR2 imply that the Clock Condition is satisfied, so they guarantee a correct system of logical clocks.
Received March 1976; revised October 1977 · Communications of the ACM, July 1978