JOVANA
Library Glossary Getting Started Three Levels Fields How it works Mission
Join the mission
Back to the library
資訊 / 電腦科學 1978

時間、時鐘與分散式系統中事件的排序

萊斯利·蘭波特

沒有共享時鐘,順序就是因果——一個計數器便能捕捉它。

Choose your version
In depth · the introduction

當兩台電腦連「現在幾點」都談不攏時,你如何判定哪個事件先發生?蘭波特的答案是:別管時鐘了——跟著訊息走。

核心想法

在一台電腦裡,時間很簡單:一個時鐘,一條滴答的順序。可一個分散式系統——許多台電腦透過網路對話——並沒有一個共享的時鐘,而它們之間的訊息,所花的時間還無法預料。於是,你當真沒法總是說出,兩個遙遠的事件誰先誰後。更糟的是,每台機器的時鐘都略有偏差,所以你也不能就這麼信任時間戳。

蘭波特的高招,是乾脆放棄測量真實時間,轉而去捕捉因與果。一個事件「先發生於」另一個,是指它有可能影響了後者——要麼它在同一台機器上更早發生,要麼它發出了一條被後一個事件收到的訊息。接著,他給每台機器一個簡單的計數器——一個「邏輯時鐘」——配上兩條規則:每發生一個事件就加一;每收到一條訊息,就把自己的計數器跳到傳送方的前面。這個小小的把戲,便保證了:若 A 確有可能導致 B,那麼 A 的數字就一定小於 B 的。

它是如何誕生的

在 1970 年代,電腦才剛剛開始被連成網路,工程師們一再被同一個困惑絆倒:不同機器上的事件,看上去像是「同時」發生,或者排出一個根本說不通的順序。當時在麻薩諸塞電腦聯合公司的蘭波特,正讀著保羅·約翰遜與羅伯特·托馬斯的一個資料庫協定——它用時間戳為操作排序——他注意到,這協定可能會有奇怪的表現。

靈光一閃,來自物理學。蘭波特研究過狹義相對論,那裡同樣沒有一個普適的「此刻」——兩個相距遙遠的事件,誰先誰後可以因觀察者而異,唯有因與果(什麼能向什麼發出訊號)才是絕對的。他意識到,一個分散式電腦系統,正是同樣的處境,而相對論的思路恰好能解開它。在論文的致謝中,他把最初的時間戳想法、以及「異常行為」這個概念,都歸功於約翰遜與托馬斯。

它為何重要

這篇八頁的論文,成了整個電腦科學中被引用最多的論文之一。它給這個領域提供了基本詞彙——「先發生於」「並行」「邏輯時鐘」——以及一個簡單到了極點的演算法,至今仍執行在資料庫、訊息系統與協同應用之中。它還劃下了一條線,供其後的分散式計算去探索:因果是真實而確定的,但一個統一的全域順序,是我們為了方便而強加的,並非宇宙遞到我們手上的。

一個可以想像的畫面

想像一個群聊,每個人手機的時鐘都不準,於是你沒法信任時間戳。那你還能怎麼知道順序?靠回覆。如果一條訊息引用或回答了更早的一條,它就一定在那條之後——這才是真正的「先發生於」。但兩個人在同一刻打字、誰也沒看見對方的訊息,他們就是「並行」的:彼此之間並沒有真正的先後,你挑的任何順序都只是個整齊的約定。蘭波特的計數器,恰恰就是那個把這一切理清的「回覆編號」。

一幅可互動的時空圖:三台機器(P1、P2、P3)畫成豎直的線,時間自上而下流動。滑桿逐步推進事件;訊息以箭頭在各線之間穿行,每台機器的邏輯時鐘計數器在自己的事件上加一,而每當收到一條訊息,就跳到傳送方數值的前面。

它的位置

它處在分散式計算的源頭活水之上。在它之前,「網路中的時間」是一團亂麻;在它之後,這個領域有了嚴謹的根基。蘭波特後來又走向拜占庭將軍問題(當某些機器撒謊時,如何達成一致)與 Paxos 共識協定——這些工作,為他贏得了 2013 年的圖靈獎。他的思路如今活在你銀行背後的資料庫裡、活在 Google 橫跨全球的 Spanner 裡、也活在你和朋友同時編輯的共享文件裡。在本館中,它與圖靈、夏農並立,同為電腦科學的一塊基石。

The original document
Original source text

摘要

Leslie Lamport · Commun. ACM 21, 7 (July 1978): 558–565
The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

引言

Introduction
The concept of time is fundamental to our way of thinking. It is derived from the more basic concept of the order in which events occur. We say that something happened at 3:15 if it occurred after our clock read 3:15 and before it read 3:16.
In a distributed system, it is sometimes impossible to say that one of two events occurred first. The relation “happened before” is therefore only a partial ordering of the events in the system. We have found that problems often arise because people are not fully aware of this fact and its implications.

偏序——「先發生於」

The Partial Ordering
Most people would probably say that an event a happened before an event b if a happened at an earlier time than b. … However, if a system is to meet a specification correctly, then that specification must be given in terms of events observable within the system. … We will therefore define the “happened before” relation without using physical clocks.
Definition. The relation “→” on the set of events of a system is the smallest relation satisfying the following three conditions: (1) If a and b are events in the same process, and a comes before b, then a → b. (2) If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b. (3) If a → b and b → c then a → c. Two distinct events a and b are said to be concurrent if a ↛ b and b ↛ a.
Another way of viewing the definition is to say that a → b means that it is possible for event a to causally affect event b. Two events are concurrent if neither can causally affect the other.
[ … ]
This definition will appear quite natural to the reader familiar with the invariant space-time formulation of special relativity … In relativity, the ordering of events is defined in terms of messages that could be sent. However, we have taken the more pragmatic approach of only considering messages that actually are sent.

邏輯時鐘——時鐘條件

Logical Clocks
We define a clock Ci for each process Pi to be a function which assigns a number Ci(a) to any event a in that process. … For now, we make no assumption about the relation of the numbers Ci(a) to physical time, so we can think of the clocks Ci as logical rather than physical clocks. They may be implemented by counters with no actual timing mechanism.
The strongest reasonable condition is that if an event a occurs before another event b, then a should happen at an earlier time than b. We state this condition more formally as follows.
Clock Condition. For any events a, b: if a → b then C(a) < C(b).
Note that we cannot expect the converse condition to hold as well, since that would imply that any two concurrent events must occur at the same time.
It is easy to see … that the Clock Condition is satisfied if the following two conditions hold. C1. If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b). C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).

實作規則 IR1、IR2

Implementation rules
IR1. Each process Pi increments Ci between any two successive events.
IR2. (a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.
Hence, the simple implementation rules IR1 and IR2 imply that the Clock Condition is satisfied, so they guarantee a correct system of logical clocks.
Received March 1976; revised October 1977 · Communications of the ACM, July 1978