Information / CS 1978

Time, Clocks, and the Ordering of Events in a Distributed System

Leslie Lamport

Without a shared clock, order is causality — and a counter can capture it.

Choose your version

In depth · the introduction

When two computers can’t agree what time it is, how do you decide which event happened first? Lamport’s answer: forget the clock — follow the messages.

The big idea

In a single computer, time is easy: one clock, one ticking order. But a distributed system — many computers talking over a network — has no shared clock, and the messages between them take an unpredictable amount of time. So you genuinely cannot always say which of two faraway events came first. Worse, every machine’s clock is slightly off, so you can’t just trust the timestamps.

Lamport’s move was to give up on measuring real time and instead capture cause and effect. One event “happened before” another if it could have influenced it — either it came earlier on the same machine, or it sent a message that the other event received. He then gave each machine a simple counter — a “logical clock” — with two rules: tick it up on every event, and when you receive a message, jump your counter ahead of the sender’s. That tiny trick guarantees that if A really could have caused B, then A’s number is smaller than B’s.

How it came about

In the 1970s, computers were just beginning to be wired together into networks, and engineers kept tripping over the same confusion: events on different machines seemed to happen “at the same time,” or in an order that made no sense. Lamport, then at Massachusetts Computer Associates, was reading a database protocol by Paul Johnson and Robert Thomas that used timestamps to order operations — and he noticed it could behave strangely.

The flash of insight came from physics. Lamport had studied special relativity, where there is also no universal “now” — the order of two distant events can depend on the observer, and only cause-and-effect (what could send a signal to what) is absolute. He realised a distributed computer system is exactly the same situation, and that the relativity way of thinking solved it. He credits the original timestamp idea, and the notion of “anomalous behavior,” to Johnson and Thomas in the paper’s acknowledgments.

Why it mattered

This eight-page paper became one of the most cited in all of computer science. It gave the field its basic vocabulary — “happened before,” “concurrent,” “logical clock” — and a dead-simple algorithm that still runs inside databases, messaging systems, and collaborative apps today. It also drew the line that the rest of distributed computing would explore: causality is real and fixed, but a single global order is something we impose for convenience, not something the universe hands us.

A way to picture it

Imagine a group chat where everyone’s phone clock is wrong, so you can’t trust the timestamps. How do you still know the order? By the replies. If a message quotes or answers an earlier one, it must have come after it — that’s a real “happened before.” But two people typing at the same moment, neither having seen the other’s message, are “concurrent”: there’s no true order between them, and any order you pick is just a tidy convention. Lamport’s counter is exactly the reply-number that keeps this straight.

Where it sits

This sits at the headwaters of distributed computing. Before it, “time in a network” was a muddle; after it, the field had a rigorous footing. Lamport went on to the Byzantine generals problem (how to agree when some machines lie) and the Paxos consensus protocol — work that earned him the 2013 Turing Award. His way of thinking now lives in the databases behind your bank, in Google’s globe-spanning Spanner, and in the shared documents you and a friend edit at once. It belongs beside Turing and Shannon in this Library as a cornerstone of computer science.

The original document

Original source text

Abstract

Leslie Lamport · Commun. ACM 21, 7 (July 1978): 558–565

The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

Introduction

The concept of time is fundamental to our way of thinking. It is derived from the more basic concept of the order in which events occur. We say that something happened at 3:15 if it occurred after our clock read 3:15 and before it read 3:16.

In a distributed system, it is sometimes impossible to say that one of two events occurred first. The relation “happened before” is therefore only a partial ordering of the events in the system. We have found that problems often arise because people are not fully aware of this fact and its implications.

The Partial Ordering — “happened before”

The Partial Ordering

Most people would probably say that an event a happened before an event b if a happened at an earlier time than b. … However, if a system is to meet a specification correctly, then that specification must be given in terms of events observable within the system. … We will therefore define the “happened before” relation without using physical clocks.

Definition. The relation “→” on the set of events of a system is the smallest relation satisfying the following three conditions: (1) If a and b are events in the same process, and a comes before b, then a → b. (2) If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b. (3) If a → b and b → c then a → c. Two distinct events a and b are said to be concurrent if a ↛ b and b ↛ a.

Another way of viewing the definition is to say that a → b means that it is possible for event a to causally affect event b. Two events are concurrent if neither can causally affect the other.

[ … ]

This definition will appear quite natural to the reader familiar with the invariant space-time formulation of special relativity … In relativity, the ordering of events is defined in terms of messages that could be sent. However, we have taken the more pragmatic approach of only considering messages that actually are sent.

Logical Clocks — the Clock Condition

Logical Clocks

We define a clock Ci for each process Pi to be a function which assigns a number Ci(a) to any event a in that process. … For now, we make no assumption about the relation of the numbers Ci(a) to physical time, so we can think of the clocks Ci as logical rather than physical clocks. They may be implemented by counters with no actual timing mechanism.

The strongest reasonable condition is that if an event a occurs before another event b, then a should happen at an earlier time than b. We state this condition more formally as follows.

Clock Condition. For any events a, b: if a → b then C(a) < C(b).

Note that we cannot expect the converse condition to hold as well, since that would imply that any two concurrent events must occur at the same time.

It is easy to see … that the Clock Condition is satisfied if the following two conditions hold. C1. If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b). C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).

Implementation Rules IR1, IR2

Implementation rules

IR1. Each process Pi increments Ci between any two successive events.

IR2. (a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.

Hence, the simple implementation rules IR1 and IR2 imply that the Clock Condition is satisfied, so they guarantee a correct system of logical clocks.

Received March 1976; revised October 1977 · Communications of the ACM, July 1978