A Relational Model of Data for Large Shared Data Banks
Store data as plain tables, called relations, and free every program from how the bytes are arranged.
Before 1970, asking a computer for data meant knowing exactly where it sat. Codd's idea was radical in its simplicity: store everything as plain tables, and just describe what you want.
The big idea
A database is a place to keep huge amounts of organised information — customers, orders, flights, accounts. In the 1960s, getting an answer out of one meant following a rigid trail of pointers the designer had laid down in advance; programs had to know the physical path to every piece of data, so any change to how the data was stored could break them.
Edgar Codd, a mathematician by training working at IBM's research laboratory in San Jose, proposed something far simpler. Keep all the data as ordinary tables (he called them relations), where each row is one record and each column is one kind of fact. Then let people ask for data by describing it ("all suppliers in Paris") instead of telling the machine how to go and fetch it. The computer figures out the how. That separation between what you want and where it lives is called data independence, and it is the central goal of the paper.
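The "all suppliers in Paris" request can be sketched in a few lines. This is only an illustration, not anything from the 1970 paper: it uses SQL (which came later) through Python's built-in sqlite3 module, and the supplier table, its rows, and the function name suppliers_in_paris are all invented for the example. The point it tries to show is that the query text stays the same after the physical storage changes.

```python
import sqlite3

# Hypothetical supplier data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?)",
    [("Acme", "Paris"), ("Brill", "London"), ("Cadet", "Paris")],
)

# The query says WHAT is wanted, not HOW to find it.
QUERY = "SELECT name FROM supplier WHERE city = 'Paris' ORDER BY name"


def suppliers_in_paris(connection):
    """Run the declarative query and return the supplier names."""
    return [row[0] for row in connection.execute(QUERY)]


before = suppliers_in_paris(conn)

# Change the physical storage by adding an index on city.
# The query text above is not touched.
conn.execute("CREATE INDEX supplier_city ON supplier (city)")

after = suppliers_in_paris(conn)
print(before, after)
```

Whether the engine scans the whole table or uses the new index is its own decision; the program asking the question does not need to know.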
How it came about
Codd published the idea in June 1970 in Communications of the ACM, and at first it met resistance, even inside IBM, which was heavily invested in its older hierarchical database, IMS. Many engineers simply did not believe a database built on mathematical tables could ever be fast enough to use. The disagreement came to a head in a 1974 public debate between Codd and Charles Bachman, the champion of the older "network" approach.
What settled it was not debate but working software. Two teams turned the theory into real systems: IBM's own System R project in San Jose, which produced the query language SQL, and the Ingres project at Berkeley. They proved a relational database could be both elegant and fast. By the 1980s the relational model had won; Codd received computing's highest honour, the Turing Award, in 1981.
Why it mattered
Many of the databases you touch indirectly (your bank, your airline booking, an online shop, a hospital's records) are relational ones, descended from this paper. The language for talking to them, SQL, became one of the most widely used computer languages in the world. By letting people ask for information by its meaning rather than its location, Codd made data something you could reason about, combine, and trust without being a storage expert.
A way to picture it
Think of an old library with no catalogue: to find a book you'd have to know its exact shelf, and if the librarian rearranged the shelves, your directions would be useless. Codd's model is the catalogue. You describe the book you want — author, subject — and the system finds it, no matter where it has been shelved. Rearranging the shelves for efficiency no longer breaks anything, because you never relied on the location in the first place.
Where it sits
Codd took a tool from pure mathematics — the idea of a relation as a set of tuples, the same set theory behind much of logic — and pointed it at a grubby practical problem: how to store a company's records. The Library holds the threads on either side: Shannon and Turing built the theory of information and computation this rests on, and the descendants of Codd's tables now hold the data that today's AI, from the Transformer onward, learns from.
Abstract & Introduction
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).
A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced.
… the problems treated here are those of data independence — the independence of application programs and terminal activities from growth in data types and changes in data representation …
1.2 Data dependence today
1.3 A relational view of data
The term relation is used here in its accepted mathematical sense.
R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.
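In the mathematical sense used here, a relation on n sets is a set of n-tuples whose i-th element is drawn from the i-th set, that is, a subset of their Cartesian product. A minimal sketch, with made-up domains and a hypothetical helper named degree:

```python
from itertools import product

# Hypothetical domains, invented for illustration.
suppliers = {"Acme", "Brill"}
parts = {"bolt", "nut"}
quantities = {5, 12}

# A relation of degree 3 (ternary): a set of 3-tuples, one element per domain.
supply = {("Acme", "bolt", 12), ("Brill", "nut", 5)}


def degree(relation):
    """Return n for a non-empty relation whose tuples all have length n."""
    lengths = {len(t) for t in relation}
    if len(lengths) != 1:
        raise ValueError("relation must be non-empty with tuples of equal length")
    return lengths.pop()


# Every relation on these domains is a subset of their Cartesian product.
cartesian = set(product(suppliers, parts, quantities))
is_relation = supply <= cartesian

print(degree(supply), is_relation)
```

Because the relation is a set, row order carries no meaning and duplicate rows cannot occur.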
The totality of data in a data bank may be viewed as a collection of time-varying relations.
1.4 Normal form
There is, in fact, a very simple elimination procedure, which we shall call normalization.
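One way to picture the procedure is sketched below. It assumes that normalization works by copying the parent's primary key into each nested relation and then striking the nested column from the parent, which is how the full paper describes it. The employee data and the function name normalize are invented for the example, and only one level of nesting is handled.

```python
# Hypothetical unnormalized relation: the third column holds a nested
# relation (a set of children) rather than a simple value.
employee_unnormalized = {
    (1, "Ada", frozenset({("Tom", 2001), ("Eve", 2004)})),
    (2, "Bob", frozenset()),
}


def normalize(employees):
    """Split the nested children column into its own flat relation."""
    employee = set()
    children = set()
    for emp_no, name, kids in employees:
        # The parent keeps only its simple (non-nested) columns.
        employee.add((emp_no, name))
        for child_name, birth_year in kids:
            # Copy the parent's primary key into the child relation.
            children.add((emp_no, child_name, birth_year))
    return employee, children


employee, children = normalize(employee_unnormalized)
print(sorted(employee))
print(sorted(children))
```

After the split, every column in both relations holds a simple value, and the copied emp_no is what ties a child row back to its employee.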
2. Operations & redundancy
Projection. Suppose now we select certain columns of a relation (striking out the others) and then remove from the resulting array any duplication in the rows. The final array represents a relation …
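The two steps in that description, keeping the chosen columns and then removing duplicate rows, can be sketched as follows. The relation contents and the function name project are assumptions made for illustration; columns are picked by position.

```python
def project(relation, columns):
    """Keep only the given column positions; the set drops duplicate rows."""
    return {tuple(row[i] for i in columns) for row in relation}


# Hypothetical relation: (supplier, part, city).
supply = {
    ("Acme", "bolt", "Paris"),
    ("Acme", "nut", "Paris"),
    ("Brill", "bolt", "London"),
}

# Select columns 0 and 2, striking out the part column.
supplier_city = project(supply, (0, 2))
print(sorted(supplier_city))
```

The two Acme rows collapse into one once the part column is struck out, which is the duplicate removal the passage describes.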