Molecular Biology 1977

DNA Sequencing with Chain-Terminating Inhibitors

Frederick Sanger, Steve Nicklen & Alan Coulson

Let DNA copy itself, but spike each base with a “stop” — the lengths spell out the sequence.


To read the order of letters in a strand of DNA, Sanger's trick was to let it copy itself — but secretly slip in “stop” letters, so the copies pile up at every position and their lengths spell out the sequence.

The big idea

DNA is a string of four letters — A, C, G, T. For decades we could see that the letters were there but not read their order. Frederick Sanger found a way. You take the strand you want to read and let a copying enzyme build its complement, one letter at a time. The clever part is what you add to the mix: a few sabotaged letters — “dideoxy” versions — that the enzyme will happily attach, but which then refuse to let any further letter join. Each one is a full stop.

Do this four times, once for each kind of letter, each time sabotaging only the A's, or only the C's, and so on. Because only a small share of that letter is sabotaged, different copies stop at different places: in the “stop-at-A” batch you get copies that end at every A in the sequence; in the “stop-at-C” batch, copies ending at every C. Now you have, across four piles, a copy that stops at every single position, and the length of each copy tells you exactly where its stopping letter sits.
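The bookkeeping behind the four piles can be sketched in a few lines of Python. This is only an illustration of the idea, not anything from the paper: the strand `GATTACA` and the function names `stop_lengths` and `read_sequence` are made up for the example, and real fragments are measured on a gel rather than counted directly.

```python
def stop_lengths(copy: str) -> dict[str, list[int]]:
    """For each letter, list the lengths of the copies that end on that letter."""
    piles = {base: [] for base in "ACGT"}
    for position, base in enumerate(copy, start=1):
        # A copy that stops at this letter is exactly `position` letters long.
        piles[base].append(position)
    return piles


def read_sequence(piles: dict[str, list[int]]) -> str:
    """Rebuild the sequence from the four piles, using nothing but fragment lengths."""
    by_length = {length: base for base, lengths in piles.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


copy = "GATTACA"  # hypothetical newly built strand
piles = stop_lengths(copy)
print(piles)                 # which lengths turn up in which pile
print(read_sequence(piles))  # the strand, recovered from lengths alone
```

Every length from 1 to 7 appears in exactly one pile, which is why sorting the lengths and noting the pile each came from is enough to spell out the strand.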

How it came about

By the mid-1970s the chemistry of DNA was understood, but reading a long sequence was painfully slow. Sanger, working quietly at the Medical Research Council's Laboratory of Molecular Biology in Cambridge, had already won a Nobel Prize (Chemistry, 1958) for working out the sequence of insulin, a protein. He turned the same patience on DNA. With Alan Coulson he first built a clumsier “plus and minus” method (1975); then, in 1977, came the elegant one: the dideoxy, or chain-termination, method. That same year an American pair, Allan Maxam and Walter Gilbert, published a completely different chemical method. For a while both were used, but Sanger's proved easier and was the one machines could eventually be taught to run. In 1980 it brought him a second Nobel Prize, shared with Gilbert and Paul Berg.

Why it mattered

Before this, the genome was a closed book. Sanger's method opened it. Once you can read the order of the letters, you can find the gene behind a disease, compare one species' DNA with another's, and check whether an edit did what you intended. Sped up and handed to machines, the very same idea read the entire human genome — three billion letters — and laid the foundation of modern genetics and medicine.

A way to picture it

Imagine photocopying a long sentence, but your copier is rigged so that, now and then, it jams right after copying a particular letter — say, every time it hits an “e”. Run a stack of copies and you'll get pages that stop at the first “e”, the second, the third, and so on. Line them up shortest to longest and the place each one stops tells you exactly where every “e” falls in the sentence. Do it for each letter of the alphabet and you can reconstruct the whole sentence from nothing but the lengths of the jammed copies. The “stop” letters are dideoxy bases; the lengths are read off a gel.

Where it sits

Watson and Crick (1953) showed DNA was a four-letter code paired in a double helix; that pairing is exactly what Sanger's copying enzyme exploits. Mendel's abstract “factors” had by now become readable stretches of letters. Sanger sequencing then joined forces with PCR (1985), which makes enough copies of a gene to read, and led straight to the Human Genome Project and to today's gene-editing medicine — where a CRISPR edit is checked by sequencing the very letters it changed.

The original document

Original source text

F. Sanger, S. Nicklen & A. R. Coulson · Proc. Natl. Acad. Sci. USA 74 (1977): 5463–5467

Sanger's group describes a way to read the order of bases along a DNA strand by copying it with a polymerase and making the copy stop, deliberately, at chosen letters. The abstract states the idea in full; the body of the paper is outlined below, and the complete text is available at the source.

From the abstract

A new method for determining nucleotide sequences in DNA is described.

It is similar to the "plus and minus" method but makes use of the 2′,3′-dideoxy and arabinonucleoside analogues of the normal deoxynucleoside triphosphates, which act as specific chain-terminating inhibitors of DNA polymerase.

The technique has been applied to the DNA of bacteriophage ϕX174 and is more rapid and more accurate than either the plus or the minus method.

How the dideoxy method works (structural map)

A short primer is annealed to a single-stranded template and extended by DNA polymerase. The reaction is split four ways; each tube holds all four normal deoxynucleotides (dNTPs) plus a small amount of one 2′,3′-dideoxynucleotide (ddNTP). The dideoxy analogue is incorporated like its normal counterpart but lacks the 3′-hydroxyl needed to bond to the next nucleotide, so the chain cannot be extended past it: synthesis stops specifically at that base.
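Why the dideoxy analogue has to be the minority can be seen in a small simulation of a single tube. This is a rough sketch under stated assumptions: the function `run_tube`, the strand `GATTACA`, and the 20% termination chance (`dd_fraction`) are all hypothetical values chosen for illustration, not figures from the paper.

```python
import random
from collections import Counter


def run_tube(new_strand: str, stop_base: str, dd_fraction: float,
             chains: int, seed: int = 0) -> Counter:
    """Simulate many chains growing in one tube; count terminated fragments by length.

    new_strand  : the strand the polymerase would build if never interrupted
    stop_base   : the base whose dideoxy analogue is present in this tube
    dd_fraction : assumed chance that a given stop_base is the dideoxy version
    """
    rng = random.Random(seed)
    lengths = Counter()
    for _ in range(chains):
        for length, base in enumerate(new_strand, start=1):
            if base == stop_base and rng.random() < dd_fraction:
                lengths[length] += 1  # chain ends here: no 3'-hydroxyl to extend from
                break
        # Chains that never pick up a dideoxy base run to full length and are not counted.
    return lengths


tube_a = run_tube("GATTACA", "A", dd_fraction=0.2, chains=10_000)
print(sorted(tube_a.items()))

# With every A sabotaged, all chains stop at the first A and later positions are lost.
print(sorted(run_tube("GATTACA", "A", dd_fraction=1.0, chains=100).items()))
```

With a low `dd_fraction`, fragments appear at every A, with fewer at the later positions because fewer chains survive that far. Setting it to 1.0 leaves only the shortest fragment, so the mixing ratio decides how far along the strand the tube can report.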

Because only a fraction of chains terminate at each matching position, each tube yields a nested set of fragments, ending at every A in one tube, every C in another, and likewise for G and T. All fragments share the primer's 5′ end and are radioactively labelled. The four reactions are run side by side on a denaturing polyacrylamide gel, which separates fragments to single-base resolution, and the fragments form a ladder; read from the bottom (shortest) up, it gives the synthesized strand 5′→3′, which is the complement of the template.
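Reading the ladder can be sketched as follows. The band positions in `lanes` are invented for the example, lengths are counted in bases added beyond the primer, and the helper names (`draw_gel`, `read_ladder`, `template_from`) are hypothetical. The sketch also assumes an ideal gel with exactly one band per rung.

```python
LANE_ORDER = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def longest_band(lanes: dict[str, list[int]]) -> int:
    return max(length for lengths in lanes.values() for length in lengths)


def draw_gel(lanes: dict[str, list[int]]) -> str:
    """Text picture of the gel: longest fragments at the top, shortest at the bottom."""
    rows = ["len " + " ".join(LANE_ORDER)]
    for length in range(longest_band(lanes), 0, -1):
        marks = ["-" if length in lanes.get(base, []) else "." for base in LANE_ORDER]
        rows.append(f"{length:>3} " + " ".join(marks))
    return "\n".join(rows)


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Climb from the shortest band up; the lane holding each rung names one base."""
    strand = []
    for length in range(1, longest_band(lanes) + 1):
        hits = [base for base in LANE_ORDER if length in lanes.get(base, [])]
        if len(hits) != 1:
            raise ValueError(f"rung {length} has {len(hits)} bands; expected exactly one")
        strand.append(hits[0])
    return "".join(strand)  # synthesized strand, 5'→3'


def template_from(synthesized: str) -> str:
    """The template is the reverse complement of the synthesized strand (5'→3')."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}  # hypothetical bands
print(draw_gel(lanes))
synthesized = read_ladder(lanes)
print(synthesized)
print(template_from(synthesized))
```

Raising an error on a missing or doubled rung is a design choice for the sketch; it makes the one-band-per-rung assumption explicit rather than silently guessing a base.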

[ … ]

The dideoxy method built on the earlier "plus and minus" method of Sanger and Coulson (1975) and was later paired with the M13 single-stranded cloning system, which supplies single-stranded template. A different chemical-cleavage method by Maxam and Gilbert appeared the same year; the dideoxy approach prevailed because it proved simpler and, later, automatable.

MRC Laboratory of Molecular Biology, Cambridge · 1977