Fidelity & Proofreading

An accuracy that should not be possible

By now you have watched the whole worksite in action: the fork opens, primase lays primers, the replicative polymerase races along, and the leading and lagging strands come together. This guide asks the question that should have been nagging you all along — *how good is the copy?* Replication is famously semiconservative, so every new molecule keeps one old strand as a template; but copying a template is only as useful as it is accurate. So how accurate is it?

The headline number is staggering. A human cell copies about 6 billion base pairs every time it divides, and across all that it leaves behind roughly one mistake per billion bases copied: an error rate near 10^-9, or only a handful of errors per division. Put that on a human scale: it is like transcribing every book in a large library, letter by letter, and making only a handful of typos in the whole collection. No human typist, and no photocopier, comes anywhere close. The puzzle is that the chemistry the cell starts with is nowhere near that reliable. Something must be lifting raw, sloppy chemistry to near-perfection.

Layer one: pairing is picky, but not picky enough

The first line of defence is the base pairing you already know. Watson–Crick pairing is selective because the right partners fit and the wrong ones don't: A reaches across to T with two hydrogen bonds, G to C with three, and a correct pair also has exactly the right *shape* and width to slot into the helix. The polymerase does not just trust the hydrogen bonds; its active site is a snug pocket that physically grips an incoming nucleotide and tests whether it makes a geometrically perfect pair with the template base. A correctly shaped pair lets the enzyme close around it and add the base; a misshapen one fits poorly and is usually rejected before the bond is ever formed.

But here is the honest catch: pairing alone is *not* picky enough. The free-energy difference between a right pair and a wrong one is small (a modest ΔG), and bases occasionally flicker into rare chemical forms, called tautomers, that briefly mimic the shape of a different base and fool the pocket. As a result, selection by shape lets a wrong nucleotide slip in about once every 10,000 to 100,000 additions. Across 6 billion base pairs, that would mean tens to hundreds of thousands of errors per genome copy, which would be disastrous on its own. The cell clearly needs a second check that runs *after* a base has actually been added.
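The arithmetic behind that estimate is easy to check. The sketch below multiplies the genome size quoted earlier by the two ends of the selection-only error range; the function name `expected_errors` is just an illustrative helper, and the rates are the round figures from the text, not measured values.

```python
GENOME_BP = 6_000_000_000  # base pairs copied per division, as quoted above


def expected_errors(bases_copied: int, error_rate: float) -> float:
    """Expected number of miscopied bases at a given per-base error rate."""
    return bases_copied * error_rate


# Base selection alone: one slip per 100,000 to one slip per 10,000 additions.
low = expected_errors(GENOME_BP, 1e-5)
high = expected_errors(GENOME_BP, 1e-4)
print(f"selection alone: {low:,.0f} to {high:,.0f} errors per genome copy")
```

At the better rate that is tens of thousands of errors per copy; at the worse rate, hundreds of thousands.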

Layer two: a polymerase that backs up to erase its own mistakes

This is the heart of the story — and the source of the term [[replication-fidelity-proofreading|proofreading]]. The replicative polymerase is not one machine but two activities welded into one protein. The familiar half builds DNA, adding bases 5'-to-3'. The second half is a separate enzyme tucked into the same protein: a 3'-to-5' exonuclease, a tiny pair of molecular scissors that chews nucleotides off the *3' end* of the new strand — exactly the end where the polymerase just added the most recent base. Think of a pen with an eraser built into its other end: write forward, and the instant you sense a slip, flip and rub out the last character.

How does the enzyme *know* it just made a mistake, if it can't read the genetic meaning? It feels it. A correctly paired 3' end sits snugly base-paired and slides smoothly into the polymerizing site, ready for the next base. A mismatched 3' end pairs badly, frays, and wobbles — the helix at the tip is distorted and unstable. That looseness slows the next addition and, crucially, makes the frayed end more likely to flip across into the nearby exonuclease pocket. There the scissors snip off the wrong base, the corrected 3' end swings back into the polymerizing site, and synthesis resumes. The enzyme never "understands" the error; it just reacts to the *feel* of a bad fit.

1. Add a base. The polymerizing site pairs an incoming nucleotide with the template and forges the backbone bond, extending the new strand by one at its 3' end.
2. Sense the fit. If the newest pair is correct it sits tight and the enzyme smoothly moves on. If it is wrong, the 3' tip frays and wobbles, and the next addition stalls.
3. Hand off to the scissors. The frayed 3' end flips across into the 3'-to-5' exonuclease site, which clips off the most recently added (wrong) nucleotide.
4. Resume. The trimmed, correctly paired 3' end swings back into the polymerizing site, and the enzyme tries the addition again, usually getting it right this time.
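The four steps above can be sketched as a small simulation. This is a toy model under loud assumptions: `p_misinsert` and `p_excise` are made-up, exaggerated probabilities chosen so a short run shows the effect, each base is treated independently, and the model ignores the fact that real proofreading also removes some correct bases.

```python
import random


def copy_with_proofreading(n_bases, p_misinsert, p_excise, seed=0):
    """Simulate the add / sense / excise / resume cycle for n_bases positions.

    Returns (misinsertions, errors_left_in_strand).
    Both probabilities are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    misinsertions = 0
    errors_left = 0
    for _ in range(n_bases):
        while True:
            # Step 1: add a base; occasionally the wrong one gets in.
            if rng.random() >= p_misinsert:
                break  # Step 2: a correct pair sits tight, so move on.
            misinsertions += 1
            # Step 3: the frayed 3' end usually reaches the exonuclease site.
            if rng.random() < p_excise:
                continue  # Step 4: wrong base clipped; retry this position.
            errors_left += 1  # the mismatch escaped and was extended
            break
    return misinsertions, errors_left


if __name__ == "__main__":
    made, kept = copy_with_proofreading(200_000, p_misinsert=1e-2, p_excise=0.99)
    print(f"misinsertions: {made}, surviving errors: {kept}")
```

With an excision probability of 0.99, only about one misinsertion in a hundred should survive, which is the roughly hundredfold improvement attributed to proofreading.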

This is also exactly why the polymerase can only build 5'-to-3', a rule you met in the previous guide. Proofreading lives at the 3' end and trims from there; a strand that grew the *other* way would carry its high-energy growing tip at the 5' end, and removing a base there would strip away the very phosphates that power the next addition. Synthesis direction and proofreading are two faces of one design. Proofreading is powerful but not free — it costs time and discards good building blocks too — so polymerases that copy short, less critical stretches (and many RNA polymerases) skip it. The replicative machine spends the effort precisely because the genome is worth it.

Layer three: the final check — mismatch repair

Even with proofreading, a few wrong bases slip past — roughly one in ten million. A third crew sweeps the road *behind* the fork: [[mismatch-repair|mismatch repair]], which you will meet in full in the DNA-repair rung. It scans freshly made DNA for the tell-tale bulge of a mispaired base — a place where the two strands don't sit flush — cuts out a patch of the *new* strand around the error, and lets the polymerase resynthesize it correctly from the template.

There is a deep problem hiding in that sentence, and it is worth pausing on. When the repair crew finds a mismatch — say an A across from a C — *which* base is the error? The template still holds the original truth; the new strand holds the typo. Repairing the wrong one would lock the mistake in forever. So mismatch repair must tell the brand-new strand from the old template strand, and it does this with a transient *mark*. In bacteria like E. coli, the old strand is chemically tagged (its A's in certain spots carry a methyl group) while the new strand is briefly unmethylated, so the system trusts the marked strand and rewrites the unmarked one. Eukaryotes use other cues — such as the nicks still present in the not-yet-sealed new strand — but the principle is the same: *correct toward the strand you can trust.*
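The "correct toward the strand you can trust" rule can be written down in a few lines. The sketch below is a deliberately simplified model of the bacterial case: the `Strand` class and `repair_new_strand` function are hypothetical names, the two strands are listed position by position (ignoring their antiparallel orientation), and single bases are rewritten in place, whereas the real system excises and resynthesizes a whole patch.

```python
from dataclasses import dataclass

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass
class Strand:
    sequence: str     # bases listed position by position against the partner
    methylated: bool  # True = old template strand carrying the methyl tag


def repair_new_strand(a: Strand, b: Strand):
    """Return (repaired new-strand sequence, positions that were rewritten)."""
    if a.methylated == b.methylated:
        # Without a distinguishing mark there is no way to know which base is the typo.
        raise ValueError("cannot tell the old strand from the new one")
    template, new = (a, b) if a.methylated else (b, a)
    repaired = list(new.sequence)
    fixed = []
    for i, (t, n) in enumerate(zip(template.sequence, new.sequence)):
        if PAIRS[t] != n:          # mismatch: trust the template
            repaired[i] = PAIRS[t]
            fixed.append(i)
    return "".join(repaired), fixed


old = Strand("GATCCA", methylated=True)
new = Strand("CTAGAT", methylated=False)  # an A sits across from a C at index 4
print(repair_new_strand(old, new))  # → ('CTAGGT', [4])
```

Note that the function refuses to act when both strands carry the same mark, which mirrors why the mark has to be transient: once the new strand is tagged too, the information about which strand is original is gone.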

      base selection   ~10^-5   pairing fits / wrong base rejected
   x  proofreading     ~10^-2   3'->5' exonuclease trims the bad 3' base
   x  mismatch repair  ~10^-2   repairs new strand using the old as truth
  -----------------------------------------------------------------------
   =  overall          ~10^-9   about one error per billion bases

Three modest filters multiply into near-perfect fidelity.
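To see the multiplication explicitly, the short sketch below combines the three round-number error rates from the table and applies the result to the 6 billion base pairs quoted earlier. The dictionary keys are just labels, and the rates are order-of-magnitude estimates from the text, not precise measurements.

```python
import math

GENOME_BP = 6_000_000_000  # base pairs copied per division, as quoted earlier

# Fraction of errors that gets past each layer (round figures from the table).
filters = {
    "base selection": 1e-5,
    "proofreading": 1e-2,
    "mismatch repair": 1e-2,
}

overall = math.prod(filters.values())
errors_per_division = overall * GENOME_BP

print(f"overall error rate: about {overall:.0e} per base")
print(f"expected errors per division: about {errors_per_division:.0f}")
```

On these figures the three layers leave only a handful of errors per genome copy, compared with tens of thousands or more from base selection alone.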

Why not perfect? Errors are the fuel of evolution

Here is the twist that surprises most newcomers. The cell has the machinery to be even more accurate than it is, yet it does not push its error rate all the way to zero, and that is not a failure. Each surviving copying error is a [[molbio-point-mutation|point mutation]]: a permanent change to the sequence, and the raw material of evolution. With zero mutation, every offspring would be a perfect clone forever, and a species facing a changing world (a new pathogen, a colder climate) would have no variation for selection to act on. A lineage that copied with literally perfect fidelity would be evolutionarily frozen, and far more likely to go extinct. A little sloppiness is the price of a future.

And here is the reassuring honest fact that defuses the fear of mutation: most mutations do nothing much. Across the spectrum of mutational effects, the great majority are roughly neutral — silent changes, or tweaks in regions that don't matter. A minority are harmful, and a precious few are beneficial. Mutation is not a synonym for disease; it is the slow drip of variation that, filtered by selection over generations, produced every living thing including you. The cell tunes its mutation rate the way you'd tune any error budget: low enough that harmful errors stay rare, but not zero, because variation has value.

Two final caveats to carry up the ladder. First, fidelity is not uniform: some organisms run "hotter". RNA viruses, most of which lack proofreading, mutate up to roughly a million times faster per base copied, which is one reason a flu shot is reformulated each year and why some viruses outrun our immunity. Second, the rate even in us is not fixed; under stress, certain error-prone polymerases are deliberately switched on to copy past damage, trading accuracy for survival. "One error in a billion" is a beautiful average, not an iron constant, and that flexibility, too, is part of the design.