The Replication Fork & Its Machines

Where copying happens

In the previous guide you saw *that* DNA is copied semiconservatively — each old strand becomes the template for a new partner — and how Meselson and Stahl proved it with nothing but a centrifuge and two flavours of nitrogen. That told you the bookkeeping: one old strand plus one new strand per daughter helix. What it did not tell you is *where* in the long molecule the copying actually starts, or *how* the cell physically pulls a tightly wound double helix apart and writes a faithful copy of each side. That is this guide.

Copying does not begin just anywhere. It begins at marked spots in the DNA called [[molbio-origin-of-replication|origins of replication]] — think of them as the official 'start copying here' signs written into the sequence. An origin is a particular stretch of DNA, often rich in A-T pairs. Why A-T? Recall from the chemistry rungs that an A-T pair is held by only two hydrogen bonds while a G-C pair has three, so A-T-rich DNA is the easiest place to pry the strands apart. Initiator proteins recognise the origin, force a small region open, and call in the rest of the machinery.
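The hydrogen-bond argument can be put into numbers with a minimal sketch. It assumes a toy model in which only the two-versus-three bond count matters (real duplex stability also depends on base stacking), and the sequence `toy` and both function names are invented for illustration.

```python
# Hydrogen bonds per base pair, as stated in the text: A-T has 2, G-C has 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total base-pair hydrogen bonds holding this stretch of duplex together."""
    return sum(H_BONDS[base] for base in strand.upper())


def easiest_window(strand: str, width: int) -> tuple[int, str]:
    """Return (start index, window) of the stretch with the fewest hydrogen bonds.

    Ties go to the leftmost window. Toy model only: base stacking is ignored.
    """
    if not 0 < width <= len(strand):
        raise ValueError("width must be between 1 and len(strand)")
    starts = range(len(strand) - width + 1)
    best = min(starts, key=lambda i: hydrogen_bonds(strand[i:i + width]))
    return best, strand[best:best + width]


# Hypothetical sequence, made up for this example.
toy = "GCGCCGGCATTATAATGCGGCC"
print(hydrogen_bonds("ATAT"), hydrogen_bonds("GCGC"))  # 8 12
print(easiest_window(toy, 6))  # (8, 'ATTATA')
```

Four A-T pairs cost 8 bonds to open and four G-C pairs cost 12, so the scan lands on the A-T-rich stretch in the middle, which is the kind of spot where an origin tends to sit.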

Once a small region is opened, copying does not crawl off in one direction — it spreads *both* ways at once, like unzipping a jacket from a point in the middle. The opened gap is the replication bubble, and at each of its two ends is a moving Y-shaped junction where 'still zipped' meets 'now open.' Each junction is a [[molbio-replication-fork|replication fork]], and the fork is the actual worksite: the place where the parent helix splits into two single strands and the copying gets done. Under a microscope a small bubble looks like an open eye in the DNA, which is why it is sometimes called a replication eye.

Meet the replisome

Copying DNA is not the work of one lonely enzyme. At each fork, all the proteins needed to do the job gather into one large, coordinated assembly that travels along the DNA together. That whole molecular machine is the [[replisome|replisome]] — picture a mobile factory parked at each fork, unwinding, priming, copying, and proofreading in one tightly choreographed motion. In bacteria it moves at hundreds to over a thousand bases per second. Its parts are conserved in broad outline from bacteria to humans, even though the specific proteins differ.

It helps to meet the crew by what each one does, before we walk through them in order. At the very tip rides the unwinder. Right behind it, proteins coat the freshly bared strands so they cannot snap shut. Out ahead of the fork, another enzyme keeps relieving the twisting strain. And down at the junction, one enzyme writes a starter and another extends it into the real new strand. Five jobs, five kinds of machine. Let us take them one at a time.

Opening the helix, and holding it open

The very first job at a fork is to peel the two strands apart, and the enzyme that does it is [[dna-helicase|DNA helicase]]. It is typically a ring-shaped protein, often built from six subunits, that encircles one of the two strands. Powered by ATP, the cell's energy currency, it pulls itself along that strand and, like a moving wedge, forces the two strands apart as it goes, breaking the hydrogen bonds between the paired bases. It rides at the very tip of the fork and leads the way, so the polymerases behind it always have single-stranded template to read. (One careful distinction: helicase only breaks the base-pair hydrogen bonds; it never cuts the sugar-phosphate backbone.)

But freshly separated strands behave like a strip of Velcro pulled apart: the two halves want to snap right back together. Left alone, the unzipped single strands would re-pair or fold back and pair with themselves into hairpins. So the moment helicase opens the helix, many copies of [[molbio-single-strand-binding-protein|single-strand binding protein]] (SSB in bacteria; the eukaryotic equivalent is RPA) coat the bared strands. They bind any single-stranded DNA regardless of its sequence — their job is purely structural — keeping the template open, straight, and untangled until the polymerase reaches it, and protecting it from enzymes that would chew up loose single-stranded DNA.

There is a hidden mechanical cost to all this unwinding. Try to pull apart the two strands of a tightly twisted rope from the middle: the rope *ahead* of your hands winds up tighter and tighter until you can hardly pull further. DNA has exactly this problem — as helicase opens the helix, the still-paired DNA ahead of the fork overtwists into supercoils. The enzymes that relieve this strain are the [[topoisomerase-gyrase|topoisomerases]] (in bacteria, a type called DNA gyrase). They work by transiently cutting the backbone — one strand or both — letting the DNA rotate or pass through the break to release the tension, then sealing it perfectly back up. Without them the fork would grind to a halt within seconds.
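A rough calculation shows how fast the strain builds. This is only a sketch: the figure of about 10.5 base pairs per helical turn is a standard textbook value for relaxed B-form DNA and is not given in this guide, and the function name is made up.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA, in base pairs


def turns_to_remove(unwound_bp: float) -> float:
    """Helical turns that must be relaxed when this many base pairs are unwound."""
    return unwound_bp / BP_PER_TURN


fork_speed = 1000  # bp per second, upper end of the bacterial range quoted earlier
print(round(turns_to_remove(fork_speed)))  # 95
```

At that speed roughly 95 turns of twist pile up ahead of the fork every second, which is why the fork cannot get far without topoisomerases.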

Writing the new strand: primer first, then polymerase

Now the strands are open and held apart — but the central enzyme of copying has a strange limitation. [[dna-polymerase-replicative|DNA polymerase]], the worker that builds the new strand, can only *add to* an existing chain; it cannot start one from scratch. It is like a printer that can keep adding pages to a stack but cannot lay down the very first page. So something else must write the first few letters. That is the job of [[primase-and-rna-primer|primase]], a special RNA polymerase that *can* start on a bare template. It lays down a short stretch of RNA — typically about ten nucleotides — complementary to the DNA, and that little RNA primer gives DNA polymerase the free 3' end it needs to begin.

fork tip --> [helicase] unzips the helix
              |
   3'-...T A C G...-5'   parent template (read 3'->5')
              |  primase lays a short RNA starter:
   5'- A U G C ...        <- RNA primer (note U, not T)
              |  DNA polymerase extends it 5'->3' in DNA:
   5'- A U G C T A G ...   primer (RNA) + new DNA strand

  topoisomerase works AHEAD; SSB coats the bare strands behind

Primase writes a short RNA starter; DNA polymerase extends it 5'-to-3'.

With a primer in place, DNA polymerase takes over, and it obeys two strict rules that quietly shape everything about replication. First, it can only add nucleotides to the 3' end of a growing strand, so the new strand always grows 5'-to-3' (and the template is read 3'-to-5'). Second, it cannot start a strand; it must extend a primer. At each step it picks the nucleotide whose base correctly pairs with the template (A across from T, G across from C), bonds it on, and moves along. Many replicative polymerases also carry a built-in proofreading function: a 3'-to-5' exonuclease that backs up, snips out a wrong base, and tries again, which is the first big reason errors stay so vanishingly rare. (Why an *RNA* primer rather than DNA? Because using RNA flags those starter sequences as temporary; later they are removed, the gaps filled with DNA, and the joins sealed. That thread is picked up in the next guide.)
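The division of labour between primase and polymerase can be sketched in a few lines. This is a toy model, not a simulation of the real enzymes: the function names are invented, the primer is 4 bases long to match the diagram rather than the roughly ten of a real primer, proofreading is left out, and RNA is shown in lowercase so it can be told apart from DNA.

```python
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # primase writes RNA, so A pairs with U
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def primase(template_3to5: str, length: int = 4) -> str:
    """Start a chain from nothing: RNA primer (5'->3', lowercase) for the first bases."""
    return "".join(RNA_PAIR[base] for base in template_3to5[:length]).lower()


def dna_polymerase(template_3to5: str, primer: str) -> str:
    """Extend an existing primer 5'->3' with DNA (uppercase); never starts a strand."""
    if not primer:
        raise ValueError("DNA polymerase needs a primer with a free 3' end")
    new_dna = "".join(DNA_PAIR[base] for base in template_3to5[len(primer):])
    return primer + new_dna


template = "TACGATC"  # written 3'->5', left to right, as in the diagram
primer = primase(template)
print(primer)                            # augc
print(dna_polymerase(template, primer))  # augcTAG
```

Calling `dna_polymerase(template, "")` raises an error, which mirrors the second rule: the polymerase can only add to a 3' end that something else has already laid down.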

The crew, in order

It is worth walking the whole sequence once, in the order things happen at a single moving fork. Each step depends on the one before it, which is exactly why the replisome keeps all these machines bundled together rather than scattered.

1. Topoisomerase / gyrase works ahead of the fork, snipping and resealing the backbone to bleed off the twisting strain before it can stall the helicase.
2. Helicase, riding at the fork tip, breaks the base-pair hydrogen bonds and splits the parent helix into two single strands.
3. Single-strand binding proteins immediately coat the bared strands so they cannot re-pair or fold back on themselves.
4. Primase lays down a short RNA primer on the exposed template, giving a free 3' end to build from.
5. DNA polymerase extends the primer 5'-to-3', adding base-paired nucleotides and proofreading as it goes, building the faithful new strand.

One subtlety worth flagging honestly: this tidy single-file picture is a simplification. The two strands run antiparallel, so a single fork must copy them by two different schemes at once — one strand smoothly, the other in backstitched pieces — and the two polymerases are actually held together in one complex, with the awkward strand thought to loop out so both new strands can be built in the same direction as the fork moves. That asymmetry, the leading and lagging strands, is the whole subject of the next guide; here, just hold onto the idea that one machine does both jobs at once.

One origin or thousands? A question of scale

A last difference matters enormously, and it comes straight out of the prokaryote-eukaryote divide you met early on. A bacterium such as E. coli has a single circular chromosome with just *one* origin, called oriC. From it, two forks race in opposite directions around the circle and meet on the far side at a termination region — and the whole 4.6-million-base-pair genome is copied. One origin is plenty for a small loop.
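A back-of-envelope check shows why one origin suffices. The sketch assumes both forks run at a constant speed and share the circle equally; the two speeds are illustrative picks from the "hundreds to over a thousand" range mentioned earlier, and the function name is made up.

```python
def copy_time_seconds(genome_bp: float, fork_speed_bp_s: float, n_forks: int = 2) -> float:
    """Time to copy a genome when n_forks share the work equally at constant speed."""
    return genome_bp / (fork_speed_bp_s * n_forks)


ecoli_bp = 4.6e6  # genome size given in the text
for speed in (500, 1000):  # bp per second, illustrative values
    minutes = copy_time_seconds(ecoli_bp, speed) / 60
    print(f"{speed} bp/s -> {minutes:.0f} min")
# 500 bp/s -> 77 min
# 1000 bp/s -> 38 min
```

Under these assumptions the whole chromosome is finished in well under two hours from a single starting point.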

A human cell is a different scale of problem entirely: about 6 billion base pairs spread across many long, linear chromosomes, and it must all be copied within the few hours of one cell-cycle phase. If a single fork tried to traverse a whole human chromosome end to end, it would take weeks. So eukaryotes light up *thousands* of origins along each chromosome, firing in waves, with bubbles growing and neighbouring forks eventually meeting and merging until the whole genome is done. Multiple origins are not a luxury — they are a necessity forced by sheer size.
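The same arithmetic shows the problem for one large human chromosome. Every number below is an assumption chosen for illustration and not taken from this guide: a fork speed of about 50 bp per second, a chromosome of 250 million base pairs (roughly the largest human one), and an 8-hour copying phase. The function names are likewise made up.

```python
import math

FORK_SPEED = 50        # bp per second, assumed rough eukaryotic fork speed
CHROMOSOME_BP = 2.5e8  # assumed length, roughly the largest human chromosome
S_PHASE_HOURS = 8      # assumed length of the copying phase


def days_with_one_origin(length_bp: float, speed: float = FORK_SPEED) -> float:
    """Two forks leave one central origin and each copies half the chromosome."""
    return (length_bp / 2) / speed / 86_400


def min_origins(length_bp: float, hours: float = S_PHASE_HOURS,
                speed: float = FORK_SPEED) -> int:
    """Fewest evenly spaced origins, all firing at time zero, that finish in time."""
    per_origin_bp = 2 * speed * hours * 3600  # two forks travel outward per origin
    return math.ceil(length_bp / per_origin_bp)


print(f"{days_with_one_origin(CHROMOSOME_BP):.0f} days")  # 29 days
print(min_origins(CHROMOSOME_BP))                         # 87
```

With these inputs a single origin would need about four weeks. The second number is only an idealised floor: it assumes perfectly spaced origins that all fire at once and forks that never stall. Real origins fire in staggered waves, so cells use far more than this minimum.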

One nuance deserves honesty here: bacterial origins are crisp, well-defined sequences, but in many eukaryotes, mammals especially, an origin is defined less by a strict DNA sequence and more by its chromatin context, and exactly where and when each one fires is still an active research question. What *is* firmly settled is the control logic: a cell must copy its genome once and only once per cycle. Each origin is 'licensed' to fire just one time per cell cycle, and that licensing is reset only after division, a safeguard against copying any stretch twice. Mis-firing or re-firing of origins lets parts of the genome be over- or under-copied, a hallmark of the genome instability seen in cancer.
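The once-per-cycle rule is, at heart, a small piece of state logic, and it can be sketched as a toy model. The class and function names are hypothetical and nothing here represents the actual licensing proteins: a licence is simply a flag that firing uses up and that only a new cycle can restore.

```python
class Origin:
    """Toy origin that can fire at most once per licence."""

    def __init__(self, name: str):
        self.name = name
        self.licensed = False
        self.times_fired = 0

    def license(self) -> None:
        """Grant a fresh licence (in this toy model, once per cell cycle)."""
        self.licensed = True

    def fire(self) -> bool:
        """Fire if licensed, using up the licence; otherwise do nothing."""
        if not self.licensed:
            return False
        self.licensed = False
        self.times_fired += 1
        return True


def cell_cycle(origins) -> int:
    """One cycle: license every origin, let each try to fire twice, count firings."""
    for origin in origins:
        origin.license()
    # The second attempt on each origin always fails: its licence is already spent.
    return sum(origin.fire() + origin.fire() for origin in origins)


origins = [Origin("ori-1"), Origin("ori-2"), Origin("ori-3")]
print(cell_cycle(origins))  # 3
print(cell_cycle(origins))  # 3
```

Each cycle yields exactly one firing per origin however many times firing is attempted. Removing the line that clears the licence would let the same stretch be copied twice, which is the failure the safeguard exists to prevent.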