The Central Dogma

One diagram to hold the whole field

In the previous guide you met the cast of molecular biology: DNA, RNA, and protein, plus the chemistry that lets them stick together. Now we put them in motion. The single most useful idea in the field — the one you will reach for again and again — is the [[molbio-central-dogma|central dogma]]: a claim about which direction *information* travels between these molecules. It is not a law of physics; it is a beautifully compact summary of what cells actually do, and learning to read it is like getting the map before the hike.

Here is the famous core, in plain text: DNA -> RNA -> protein. DNA copies itself (replication), DNA is read out into RNA (transcription), and RNA is read out into protein (translation). That is the everyday traffic of information flow inside a living cell. Three molecules, three arrows. Almost everything else in this ladder is a detailed answer to *how* one of those arrows works, *when* a cell decides to send a message down it, and *what happens* when the machinery slips.

    replication
        __
       |  |
       v  |
       DNA  --transcription-->  RNA  --translation-->  PROTEIN
        ^                        |
        |________________________|
          reverse transcription
          (RNA -> DNA, by special enzymes)

The everyday flow runs left to right; replication loops DNA back to DNA; reverse transcription is the famous exception that runs RNA -> DNA.

What each arrow actually does

Take the arrows one at a time, because each is a real, physical job done by molecular machines. Replication is how DNA makes a second DNA before a cell divides. Recall the antiparallel double helix from the last guide: two strands running in opposite directions, with A always pairing across with T, and G with C. The cell unzips the two strands and uses each as a template to build a fresh partner, so each of the two resulting double helices contains one original strand and one newly built strand, and each daughter cell inherits one of them. That half-old, half-new pattern is called semiconservative replication, and it is *why* the base-pairing rules matter so much: the sequence on one strand dictates the sequence of the other.
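The pairing logic is mechanical enough to sketch in a few lines. This is a toy model, assuming each strand is a plain string written 5'->3' and ignoring the enzymes, primers, and proofreading a real cell relies on; `complementary_strand` and `replicate` are hypothetical helper names.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner of a DNA strand, written 5'->3' like the input.

    The strands are antiparallel, so we complement every base and then
    reverse the result to read the partner in its own 5'->3' direction.
    """
    return "".join(DNA_PAIR[base] for base in reversed(strand))


def replicate(helix: tuple) -> list:
    """Toy semiconservative replication of a (top, bottom) double helix.

    Each old strand is kept and used as a template for one new partner,
    so every daughter helix pairs one old strand with one new strand.
    """
    old_top, old_bottom = helix
    new_bottom = complementary_strand(old_top)  # built on the old top strand
    new_top = complementary_strand(old_bottom)  # built on the old bottom strand
    return [(old_top, new_bottom), (new_top, old_bottom)]


parent = ("ATGCGT", complementary_strand("ATGCGT"))
daughters = replicate(parent)
print(parent)     # ('ATGCGT', 'ACGCAT')
print(daughters)  # two helices, each identical in sequence to the parent
```

Both daughters come out identical in sequence to the parent, yet each keeps one of the parent's original strands: that is the semiconservative pattern in miniature.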

[[transcription-overview|Transcription]] copies a stretch of DNA — a gene — into a short, single-stranded working copy made of RNA. Think of the DNA as the master archive locked in the library, and the RNA as a disposable photocopy you can carry to the workbench. The same base-pairing logic guides it (with RNA using U in place of T), and an enzyme called RNA polymerase does the copying. Because the cell makes a fresh RNA copy only when it needs that gene's product, transcription is the main place where cells *decide* what to switch on — a control point we revisit on almost every later rung.
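The base-pairing step of transcription can be sketched the same way. This assumes the DNA template strand is given as a string written 5'->3'; promoters, RNA polymerase itself, and termination signals are not modeled, and `transcribe` is a hypothetical helper.

```python
# DNA base -> the DNA base that would pair with it.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5'->3') built on a DNA template strand (given 5'->3').

    The polymerase reads the template in the antiparallel direction, so we
    reverse it, pair each base, and write U wherever DNA pairing would give T.
    """
    paired = "".join(DNA_PAIR[base] for base in reversed(template_strand))
    return paired.replace("T", "U")


template = "AAAAGCCAT"       # template strand of a hypothetical tiny gene
print(transcribe(template))  # AUGGCUUUU
```

The result reads like the gene's other DNA strand (here 5'-ATGGCTTTT-3', often called the coding strand) with U in place of T, which is why the photocopy analogy works: the message carries the gene's sequence faithfully.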

[[molbio-translation-initiation|Translation]] reads the RNA message and builds a protein. This is where the famous genetic code enters: the RNA is read three letters at a time, and each three-letter codon specifies one amino acid to add to the growing protein chain. A reading like 5'-AUG GCU UUU-3' means start (methionine), then alanine, then phenylalanine. The molecule that physically matches code to amino acid is a small adapter called transfer RNA (tRNA), and the whole assembly happens on a ribosome. So the arrows are not metaphors: they are three different molecular machines, each enforcing its own set of pairing rules.
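Reading the code amounts to a lookup in fixed three-letter chunks. The sketch is deliberately tiny: the hypothetical `CODON_TABLE` holds only the three codons from the example (the real table covers all 64), reading begins at the first letter instead of searching for AUG, and tRNAs and the ribosome are left out entirely.

```python
from itertools import product

# A tiny excerpt of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UUU": "Phe"}


def translate(mrna: str) -> list:
    """Read an mRNA three letters at a time, from its first letter, into amino acids."""
    if len(mrna) % 3 != 0:
        raise ValueError("length must be a multiple of 3 for this sketch")
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(translate("AUGGCUUUU"))  # ['Met', 'Ala', 'Phe']

# Why triplets? Pairs of 4 letters give only 4**2 = 16 words, too few for
# the 20 amino acids; triplets give 4**3 = 64.
print(len(["".join(c) for c in product("ACGU", repeat=3)]))  # 64
```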

What Crick really said (and the famous misreading)

Here is the most important honesty in this whole guide. Francis Crick stated the central dogma in 1958, and people have misread it ever since. The popular version, "information only ever flows DNA -> RNA -> protein and can *never* go backward," is not what Crick meant. He later admitted he had used the word "dogma" loosely, to mean a grand hypothesis with little direct experimental support rather than an unquestionable belief, and he was careful about what the claim actually covers.

What Crick *actually* claimed is narrower and more precise: once information has passed *into* protein, it can never get back *out*. By "information" he meant the precise sequence of bases or amino acids. In his 1958 wording, sequence information may pass from nucleic acid to nucleic acid, or from nucleic acid to protein, but transfer from protein to nucleic acid, or from protein to protein, is impossible. A protein's amino-acid order can never serve as a template to write a new RNA or DNA sequence, or to dictate the sequence of another protein. That, and only that, is the line the dogma draws.
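Crick's boundary is simple enough to write as a one-line predicate. This is a hypothetical encoding, assuming molecules are labeled with the strings "DNA", "RNA", and "protein"; following his 1958 wording, it rules out every transfer that starts from protein, protein -> protein included.

```python
NUCLEIC_ACIDS = {"DNA", "RNA"}
MOLECULES = NUCLEIC_ACIDS | {"protein"}


def transfer_allowed(source: str, target: str) -> bool:
    """Is sequence transfer from source to target permitted by the dogma?

    Nucleic acid -> nucleic acid and nucleic acid -> protein are allowed in
    principle; anything whose source is protein is forbidden.
    """
    if source not in MOLECULES or target not in MOLECULES:
        raise ValueError(f"unknown molecule: {source!r} or {target!r}")
    return source in NUCLEIC_ACIDS


# Print the full 3 x 3 grid of possible transfers.
for src in ("DNA", "RNA", "protein"):
    for dst in ("DNA", "RNA", "protein"):
        verdict = "possible" if transfer_allowed(src, dst) else "forbidden"
        print(f"{src:>7} -> {dst:<7} {verdict}")
```

The predicate only says which directions are ruled out; it says nothing about how often the permitted ones occur. Cells are not known to read DNA straight into protein, for example, even though that direction is not forbidden.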

Why does this distinction matter beyond pedantry? Because reverse transcription is not a rare footnote. It is how retroviruses such as HIV turn their RNA genomes into DNA and insert that DNA into our chromosomes, how a key lab technique (RT-PCR, the engine behind many diagnostic tests) reads RNA, and how parts of our own genome accumulated over evolution. Treating "backward = impossible" as the dogma would blind you to a whole layer of biology. The honest version keeps that layer in view while still capturing the deep truth: protein is the end of the line, never the source.

Following one message from gene to protein

Let us walk a single message down the everyday arrows, the way a working cell does it. This whole journey is called gene expression — taking the quiet information in a gene and turning it into a working molecule. Keep the photocopy analogy in mind: nothing is ever written back onto the master archive.

A cell needs a particular protein, so it locates that gene on its DNA — a defined stretch of the sequence — and opens the double helix there.
RNA polymerase reads one DNA strand (the template) and builds a complementary RNA copy of the gene; this messenger RNA (mRNA) carries the instructions, written in the A/U/G/C alphabet, out to where proteins are built.
A ribosome clamps onto the RNA and reads it in three-letter codons, starting at the start codon AUG.
For each codon, a matching tRNA delivers the correct amino acid, and the ribosome links the amino acids into a chain: the protein, its amino-acid order dictated codon by codon by the gene.
The finished chain folds into a working shape and does its job — and that is the end of the line: its sequence is never copied back into RNA or DNA.
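The ribosome steps above can be mimicked with a toy reader. The sketch assumes the mRNA is a plain string and uses a hypothetical four-entry excerpt of the codon table; it also adds one detail the walk-through leaves implicit, namely that reading stops at one of the three stop codons (UAA, UAG, UGA). Folding is not modeled.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
# Hypothetical excerpt of the standard code, just enough for the demo.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UUU": "Phe", "GGC": "Gly"}


def express(mrna: str) -> list:
    """Toy ribosome: find the first AUG, then read codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so no protein
    protein = []
    # Step in whole codons; an incomplete trailing codon is never read.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE.get(codon, "?"))  # "?" = not in this excerpt
    return protein


print(express("GGAUGGCUUUUGGCUAAGC"))  # ['Met', 'Ala', 'Phe', 'Gly']
```

Note that nothing flows back: `express` returns a list of amino acids and has no path that turns that list into RNA again.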

One honest caveat before you internalize "one gene -> one protein": that tidy slogan is outdated. In our cells a single gene's RNA copy can be cut and re-joined in different ways — alternative splicing — so one gene can yield several distinct proteins. The arrows of the dogma still hold perfectly; the dogma is about *direction* of information, not about a one-to-one count. We will unpack splicing properly a few rungs up.
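To see how one gene can give more than one protein, here is a toy model of exon choice. Everything in it is hypothetical: the exon names and sequences are invented, introns are assumed to be already removed, and the real splicing machinery (the spliceosome) follows rules this sketch ignores.

```python
# Hypothetical gene: exon sequences (already written as RNA) in genomic order.
EXONS = {"E1": "AUGGCU", "E2": "UUU", "E3": "GGC", "E4": "UAA"}
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UUU": "Phe", "GGC": "Gly"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(exon_order: list) -> str:
    """Join the chosen exons end to end to form one mature mRNA."""
    return "".join(EXONS[name] for name in exon_order)


def translate(mrna: str) -> list:
    """Read codons from the first letter until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return protein


# Two splicing choices for the same gene: include exon E2, or skip it.
isoform_a = splice(["E1", "E2", "E3", "E4"])
isoform_b = splice(["E1", "E3", "E4"])
print(translate(isoform_a))  # ['Met', 'Ala', 'Phe', 'Gly']
print(translate(isoform_b))  # ['Met', 'Ala', 'Gly']
```

Skipping E2 removes exactly three letters, so the reading frame survives and the second protein is simply one amino acid shorter. Skipping a segment whose length is not a multiple of three would instead shift every codon downstream.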

Why this one diagram frames everything above

Once the arrows live in your head, the rest of the ladder organizes itself around them. Each later rung is essentially a magnifying glass held over one part of the diagram: the replication rung zooms into the DNA -> DNA loop and its remarkable accuracy; the transcription and RNA rungs zoom into DNA -> RNA and everything cells do to the message afterward; the translation and protein-folding rungs zoom into RNA -> protein and what the finished chain becomes. Gene regulation asks *which* arrows fire and *when*. You are not learning a pile of unrelated facts — you are filling in one shared map.

The diagram also tells you where things can go wrong, and where biology gets interesting. A typo in the DNA, a mutation, can change a codon. Because several codons can specify the same amino acid, some changes are silent and leave the protein untouched; others swap in a different amino acid. Most such changes are harmless, a few are useful, and a few cause disease. Some RNAs never become protein at all but do real work as RNA, hinting at an ancient RNA world in which RNA may have run the show before DNA and protein existed. And rarely, a protein's *shape* can corrupt other copies of the same protein (a prion), which feels like protein passing information on. Yet what spreads is a fold, not a sequence, and no gene is ever edited, so the dogma's true boundary holds.
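A toy classifier makes the mutation point concrete. It assumes a mutation swaps exactly one letter of a single mRNA codon, uses a hypothetical four-entry excerpt of the genetic code, and borrows the standard labels silent, missense, and nonsense; real consequences also depend on where in the protein the change lands.

```python
# Excerpt of the standard genetic code covering only the demo codons.
CODON_TABLE = {"GAG": "Glu", "GAA": "Glu", "GUG": "Val", "UAG": "Stop"}


def classify_point_mutation(codon: str, position: int, new_base: str) -> tuple:
    """Swap one letter of an mRNA codon and classify the effect on the protein."""
    if not 0 <= position < 3:
        raise ValueError("position must be 0, 1, or 2")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        effect = "silent"    # redundancy: new codon, same amino acid
    elif after == "Stop":
        effect = "nonsense"  # the chain is cut short at this codon
    else:
        effect = "missense"  # one amino acid swapped for another
    return mutant, effect


print(classify_point_mutation("GAG", 2, "A"))  # ('GAA', 'silent')
print(classify_point_mutation("GAG", 1, "U"))  # ('GUG', 'missense')
print(classify_point_mutation("GAG", 0, "U"))  # ('UAG', 'nonsense')
```

The middle case, GAG to GUG (glutamate to valine), is the single-letter change behind sickle-cell disease, a reminder that one missense swap can matter a great deal.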