The Spliceosome & Alternative Splicing

The problem: a message interrupted

In the previous guide you watched a eukaryotic transcript get three big edits in the nucleus during pre-mRNA processing — a cap clapped onto the front, a poly-A tail added to the back, and the introns removed from the middle. This guide opens up that third edit, the most dramatic one, and asks how the cell performs it with a precision that has to be exact to a single letter. The machine that does it, the [[molbio-spliceosome|spliceosome]], turns out to be one of the strangest and most revealing devices in all of molecular biology.

Recall the layout of a eukaryotic gene from the genome rung: its coding information is not one continuous run but is broken into pieces. The keeper pieces, which survive into the final message, are exons (think *expressed*); the pieces between them, transcribed but then thrown away, are introns (think *intervening*). You can see them in the gene's exon-intron organization — exon, intron, exon, intron, and so on. The whole gene, introns and all, is copied into the pre-mRNA. So the raw transcript reads like a sentence with long stretches of gibberish wedged between the real words.

Two numbers make the scale vivid. In a typical human gene the introns are far longer than the exons: a gene can sprawl across tens of thousands of bases of DNA yet yield a mature mRNA only a couple of thousand bases long once the introns are gone. And each cut has to land *exactly* between the right letters. Being off by even a single base would shift the reading frame, so every codon downstream of the junction would be misread and the ribosome would produce nonsense from that point on. The task, then, is not merely "remove the introns": it is to remove each one with single-nucleotide precision and join the exons in their correct order, every time.
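To see why a single-base slip is so damaging, here is a minimal Python sketch using made-up sequences (the exon and intron strings are hypothetical, chosen only for illustration). It splits a spliced message into codons from the start and compares a correct junction with one where the cut lands one base late and leaves the intron's first G behind.

```python
def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive triplets from position 0, dropping any leftover bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy sequences, not taken from any real gene.
exon1 = "AUGGCC"
intron = "GUAAGUAUUCUAACUUUCAG"  # starts with GU, ends with AG
exon2 = "AAAUUUGGG"
pre_mrna = exon1 + intron + exon2

# Correct splice: remove exactly the intron.
correct = pre_mrna[:len(exon1)] + pre_mrna[len(exon1) + len(intron):]

# Faulty splice: the 5' cut lands one base late, so the intron's first G is kept.
one_off = pre_mrna[:len(exon1) + 1] + pre_mrna[len(exon1) + len(intron):]

print(codons(correct))  # ['AUG', 'GCC', 'AAA', 'UUU', 'GGG']
print(codons(one_off))  # ['AUG', 'GCC', 'GAA', 'AUU', 'UGG']
```

The codons before the junction agree, but every codon after it changes, which is exactly the downstream nonsense the paragraph describes.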

The three marks that say where to cut

How does the machinery know where an intron begins and ends, out of tens of thousands of letters? It reads three short signposts that nearly every intron carries — the two [[splice-sites-and-branch-point|splice sites]] and the branch point. Almost every intron starts with the bases GU at its front (the 5' splice site, or donor) and ends with AG at its back (the 3' splice site, or acceptor). This is the GU-AG rule. A little way before the 3' end sits the branch point: one particular adenine (A) that will act as a hinge. Together these three marks say, in effect, "intron starts here, the pivot is here, intron ends here."

  exon 1           INTRON (cut me out)       exon 2
5'...--- A G | G U ........ A ........ A G | C C ---...3'
             ^5' site       ^branch        ^3' site
             (GU donor)     point A        (AG acceptor)

  the spliceosome joins exon 1 --- exon 2 and frees the intron

A generic intron: it opens with GU, closes with AG, and carries a branch-point A near its 3' end. The spliceosome cuts at the two | marks and ligates the flanking exons.
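The three signposts can be checked mechanically. Below is a minimal Python sketch that tests the GU-AG rule and lists candidate branch-point As. The window of 18 to 40 bases upstream of the 3' end is an assumed rule of thumb for where branch points usually sit, and the example intron is hypothetical; a real spliceosome weighs much more sequence context than this.

```python
def obeys_gu_ag(intron: str) -> bool:
    """True if the intron starts with GU (5' donor) and ends with AG (3' acceptor)."""
    return intron.startswith("GU") and intron.endswith("AG")


def branch_candidates(intron: str, near: int = 18, far: int = 40) -> list[int]:
    """0-based positions of every A lying between `near` and `far` bases upstream
    of the intron's 3' end (distance = len(intron) - position).
    The default 18-40 window is an assumed rule of thumb, not a hard law."""
    start = max(0, len(intron) - far)
    stop = len(intron) - near  # inclusive upper bound
    return [i for i in range(start, stop + 1) if intron[i] == "A"]


# Hypothetical intron: GU donor, filler, a branch-site motif, filler, AG acceptor.
# UACUAAC is the yeast branch-site consensus; its last A (index 31 here) is the branch A.
example_intron = "GUAAGU" + "C" * 20 + "UACUAAC" + "U" * 20 + "CAG"

print(obeys_gu_ag(example_intron))        # True
print(branch_candidates(example_intron))  # [27, 30, 31]
```

The window deliberately returns every A in range rather than guessing the single correct one; picking the real branch A needs the kind of base-pairing recognition described in the next section.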

These signposts are exactly why splicing is precise, and exactly why it is fragile. A single mutation that destroys a GU or an AG, or one that accidentally creates a *new* GU or AG inside an exon, can make the machinery cut in the wrong place and ruin the protein. Such splice-site mutations are a major, and often overlooked, cause of genetic disease: a base change inside an intron, which alters no amino acid at all, can still wreck a gene purely by wrecking how it is spliced. This is one reason why screening only the protein-coding letters of a gene misses real disease-causing changes.
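Both failure modes can be sketched in a few lines of Python. This toy treats every GU dinucleotide as a potential donor signal, which is a large simplification (real donor strength depends on the surrounding bases), and the 12-base region is hypothetical: the last six bases of an exon followed by the first six of its intron.

```python
def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


def gu_positions(seq: str) -> list[int]:
    """0-based start positions of every GU dinucleotide (a potential donor signal)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GU"]


# Hypothetical region: exon end "AUGGCC" + intron start "GUAUCC"; the true donor GU is at index 6.
region = "AUGGCCGUAUCC"

print(gu_positions(region))                  # [6]
print(gu_positions(mutate(region, 7, "A")))  # []      the donor GU is destroyed
print(gu_positions(mutate(region, 4, "U")))  # [3, 6]  a new GU appears inside the exon
```

One base change either erases the real donor or offers the machinery a decoy inside the exon, and neither change touches the rest of the gene.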

The spliceosome: a machine with an RNA core

Here is the surprise. You might expect the cutting to be done by protein enzymes; proteins do most of the cell's chemistry, after all. But the spliceosome is built around small nuclear RNAs, short RNA molecules (named U1, U2, U4, U5, U6), each wrapped with a set of proteins into a particle called a snRNP (a "small nuclear ribonucleoprotein," pronounced "snurp"). Several snRNPs, joined by many additional proteins, assemble onto the intron in a set order. Crucially, it is the *RNAs*, not the proteins, that recognize the 5' splice site and the branch point by base-pairing to them, and that form the catalytic heart where the chemistry happens. The proteins are scaffolding and helpers; the working core is RNA.
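Recognition by base-pairing can be made concrete for U1. Near its 5' end, human U1 snRNA carries the segment 5'-ACUUACCUG-3', which is the reverse complement of the consensus donor region CAG|GUAAGU (the cut falls after CAG). The Python sketch below counts Watson-Crick matches between a candidate site and that segment. It ignores G-U wobble pairs and bulges, a deliberate simplification, and the variant site "CAGGUGAGU" is just an illustrative input.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return "".join(PAIR[base] for base in reversed(rna))


def watson_crick_matches(site: str, snrna_segment: str) -> int:
    """Count positions where `site` pairs with the antiparallel `snrna_segment`.
    Ignores G-U wobble pairs and bulges (a simplification)."""
    target = reverse_complement(snrna_segment)
    return sum(a == b for a, b in zip(site, target))


U1_SEGMENT = "ACUUACCUG"       # near the 5' end of human U1 snRNA
CONSENSUS_DONOR = "CAGGUAAGU"  # exon end CAG | intron start GUAAGU

print(reverse_complement(U1_SEGMENT))                     # CAGGUAAGU
print(watson_crick_matches(CONSENSUS_DONOR, U1_SEGMENT))  # 9
print(watson_crick_matches("CAGGUGAGU", U1_SEGMENT))      # 8
```

A perfect consensus site pairs along all nine positions; real donors usually match fewer, which is one reason some splice sites are used more readily than others.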

Watch the assembly as a sequence of recognitions. U1 arrives first and pairs with the 5' GU site, marking the intron's front. U2 then pairs with the branch point, deliberately bulging the special branch A outward so its reactive arm is exposed and ready. A pre-formed trio, the U4/U6 plus U5 particle, joins to bring the two ends of the intron close together; then a dramatic rearrangement throws out U1 and U4, U6 takes over the front, and U5 holds the two exon ends in register. Only after this reshuffle is the catalytic center, made of U6 and U2 RNA, switched on. The machine is not a static cutter: it builds itself fresh on each intron, checks the marks, and only then commits to cut.
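The assembly order above can be tracked as a tiny state machine. This Python sketch models only the five snRNPs named in the text, joining and leaving in that order; the stage labels are illustrative, and real assembly involves many more factors and intermediate complexes.

```python
# Each stage: (description, snRNPs that join, snRNPs that leave).
# A simplified ordering following the text; real assembly involves many more factors.
STAGES = [
    ("U1 pairs with the 5' GU site", {"U1"}, set()),
    ("U2 pairs with the branch point", {"U2"}, set()),
    ("U4/U6 plus U5 trio joins", {"U4", "U5", "U6"}, set()),
    ("rearrangement releases U1 and U4", set(), {"U1", "U4"}),
]


def bound_after_each_stage(stages):
    """Yield (description, sorted list of bound snRNPs) after each stage is applied."""
    bound: set[str] = set()
    for description, joins, leaves in stages:
        bound = (bound | joins) - leaves
        yield description, sorted(bound)


for description, bound in bound_after_each_stage(STAGES):
    print(f"{description:36} -> {bound}")
```

After the last stage only U2, U5 and U6 remain bound, matching the text's point that the activated catalytic center is built from U6 and U2 RNA, with U5 holding the exons.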

Two cuts and a lariat: how the chemistry works

Once assembled, the spliceosome removes the intron in just two chemical steps, and the trick that makes it precise is that both steps are the *same kind* of reaction: an attack by one part of the RNA on another that swaps which atoms are bonded (chemists call this a transesterification). Follow the two steps and the famous loop, the lariat (named after a cowboy's lasso), falls right out of the geometry.

First cut. The branch-point A has a free chemical arm (its 2'-OH) that the spliceosome has aimed straight at the intron's front. That arm attacks the 5' GU site, snapping the chain there. In the same reaction, the freed front end of the intron becomes bonded to the branch-point A partway down the intron instead of floating away, kinking the intron into a closed loop with a dangling tail that is still attached to exon 2. That loop-with-a-tail is the lariat.
Second cut. Snapping the front loose has left exon 1 with a free end of its own (its 3'-OH). That end now swings over and attacks the 3' AG site at the intron's tail. This second cut releases the intron completely, still in its lariat shape, and in the very same motion joins the two exons end to end, with no gap. The reading frame is preserved to the letter.
Cleanup. The released lariat is unhooked (debranched) back into a straight piece and rapidly degraded, its nucleotides recycled. The spliceosome's snRNPs come apart and are reused on the next intron. The mature mRNA, one intron shorter, moves on.
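The two steps can be modeled on a plain string. In this minimal Python sketch a lariat is represented as a loop (the intron from its 5' G through the branch A, closed by the 2'-5' link) plus a tail (the rest of the intron, ending in AG). The sequences and the `Lariat`/`splice` names are hypothetical, and the model tracks only which bases end up where, not the chemistry itself.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    loop: str  # intron bases from the 5' G through the branch-point A (closed by a 2'-5' link)
    tail: str  # intron bases after the branch A, ending in the 3' AG


def splice(pre_mrna: str, intron_start: int, intron_end: int, branch: int) -> tuple[str, Lariat]:
    """Model the two splicing steps on a pre-mRNA string.

    intron_start: index of the intron's first base (the G of GU)
    intron_end:   index one past the intron's last base (just after the AG)
    branch:       index of the branch-point A, inside the intron
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GU-AG rule")
    if not (intron_start < branch < intron_end) or pre_mrna[branch] != "A":
        raise ValueError("branch point must be an A inside the intron")
    b = branch - intron_start  # branch position counted within the intron

    # Step 1: the branch A's 2'-OH attacks the 5' splice site. Exon 1 is cut free,
    # and the intron's first G is joined to the branch A, closing the loop.
    # (The lariat's tail is still attached to exon 2 at this point.)
    exon1 = pre_mrna[:intron_start]
    lariat = Lariat(loop=intron[: b + 1], tail=intron[b + 1:])

    # Step 2: exon 1's free 3'-OH attacks the 3' splice site, joining exon 1 to
    # exon 2 and releasing the intron, still in lariat form.
    exon2 = pre_mrna[intron_end:]
    return exon1 + exon2, lariat


exon_1, intron_seq, exon_2 = "AUGGCC", "GUAAGUCCUACUAACUUUUCAG", "AAAUUUGGG"
pre = exon_1 + intron_seq + exon_2
start = len(exon_1)
mrna, lariat = splice(pre, start, start + len(intron_seq), branch=start + 13)
print(mrna)    # AUGGCCAAAUUUGGG
print(lariat)  # Lariat(loop='GUAAGUCCUACUAA', tail='CUUUUCAG')
```

Loop plus tail always reassembles the original intron, and the exons meet with no base gained or lost: the string-level version of "the reading frame is preserved to the letter."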

Step back and notice what just happened: the chemistry at the heart of the job was done by RNA, with RNA recognizing RNA and RNA catalyzing the cuts, twice over. No water molecule was used to hack the chain apart blindly; instead the cell swapped one bond for another in a controlled exchange, with the reacting groups held in position by the spliceosome, which is why the cuts land between exactly the right letters. The lariat is not a mistake or leftover mess: it is the direct fingerprint of that branch-point attack, the visible proof of how the first cut was made.

Alternative splicing: one gene, many proteins

Now the payoff, and the reason this is one of the most important ideas in the whole field. Nothing forces the cell to keep every exon every time. By choosing to skip an exon, or to include an extra one, or to use an alternative splice site, the spliceosome can stitch the *same* set of exons together in *different* combinations from the *same* pre-mRNA. Each combination is a different mature mRNA, and so a different protein. This is [[molbio-alternative-splicing|alternative splicing]], and in humans it is the rule, not the exception: the large majority of our multi-exon genes are spliced in more than one way.

Picture the exons as a set of LEGO bricks numbered 1, 2, 3, 4, 5. One cell type might build the message 1-2-3-4-5; another might skip brick 3 and build 1-2-4-5; a third might keep an alternative version of brick 2. The body uses this constantly. A muscle cell and a brain cell can run the same gene yet make subtly different proteins suited to each, and a famous fruit-fly gene involved in wiring the nervous system can, in principle, be spliced into tens of thousands of distinct proteins from a single stretch of DNA. The same gene; different scissors-work; different products.
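The brick-building picture is easy to enumerate. This Python sketch lists every isoform when some exons are optional (cassette exons that are either kept or skipped), and counts isoforms when exactly one exon must be picked from each of several mutually exclusive clusters. The exon names are hypothetical. The cluster sizes 12, 48, 33 and 2 are the commonly cited figures for the fruit-fly gene Dscam, the nervous-system gene the text alludes to.

```python
from itertools import product
from math import prod


def isoforms(exons: list[str], optional: set[str]) -> list[str]:
    """List every exon path that keeps the constitutive exons in order and either
    includes or skips each optional (cassette) exon."""
    # Per exon: an optional exon may appear once or not at all; others always appear.
    choices = [[(e,), ()] if e in optional else [(e,)] for e in exons]
    return ["-".join(exon for part in combo for exon in part) for combo in product(*choices)]


def mutually_exclusive_count(cluster_sizes: list[int]) -> int:
    """Number of distinct mRNAs when exactly one exon is chosen from each cluster."""
    return prod(cluster_sizes)


bricks = ["1", "2", "3", "4", "5"]
print(isoforms(bricks, {"3"}))                    # ['1-2-3-4-5', '1-2-4-5']
print(len(isoforms(bricks, {"2", "4"})))          # 4
print(mutually_exclusive_count([12, 48, 33, 2]))  # 38016
```

Cassette choices double the count per optional exon, while mutually exclusive clusters multiply, which is how a single stretch of DNA can reach tens of thousands of possible products.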

What decides which version a cell makes? Regulatory proteins bind near the splice sites and either coax a snRNP to use a site (enhancing it) or hide it (silencing it). Because these regulators differ from cell to cell and change with a cell's state, splicing becomes another layer of control over gene expression — not just *whether* a gene is read, but *which form* of its protein appears. The cell even wires splicing to its own quality control: deliberately splicing in a premature stop codon flags the message for nonsense-mediated decay, turning a gene's output down. Splicing, in short, is not mere tidying — it is a decision the cell makes.
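How does a premature stop codon get noticed? A commonly cited rule of thumb for mammalian cells is the 50-nucleotide rule: a stop codon lying more than about 50 to 55 bases upstream of the last exon-exon junction tends to trigger nonsense-mediated decay. This Python sketch applies that rule to hypothetical exon lengths. The threshold of 50, and measuring from the end of the stop codon, are assumptions, and real NMD has exceptions.

```python
def junction_positions(exon_lengths: list[int]) -> list[int]:
    """0-based mRNA index of each exon-exon junction (the first base of every exon after the first)."""
    positions, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        positions.append(total)
    return positions


def predicts_nmd(exon_lengths: list[int], stop_start: int, threshold: int = 50) -> bool:
    """Apply the assumed 50-nucleotide rule: flag the message for NMD if the stop codon
    (starting at 0-based mRNA index stop_start) ends more than `threshold` bases
    upstream of the last exon-exon junction."""
    junctions = junction_positions(exon_lengths)
    if not junctions:
        return False  # single-exon message: no junction, so the rule does not apply
    stop_end = stop_start + 3  # index just past the stop codon
    return junctions[-1] - stop_end > threshold


exon_lengths = [100, 150, 200, 120]  # hypothetical; the last junction is at index 450
print(predicts_nmd(exon_lengths, 500))  # False: stop codon in the last exon (normal)
print(predicts_nmd(exon_lengths, 200))  # True: premature stop far upstream
print(predicts_nmd(exon_lengths, 420))  # False: only 27 bases upstream, escapes the rule
```

Splicing in an exon that introduces an early stop moves the stop codon into the "True" zone, which is how a splicing choice can turn a gene's output down.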

Why this matters: the death of "one gene, one protein"

Before genomes were sequenced, biologists carried a tidy slogan: one gene, one protein. Then came a genuine shock. The Human Genome Project found that we have only about 20,000 protein-coding genes, roughly as many as a tiny roundworm and fewer than some plants. People had expected a far larger count to explain a human. Alternative splicing is a big part of the resolution: a modest number of genes, each spliceable into several proteins, can specify a proteome several times larger than the gene count. As a general rule, then, the slogan fails. The honest version is one gene, *often many proteins*.

This also dissolves an older prejudice. The introns thrown away at such expense were once dismissed as wasteful, even "junk" — yet the very existence of introns is what makes alternative splicing possible, and the breaking of genes into modular exons is what lets evolution mix and match functional parts. That last idea, [[exon-shuffling|exon shuffling]], is real and powerful: because exons often correspond to compact protein domains, moving an exon from one gene into another can hand a protein a whole new working module in a single step. So a layout that looks messy and wasteful is, on closer reading, a generator of variety.