Homology: Orthologs & Paralogs

Similar is not the same as related

In the previous guide you learned to lay two sequences side by side and measure how alike they are. But likeness, on its own, is just a number — and a careless one. The word the field actually cares about is [[sequence-homology|homology]], and it carries a precise, almost legal meaning: two genes are homologous when they *descend from a common ancestral gene*. Homology is a claim about history, not about resemblance. Two sequences can look 35% alike because they truly share an ancestor, or because, with only four letters to choose from, chance dealt them a passing similarity. So a careful biologist never says two genes are '70% homologous' — homology is yes-or-no, an ancestry either shared or not. What you measure is *similarity*; what you infer from enough of it is *homology*.
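To keep measuring and inferring apart, here is a minimal Python sketch of the measuring side only. The function names and the toy alignment are illustrative assumptions, not taken from any library. The chance baseline assumes equally frequent letters compared column by column with no realignment, which is a simplification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def chance_identity(alphabet_size: int) -> float:
    """Expected percent identity of two random sequences, assuming every
    letter is equally likely and the sequences are compared without realignment."""
    return 100.0 / alphabet_size


# Made-up toy alignment, for illustration only.
seq_a = "ATGGCC-TGA"
seq_b = "ATGACCTTGA"

print(f"observed identity: {percent_identity(seq_a, seq_b):.1f}%")
print(f"chance baseline for DNA: {chance_identity(4):.1f}%")
```

Both numbers are similarity. Neither is homology. Homology is the conclusion you draw when the observed figure is far enough above what chance alone would produce. Real aligners also search for the best arrangement of gaps, which raises the similarity chance can produce, so the simple baseline here is an optimistic lower bound.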

Why insist on the distinction? Because homology is the bridge across which all of comparative biology walks. If a human gene and a yeast gene are homologous, then everything painstakingly learned about the yeast version — its structure, its partners, the job it does in the cell — becomes a powerful first guess about the human one. That bridge holds only because the two genes are *the same gene*, inherited from a shared ancestor that lived perhaps a billion years ago, and kept doing recognisably the same work in both lineages ever since. Mere look-alikes give you no such bridge. So homology is not pedantry; it is the licence that lets knowledge cross between species at all.

Two ways for genes to split: speciation and duplication

Homologous genes are relatives, and like any family, the relationships depend on *how* the family branched. When genes are inherited strictly from parent to offspring, two events turn one gene into two related copies, and the whole vocabulary of this guide hangs on telling them apart. The first is speciation: one ancestral population splits into two species, and the single gene it carried is inherited down both lineages. One gene, two species: the two copies are [[ortholog-paralog|orthologs]] (Greek *orthos*, 'straight'). The second is [[gene-duplication-divergence|gene duplication]]: within a single genome, a stretch of DNA gets accidentally copied, so now there are two versions in the *same* organism, whether they sit next to each other or end up far apart. One genome, two copies: the two copies are paralogs (*para*, 'beside').

The cleanest way to feel the difference is to draw the gene's family tree. Imagine an ancestral gene G. Long ago it duplicated inside one genome into G-alpha and G-beta — now two paralogs sitting in the same organism. *Later*, that lineage split by speciation into, say, human and mouse, and both carried both copies down. The result is four present-day genes: human-G-alpha, mouse-G-alpha, human-G-beta, mouse-G-beta. Human-G-alpha and mouse-G-alpha are orthologs (their lines part at the human–mouse speciation). But human-G-alpha and human-G-beta are paralogs (their lines part at the ancient duplication, before the species even existed). Same four genes, two utterly different kinds of cousin — and only the tree tells you which is which.

                ancestral gene G
                       |
              DUPLICATION (one genome)
                /             \
            G-alpha          G-beta
               |                |
          SPECIATION       SPECIATION
           /     \          /     \
     human     mouse   human     mouse
     G-alpha   G-alpha G-beta    G-beta

  human G-alpha  vs  mouse G-alpha  ->  ORTHOLOGS  (split by speciation)
  human G-alpha  vs  human G-beta   ->  PARALOGS   (split by duplication)

The same gene tree holds both kinds of relative at once. Read where two genes' lines diverge: a speciation node makes them orthologs, a duplication node makes them paralogs.
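The reading rule can be written down as a short Python sketch. The `Node` class and the function names are hypothetical, chosen only for this illustration. The sketch assumes every internal node of the gene tree has already been labelled as a duplication or a speciation, which in practice is the hard part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    event: Optional[str] = None       # "duplication" or "speciation"; None for a present-day gene
    parent: Optional["Node"] = None


def lineage(node: Optional[Node]) -> list:
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(a: Node, b: Node) -> str:
    """Classify two present-day genes by the event at the node where their lines part."""
    if a is b:
        raise ValueError("need two different genes")
    seen = {id(n) for n in lineage(a)}
    for n in lineage(b):
        if id(n) in seen:  # first shared ancestor walking up from b
            if n.event == "speciation":
                return "orthologs"
            if n.event == "duplication":
                return "paralogs"
            raise ValueError(f"node {n.name!r} has no event label")
    raise ValueError("genes are not in the same tree")


# The tree drawn above: one duplication, then a speciation on each branch.
g = Node("G", event="duplication")
g_alpha = Node("G-alpha", event="speciation", parent=g)
g_beta = Node("G-beta", event="speciation", parent=g)
human_alpha = Node("human G-alpha", parent=g_alpha)
mouse_alpha = Node("mouse G-alpha", parent=g_alpha)
human_beta = Node("human G-beta", parent=g_beta)
mouse_beta = Node("mouse G-beta", parent=g_beta)

print(relationship(human_alpha, mouse_alpha))  # lines part at a speciation node
print(relationship(human_alpha, human_beta))   # lines part at the duplication node
print(relationship(human_alpha, mouse_beta))   # also parts at the duplication node
```

The third call is worth noticing: human G-alpha and mouse G-beta sit in different species, yet their lines part at the duplication, so they are paralogs too. Species membership alone does not settle the relationship. The node does.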

Why the distinction decides function and history

This is not just bookkeeping; getting it wrong leads you to false conclusions. Orthologs and paralogs tend to behave very differently after they split, and the reason is selection. When a gene splits by speciation into orthologs, *both* daughter lineages still need that gene to do its one job; the same purifying selection that conserved it before keeps conserving it in each species. So orthologs usually keep the same function. But when a gene splits by duplication into paralogs, the organism suddenly has a spare. One copy can carry on the original duty while the other is released from constraint, free to drift and try things that would have been lethal in a sole copy. Paralogs therefore *often diverge in function*. This is captured in a working rule biologists call the ortholog conjecture: of all a gene's relatives, its orthologs are the safest bet to share its function, usually closer in job than even its own in-genome paralogs. It is a conjecture rather than a law, and how well it holds is still tested and debated, but it remains the usual starting assumption.

Now picture the practical trap. You sequence a new disease gene in humans and want to study it in a model animal. You BLAST it against the fly genome and grab the top hit — but the top hit by raw similarity might be a *paralog* of your gene that drifted into a different role, not its true *ortholog*. Build your experiments on that mismatch and you will study the wrong protein, then 'discover' a function that does not transfer back. The same blunder distorts evolutionary history: count a paralog pair as if it were an ortholog pair and you misplace a branch on the species tree, reading an ancient gene duplication as though it were a speciation and dating the split to the wrong epoch entirely. The relationship type is load-bearing; mistake it and both the biology and the history above it collapse.
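One common guard against the top-hit trap is the reciprocal best hit check, a heuristic the text above does not name and that is offered here only as an illustration. The idea: accept the fly gene as a candidate ortholog only if searching back from it against the human genome returns your original gene as its own top hit. The sketch below is plain Python with invented gene names and invented scores. It does not call BLAST, and passing the check is supporting evidence, not proof of orthology.

```python
from typing import Dict, Optional

Scores = Dict[str, Dict[str, float]]  # query gene -> {subject gene: similarity score}


def best_hit(query: str, scores: Scores) -> Optional[str]:
    """Return the subject with the highest score for this query, or None if it has no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hit(query: str, forward: Scores, reverse: Scores) -> Optional[str]:
    """Return the forward top hit only if its own top hit in the reverse search is the query."""
    top = best_hit(query, forward)
    if top is not None and best_hit(top, reverse) == query:
        return top
    return None


# Invented scores, for illustration only (higher means more similar).
forward = {  # human gene searched against fly genes
    "human_X": {"fly_A": 310.0, "fly_B": 295.0},
}
reverse = {  # fly genes searched back against human genes
    "fly_A": {"human_Y": 420.0, "human_X": 300.0},
    "fly_B": {"human_X": 290.0},
}

print(best_hit("human_X", forward))                      # the raw top hit
print(reciprocal_best_hit("human_X", forward, reverse))  # None: the top hit points elsewhere
```

In this toy data the raw top hit for `human_X` is `fly_A`, but `fly_A` matches `human_Y` better than it matches `human_X`. That is the signature of a paralog sitting at the top of the list. The check fails, which is a signal to build a gene tree before designing any experiment.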

Paralogs: how evolution drafts new genes

Paralogs deserve a closer look, because they are nothing less than evolution's chief workshop for novelty. A new gene rarely appears from blank DNA; far more often it is a copy of an old gene, repurposed. The classic story plays out in three acts. First, duplication hands the genome a redundant spare. Second, the spare is largely freed from purifying selection, and most of the time this freedom is fatal: mutations accumulate unchecked until the copy can no longer make a working protein, and it decays into a [[gene-families-and-pseudogenes|pseudogene]], a silent fossil still legible in the sequence but typically no longer expressed. Third, rarely and preciously, a freed copy stumbles into a useful new twist before it dies, selection seizes on the improvement, and a genuinely new gene is born alongside the old.

Repeat that cycle over deep time and one ancestral gene blossoms into a gene family — a whole troupe of paralogs descended from a single original. The textbook example is the globins. A lone ancestral oxygen-binding gene duplicated again and again, and its paralogs specialised: an embryonic form, a foetal form tuned to pull oxygen across the placenta, an adult form, and myoglobin stashing oxygen in muscle. They are obviously kin — line up their sequences and the shared ancestry is unmistakable — yet each does a subtly different job, each the product of a spare copy that was once free to experiment. Read the similarity pattern within a gene family and you are reading, almost directly, the order in which its duplications happened: the deepest splits separate the most-diverged members, the shallowest separate the newest twins.
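Reading duplication order from similarity can be sketched as a small clustering exercise. The Python below uses average-linkage clustering (the idea behind UPGMA) on a made-up distance table for four hypothetical paralogs named A to D. It assumes all copies diverge at roughly the same rate, which real gene families often violate, so treat it as a picture of the reasoning rather than a phylogenetic method.

```python
from itertools import combinations


def merge_order(names, dist):
    """Average-linkage clustering.

    Returns the merges in order, from the most similar pair (newest split)
    to the last join (deepest split). `dist` maps frozenset({a, b}) to a distance.
    """
    clusters = [(name,) for name in names]

    def between(c1, c2):
        # mean distance over every cross-cluster pair of genes
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((c1, c2, between(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented pairwise distances (fraction of differing sites), for illustration only.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.10,
    frozenset(("A", "C")): 0.40,
    frozenset(("B", "C")): 0.42,
    frozenset(("A", "D")): 0.80,
    frozenset(("B", "D")): 0.78,
    frozenset(("C", "D")): 0.82,
}

for left, right, d in merge_order(names, dist):
    print(f"{'+'.join(left)} joins {'+'.join(right)} at distance {d:.2f}")
```

With these toy numbers A and B join first, so they are the newest twins. C joins them next, and D joins last, marking the deepest duplication. Reading the list from bottom to top gives the inferred order in which the duplications happened.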

Yeast to human: homology as a working tool

Now the payoff that makes this vocabulary worth its weight. Recall from early in this ladder why we lean on [[molbio-model-organism|model organisms]] — yeast, the fly, the worm, the mouse — instead of studying everything in humans directly: they are cheap, fast, and ethically far simpler. Homology is the reason that strategy works at all. The core machinery of a cell — how DNA is copied, how the cell cycle is timed, how proteins are folded and shipped — was largely worked out in our shared ancestors and has been conserved ever since. So the genes running those processes in yeast have human orthologs doing the very same jobs. Study the gene in yeast, where you can mutate it freely and watch the cell respond, and you have a detailed working model of its human ortholog you could never have built in a human cell.

How sure is the bridge? Sometimes startlingly sure. Researchers have taken human genes and dropped them into yeast cells whose own ortholog had been deleted, and in hundreds of cases the human gene steps in and rescues the yeast, doing the job well enough to keep the cell alive across roughly a billion years of separation. That is homology made tangible: not a faint family resemblance but a part still interchangeable across the tree of life. It is how much of cancer biology's understanding of cell-cycle control grew out of yeast genes, and how the machinery of many human diseases was first dissected in organisms that never suffer those diseases themselves.

Two honest cautions keep this tool from being abused. First, conservation of *sequence* implies conservation of *function* only by likelihood, not by law — orthologs occasionally pick up new roles, and a single duplication that left one species with paralogs can quietly scramble a tidy one-to-one match. Second, the simple picture assumes genes are passed strictly down from parent to offspring. In bacteria and archaea that assumption breaks: through [[molbio-horizontal-gene-transfer|horizontal gene transfer]], genes leap sideways between unrelated microbes, so a gene's history can diverge sharply from its host's. So homology is a powerful, well-justified first guess — the launchpad for an experiment, never its substitute. Treated that way, the simple act of recognising that two genes share an ancestor remains one of the most quietly powerful ideas in all of biology, the thread that ties every genome into one continuous story.