The Reference Genome, Annotation, and Finding the Differences

The reference genome: a shared map, not anyone's exact DNA

A reference genome is a single, high-quality, agreed-upon sequence that the whole field uses as a common coordinate system. It is not any one living person's DNA; it is a curated composite, stitched together from several donors and corrected in successive releases. Its value is that everyone can describe a finding the same way: “chromosome 7, position 117,559,590,” for example, means the same thing in every lab, provided everyone names the same release of the reference (such as GRCh38), because coordinates shift between releases.
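To make the idea of a common coordinate system concrete, here is a minimal Python sketch. The `Locus` class, the toy `reference` dictionary, and its ten-letter sequence are all illustrative assumptions, not real genome data. The sketch also assumes the common 1-based convention, in which the first letter of a chromosome is position 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A position on a reference: which chromosome, and how far along it."""
    chromosome: str
    position: int  # 1-based: the first letter of the chromosome is position 1


# Toy stand-in for a reference genome (hypothetical, far shorter than real).
reference = {"chr7": "GATTACAGGC"}


def base_at(locus: Locus) -> str:
    """Return the reference letter at a locus."""
    sequence = reference[locus.chromosome]
    if not 1 <= locus.position <= len(sequence):
        raise ValueError(f"{locus} is outside the chromosome")
    return sequence[locus.position - 1]  # convert 1-based to Python's 0-based


print(base_at(Locus("chr7", 5)))  # prints A
```

Because two labs that share this toy reference would both read the same letter at `Locus("chr7", 5)`, they can talk about that position without exchanging any sequence.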

Once you have a reference, you can practice comparative genomics: lining up genomes to see what is conserved and what differs. Comparing humans to mice, or one person to another, relies on having a shared map to compare against. But a raw map of three billion letters is not yet useful; first you have to label what the letters *mean*.

Annotation: finding the meaning in the letters

Genome annotation is the work of marking up the raw sequence: where genes start and stop, which stretches are exons (kept in the final message) versus introns (spliced out), where regulatory switches sit, and which regions do not code for protein at all. Annotation is what turns a featureless string of A, C, G, T into a labelled map you can navigate.
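One way to picture an annotation is as a list of labelled intervals laid over the sequence. The sketch below is illustrative only: the `Feature` tuple, the gene name `GENE1`, and every coordinate are made up, and real annotations are stored in formats such as GFF3 or GTF with many more fields.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str
    name: str


# Hypothetical annotation of one toy gene: two exons with an intron between.
annotation = [
    Feature(100, 199, "promoter", "GENE1 regulatory region"),
    Feature(200, 299, "exon", "GENE1 exon 1"),
    Feature(300, 449, "intron", "GENE1 intron 1"),
    Feature(450, 520, "exon", "GENE1 exon 2"),
]


def label(position: int) -> str:
    """Return the label of the first feature covering a position."""
    for feature in annotation:
        if feature.start <= position <= feature.end:
            return f"{feature.kind}: {feature.name}"
    return "unannotated"


for pos in (150, 250, 400, 600):
    print(pos, "->", label(pos))
```

A linear scan is enough for four features. With a genome-scale annotation, an indexed interval structure would be the more likely choice.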

Now the payoff. Any two unrelated humans are about 99.9% identical in sequence. The differences, the remaining 0.1%, are what make us individuals, and 0.1% of three billion letters is still about three million positions. By far the most common kind of difference is the single-nucleotide polymorphism (SNP, said “snip”): a single position where the letter varies between people. Where most people carry an A, you might carry a G. A single genome carries millions of SNPs.

Reference:  ... G A T T A C A G G C ...
Person A:   ... G A T T A C A G G C ...   (matches)
Person B:   ... G A T T G C A G G C ...   (A -> G)
                        ^
          A SNP: one position, two common versions
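
The comparison in the diagram can be sketched in a few lines of Python. This assumes the sequences are already aligned and equal in length, which real variant calling cannot take for granted, and the function name `find_differences` is hypothetical. Strictly speaking, a mismatch seen in one person is only a variant; it counts as a SNP when the alternative letter is common in a population.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference letter, sample letter) mismatches."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "GATTACAGGC"
person_a = "GATTACAGGC"
person_b = "GATTGCAGGC"

print(find_differences(reference, person_a))  # prints []
print(find_differences(reference, person_b))  # prints [(5, 'A', 'G')]
```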

Neighbouring SNPs travel together as a block:
  haplotype = [G .. G .. T .. A] inherited as a unit

A SNP is a single-letter difference shared across a population; nearby SNPs tend to be inherited together as a haplotype.
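
To illustrate haplotypes, the sketch below reads the letters at a few SNP positions along each copy of a chromosome and counts the combinations that occur. The positions, the sequences, and the function `haplotype_of` are invented for the example. It also assumes each sequence is already phased, meaning it is known which letters sit together on the same chromosome copy.

```python
from collections import Counter

# Hypothetical SNP positions (1-based) along a short stretch of chromosome.
snp_positions = [2, 5, 8, 11]

# Hypothetical phased chromosome copies sampled from a population.
chromosome_copies = [
    "AGCTGACTTTAC",
    "AGCTGACTTTAC",
    "ACCTAACGTTGC",
    "AGCTGACTTTAC",
    "ACCTAACGTTGC",
]


def haplotype_of(sequence: str) -> str:
    """Join the letters found at the SNP positions into one haplotype string."""
    return "".join(sequence[pos - 1] for pos in snp_positions)


counts = Counter(haplotype_of(copy) for copy in chromosome_copies)
for haplotype, count in counts.most_common():
    print(haplotype, count)
```

Four SNPs with two versions each could in principle combine in 16 ways. In this toy data only two combinations appear, GGTA and CAGG, which is the pattern expected when the letters are inherited as a block rather than shuffled independently.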