Transcriptomics & Single-Cell

Same genome, different cell

Here is a fact that should feel slightly impossible. The neuron firing in your brain and the cell filtering toxins in your liver carry the *exact same* genome — letter for letter, all three billion bases, the same in nearly every cell you own. Sequencing the genome, the project of the previous guides, reads that shared book once and is essentially done. Yet a neuron and a liver cell could hardly look or behave more differently. The genome alone cannot explain that difference, because the genome is identical. What differs is the transcriptome: the set of RNA molecules a cell is actively making at a given moment — in other words, which genes it has switched *on*, and how loudly.

You already know the machinery behind this from earlier rungs: gene expression turns a gene into RNA and then, usually, into protein, and you spent whole tracks learning how transcription is controlled by transcription factors, enhancers, and chromatin. A liver cell runs the albumin and detox-enzyme genes hard while keeping the neuron-specific genes silent; a neuron does the reverse. The genome is the full *catalogue* of what a cell *could* make; the transcriptome is the *order slip* of what it is making right now. Genomics gave us the catalogue. This guide is about reading the order slips.

RNA-seq: counting every transcript at once

So how do you read RNA with a machine built to read DNA? The trick is a beautiful re-use of an enzyme you already met. RNA-seq starts by harvesting all the RNA from a sample, then copies it back into DNA using reverse transcriptase, the enzyme that runs the usual flow of information *backwards*, RNA -> DNA, a direction the central dogma never actually forbade. That DNA copy of the RNA, called cDNA, is then fed into the same next-generation sequencer from the last guide. Every transcript captured from the sample becomes reads on the machine.

Here is the conceptual leap that makes RNA-seq more than just 'sequencing RNA'. When you sequenced a genome, every position turned up in roughly the same number of reads, because every position is present the same number of times in the DNA. But in RNA-seq, a gene that is being transcribed *hard* makes thousands of mRNA copies, while a gene barely on makes a handful, and a silent gene makes none. So the *number of reads* landing on a gene, once corrected for the gene's length and for how deeply the sample was sequenced, measures how strongly that gene is expressed. You are no longer just reading sequence; you are *counting*. Line up the read counts gene by gene and you get a quantitative portrait of what the cell is doing.
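Raw counts need two corrections before they can be compared: a longer gene collects more reads at the same expression level, and a deeper sequencing run inflates every count. Here is a minimal sketch of one common correction, transcripts per million (TPM). The gene lengths and counts below are invented for illustration, and the function name `tpm` is just a label chosen here.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     dict mapping gene name -> raw read count
    lengths_kb: dict mapping gene name -> transcript length in kilobases
    """
    # Step 1: reads per kilobase removes the advantage of long genes.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: rescale so the values in one sample sum to one million,
    # which removes the effect of sequencing depth.
    total = sum(rpk.values())
    return {gene: 1e6 * value / total for gene, value in rpk.items()}


# Hypothetical toy sample (made-up numbers, not real measurements).
counts = {"ALB": 9000, "GAPDH": 3000, "SYN1": 0}
lengths_kb = {"ALB": 2.0, "GAPDH": 1.0, "SYN1": 3.0}
expression = tpm(counts, lengths_kb)
```

In this toy sample ALB has three times the raw reads of GAPDH, but after correcting for its doubled length the ratio drops to 1.5 to 1. The silent gene stays at zero.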

This makes RNA-seq the natural tool for the question biologists ask most: *what changed?* Treat one batch of cells with a drug and leave another untreated, sequence both transcriptomes, and the genes whose read counts jumped or collapsed are the genes that responded. Because RNA-seq reads the actual transcripts rather than guessing, it also catches things a gene list cannot — it sees alternative splicing, where one gene yields different mRNAs in different cells, and it picks up RNA from regions no one had annotated as genes. The older method, the DNA microarray, could only measure transcripts you already knew to print on the chip; RNA-seq listens to everything that is there.
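The "what changed?" comparison is usually expressed as a log2 fold change per gene. The sketch below shows only the arithmetic. It assumes the two samples have already been normalized to comparable depth, and the gene names, counts, pseudocount, and threshold are all illustrative choices. Real analyses rely on replicates and dedicated statistical packages such as DESeq2 or edgeR rather than a bare ratio.

```python
import math


def log2_fold_change(treated, untreated, pseudocount=1.0):
    """Return log2(treated / untreated) for every gene.

    The pseudocount (an assumption of this sketch) avoids dividing by zero
    when a gene has no reads in one condition.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (untreated[gene] + pseudocount))
        for gene in treated
    }


def responders(changes, threshold=1.0):
    """Genes whose expression at least doubled or halved (|log2FC| >= threshold)."""
    return sorted(gene for gene, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical normalized counts for three made-up genes.
treated = {"geneA": 799, "geneB": 99, "geneC": 24}
untreated = {"geneA": 99, "geneB": 99, "geneC": 199}

changes = log2_fold_change(treated, untreated)
hits = responders(changes)
```

The log scale makes the measure symmetric: an eightfold rise and an eightfold drop score +3 and -3 respectively.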

The trap of the averaged smoothie

There is a quiet lie hidden in ordinary RNA-seq, and naming it is the key to understanding why single-cell came next. To get enough RNA for the early machines, you ground up a whole *tissue* — millions of cells — and sequenced the pooled RNA together. But a piece of tissue is never one kind of cell. A sliver of tumour contains cancer cells, immune cells, blood-vessel cells, and connective tissue, all blended. Bulk RNA-seq throws them in a blender and reports the average. And an average can describe a population that contains no actual member: it is the statistical equivalent of a family with 1.8 children.

Picture two tissues. In the first, every cell expresses a gene at a medium level. In the second, half the cells blast that gene at maximum and half keep it dead silent. Bulk RNA-seq reports the *same medium average* for both — yet biologically they could not be more different. You have smoothed a vivid mosaic into a flat grey. That blindness matters most exactly where it hurts: the few drug-resistant cells hiding in a tumour, the one rare stem cell in a tissue, the handful of cells beginning to change first. The blender erases them.
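The two-tissue thought experiment is easy to check with numbers. In this small sketch the expression values are invented. The "bulk" reading is simply the mean across cells, and the spread is what only a per-cell measurement can reveal.

```python
from statistics import mean, pstdev

# Tissue 1: every cell expresses the gene at a medium level.
uniform_tissue = [50] * 1000
# Tissue 2: half the cells at maximum, half completely silent.
mosaic_tissue = [100] * 500 + [0] * 500

# What bulk RNA-seq would report: one averaged number per tissue.
bulk_uniform = mean(uniform_tissue)
bulk_mosaic = mean(mosaic_tissue)

# What the average hides: the cell-to-cell spread.
spread_uniform = pstdev(uniform_tissue)
spread_mosaic = pstdev(mosaic_tissue)
```

Both tissues give the same bulk value, while the spread is zero in one and as large as the mean itself in the other.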

Single-cell sequencing: one cell at a time

The fix is exactly what the name says: stop blending, and read one cell at a time. Single-cell sequencing (most often single-cell RNA-seq) gently breaks a tissue into a suspension of individual cells, then isolates each cell on its own before sequencing. The cleverest trick is the barcode. Before the cells are pooled back together for sequencing, every transcript from a given cell gets tagged with a short DNA 'barcode' unique to that cell. Now you can sequence millions of transcripts together in one efficient run, and afterwards sort the reads back to their cell of origin by reading the barcode — like stamping each guest's hand at the door so you can still tell who said what after everyone has mingled.

1. Dissociate the tissue into a soup of single, separated cells.
2. Trap each cell alone (classically inside its own tiny oil droplet) together with one bead carrying millions of copies of a single cell-specific barcode.
3. Inside each droplet, copy that cell's RNA into cDNA and stamp every copy with the droplet's barcode, so the cell-of-origin is written into the molecule itself.
4. Pool everything, sequence it all in one big run, then use software to split the reads by barcode, rebuilding a separate expression profile for each of thousands of cells.
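The final sorting step can be sketched in a few lines. This toy version assumes each read has already been reduced to a (barcode, gene) pair and that barcodes were read without errors. The barcodes and read list are invented. Real pipelines also correct barcode sequencing errors and use unique molecular identifiers to collapse duplicate copies, which is omitted here.

```python
from collections import Counter, defaultdict


def demultiplex(reads):
    """Split pooled reads by cell barcode.

    reads: iterable of (barcode, gene) pairs
    Returns a dict mapping barcode -> Counter of gene -> read count,
    i.e. one expression profile per cell.
    """
    profiles = defaultdict(Counter)
    for barcode, gene in reads:
        profiles[barcode][gene] += 1
    return profiles


# Hypothetical pooled run: reads from two cells, thoroughly mingled.
pooled_reads = [
    ("AACG", "CD3E"), ("TTGA", "ALB"), ("AACG", "CD3E"),
    ("TTGA", "ALB"), ("TTGA", "APOA1"), ("AACG", "PTPRC"),
]
profiles = demultiplex(pooled_reads)
```

Although the reads arrive interleaved, the barcode alone is enough to rebuild one profile per cell.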

The payoff is a different kind of picture entirely. Instead of one averaged profile for the tissue, you get thousands of individual profiles, and you can let the computer group cells by how similar their expression is. Cells that switch on the same genes cluster together, and most clusters turn out to correspond to a recognizable cell type or state: here the T cells, there the liver cells, over there a rare population nobody had named. Tissues that bulk methods painted as uniform dissolve into rich atlases of dozens of distinct states. This is why single-cell sequencing reshaped immunology, cancer biology, and developmental biology in barely a decade: it let us see the individuals inside the crowd.
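To make "group cells by how similar their expression is" concrete, here is a deliberately simple stand-in: a single greedy pass that puts a cell into the first group whose founding cell has a similar profile, measured by cosine similarity. This is not how production tools cluster cells. They typically reduce dimensionality and use graph-based or other more robust methods. The cells, counts, and the 0.8 threshold are all made up for the sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two expression vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_cells(profiles, threshold=0.8):
    """Greedy toy clustering: compare each cell with each group's first member."""
    clusters = []  # each cluster is a list of cell names; index 0 is the founder
    for name, vector in profiles.items():
        for cluster in clusters:
            founder = profiles[cluster[0]]
            if cosine_similarity(vector, founder) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            clusters.append([name])
    return clusters


# Hypothetical counts for three genes, in the order: CD3E, ALB, SYN1.
cells = {
    "cell1": [9, 0, 1],
    "cell2": [0, 8, 0],
    "cell3": [7, 1, 0],
    "cell4": [1, 10, 0],
    "cell5": [0, 0, 6],
}
clusters = group_cells(cells)
```

The two CD3E-high cells land together, the two ALB-high cells land together, and the lone SYN1-expressing cell forms its own group. A rare population like that last one is exactly what a bulk average would bury.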

Beyond RNA: the other -omics

Once you have the habit of measuring a whole *layer* of biology at once, the suffix '-omics' starts spreading. The genome gives genomics, the transcriptome gives transcriptomics — and the next layer down is the proteome, the full set of proteins a cell actually contains. This matters because the transcriptome is still only a forecast: an mRNA is an *order placed*, not a *protein delivered*. Translation rates differ, proteins get modified and degraded at their own pace, so the amount of an mRNA and the amount of its protein are correlated but far from identical. To know what proteins are really present, you must measure them directly — and that is proteomics.

Proteins, though, are not made of four repeating letters, so you cannot 'sequence' a proteome the way you sequence DNA. The workhorse instead is the *mass spectrometer*. Proteins are first cut into peptides, usually with an enzyme such as trypsin; the instrument then ionizes those peptides, breaks them into smaller fragments, and measures their mass-to-charge ratios with exquisite precision. Because every amino acid has a known mass, the pattern of fragment masses acts like a fingerprint, and software matches each fingerprint back to the protein it came from. It can even reveal the chemical tags, like phosphate groups, that switch proteins on and off. It is a fundamentally different machine from a sequencer, which is one reason proteomics has always lagged genomics in coverage and ease.
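The "known mass per amino acid" idea can be shown in its simplest form: add up residue masses to predict a peptide's mass, then look for candidates that match an observed value. The residue masses are standard monoisotopic values for a subset of amino acids. The candidate peptides, the observed mass, and the tolerance are invented for this sketch. Real instruments measure mass-to-charge ratios of ions and match whole fragment spectra, so treat this as the bare arithmetic only.

```python
# Monoisotopic residue masses in daltons (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056      # one water per peptide: the free ends of the chain
PHOSPHATE = 79.96633  # mass added by a single phosphate tag


def peptide_mass(sequence, phospho_sites=0):
    """Predicted neutral mass of a peptide, optionally carrying phosphate tags."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + phospho_sites * PHOSPHATE


def match_peptide(observed_mass, candidates, tolerance=0.01):
    """Return the unmodified candidates whose predicted mass fits the observation."""
    return [
        seq for seq in candidates
        if abs(peptide_mass(seq) - observed_mass) <= tolerance
    ]


# Hypothetical candidate peptides and a made-up observed mass.
candidates = ["GASK", "PEPTK", "VTLR"]
matches = match_peptide(487.312, candidates)

# A phosphate tag shows up as a fixed shift in mass.
shift = peptide_mass("GASK", phospho_sites=1) - peptide_mass("GASK")
```

A phosphorylated peptide is recognizable because its mass sits a fixed amount above the unmodified prediction, which is how such tags are spotted.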

LAYER            MEASURES                            MAIN TOOL
-------------    --------------------------------    -----------------
genome        -> what a cell COULD do (DNA)          DNA sequencing
transcriptome -> what it is SAYING (RNA)             RNA-seq
proteome      -> what it actually BUILT (protein)    mass spectrometry
metabolome    -> the small molecules it MADE         mass spec / NMR

DNA --transcribed--> RNA --translated--> protein --acts on--> metabolites
(each downstream layer is closer to phenotype, and harder to measure)

The -omics layers follow the central dogma downstream: each step moves closer to what the cell actually does, but gets harder to measure completely. No single layer tells the whole story.

The list keeps going: the metabolome is the cell's full inventory of small molecules (sugars, lipids, the products of metabolism), studied by metabolomics, and the epigenome is the set of chemical marks on DNA and histones, mapped by epigenomics. No single layer is the truth; each is one slice. The real power comes from stacking them and asking how they fit together, which is why the natural sequel to all this measuring is systems biology. Turning these torrents of numbers into actual biological insight is, once again, the daily work of bioinformatics, the discipline the previous guide promised would only grow.