The bus that ran out of road
Picture the inside of a desktop PC from around 2000. The processor reached its peripherals over the PCI bus, a set of 32 parallel data lines etched into the motherboard, each carrying one bit of a 32-bit word at the same instant. It was the obvious way to move data: if you want to send 32 bits, use 32 wires and send them all at once. Intuitive, simple, and for a long time, fast enough.
The trouble is what happens when you push the clock faster. Each of those 32 wires has slightly different length, slightly different capacitance, slightly different neighbours coupling into it. At a leisurely 33 MHz nobody cares — a bit takes 30 nanoseconds, and a few hundred picoseconds of mismatch between wires is a rounding error. But crank the rate up, and that same mismatch becomes a catastrophe. The bits stop arriving together. The word arrives smeared in time.
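To put rough numbers on that, here is a small sketch that expresses a fixed wire-to-wire mismatch as a fraction of one bit period. The 300 ps skew and the list of clock rates are illustrative assumptions, not measurements from any real bus.

```python
def skew_fraction(skew_ps: float, clock_hz: float) -> float:
    """Return wire-to-wire skew as a fraction of one bit period."""
    bit_period_ps = 1e12 / clock_hz  # one bit per clock cycle on each wire
    return skew_ps / bit_period_ps


SKEW_PS = 300.0  # assumed mismatch: "a few hundred picoseconds"

# Hypothetical clock rates, from PCI-like speeds up to multi-gigahertz.
for clock_hz in (33e6, 133e6, 1e9, 3e9):
    frac = skew_fraction(SKEW_PS, clock_hz)
    print(f"{clock_hz / 1e6:8.0f} MHz: skew is {frac:6.1%} of a bit period")
```

The same 300 ps that is about 1% of a bit at 33 MHz eats most of the bit once the clock reaches a few gigahertz.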
And skew is only the first bill. A 32-bit bus needs 32 pins on every chip it touches — plus ground returns, plus a clock, plus control. Pins are expensive real estate on a package, and a board has to physically route every one of those wires, keeping them all the same length so they stay in sync. Every wire also switches a little current, and all that switching adds up to real power. Width, it turns out, is not free.
The trade: width for speed
Here is the move that changed everything. If wide-and-slow is hurting you, do the opposite: go narrow-and-fast. Instead of 32 wires at 33 MHz, use *one* wire running 32 times faster. The total throughput is the same, roughly 1 Gbit/s (32 × 33 MHz), but with a single wire there is nothing for the signal to be skewed against. The whole problem of keeping a herd of signals aligned simply vanishes.
Of course, you can't just feed a parallel word onto a single wire. You need a gadget that takes the 32 bits sitting side-by-side and lines them up into a single stream, sending them out one after another at very high speed. That gadget is the serializer. At the far end you need its mirror image — something that catches the racing stream and reassembles it back into a 32-bit word. That is the deserializer. Put the two together and you get the unit that names this whole field: SerDes (SERializer / DESerializer).
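In software terms the pair looks something like the sketch below. It is only a model of the idea: real serializers are shift registers and multiplexers in hardware, and the least-significant-bit-first order used here is an assumption, since bit order varies between standards.

```python
def serialize(word: int, width: int = 32):
    """Yield the bits of `word` one at a time, least-significant bit first."""
    for i in range(width):
        yield (word >> i) & 1


def deserialize(bits, width: int = 32) -> int:
    """Collect `width` bits from the stream and rebuild the parallel word."""
    word = 0
    for i, bit in zip(range(width), bits):
        word |= (bit & 1) << i
    return word


word = 0xDEADBEEF
stream = list(serialize(word))     # 32 bits, one after another
rebuilt = deserialize(stream)      # back to one 32-bit word
print(hex(rebuilt), rebuilt == word)
```

Both ends must agree on the word width and the bit order, which is exactly the kind of detail a link standard pins down.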
        PARALLEL IN              SERIAL LINK           PARALLEL OUT
      (32 bits, slow)       (1 lane, 32x faster)     (32 bits, slow)

    b0  ──┐                                                         ┌── b0
    b1  ──┤                                                         ├── b1
    b2  ──┤   ┌────────────┐   ───►───►───►───   ┌──────────────┐   ├── b2
    .     ├──►│ SERIALIZER │═════════════════════│ DESERIALIZER │───┤   .
    .     ├──►│  (funnel)  │  1 0 1 1 0 0 1 0 ►  │    (fan)     │───┤   .
    b30 ──┤   └────────────┘  one bit at a time  └──────────────┘   ├── b30
    b31 ──┘         ▲                                   ▲           └── b31
                    │                                   │
                tx clock                         recovered clock

Two wires, not one: the differential pair
There is a small lie in the diagram above. A real high-speed serial link almost never uses a single wire. It uses *two*, routed side by side on a board or twisted together in a cable, forming what's called a differential pair. One wire carries the signal; its partner carries the exact opposite. When you want to send a 1, you push one wire up and the other down by the same amount; for a 0, you swap them. The receiver doesn't look at either wire's voltage against ground; it looks only at the *difference* between the two.
Why bother with two wires when the whole point was to use fewer? Because of noise. Any interference — a switching power supply, a nearby clock, a burst from a motor — couples into *both* wires of a tightly-routed pair almost equally. When the receiver subtracts one from the other, that common noise cancels out, while the signal (which is opposite on the two wires) doubles. A differential pair is, in effect, a wire that carries its own noise-reference along with it.
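The cancellation is easy to see in a toy model. The sketch below assumes the interference lands identically on both wires, which is the idealised case (real coupling is only almost equal), and the 0.4 V swing and the noise range are made-up numbers chosen so that the noise is larger than the signal.

```python
import random

SWING = 0.4  # assumed volts each wire moves away from the common level


def drive(bit: int):
    """Return (P, N) voltages for one bit: the two wires always move in opposite directions."""
    return (SWING, -SWING) if bit else (-SWING, SWING)


def receive(p: float, n: float) -> int:
    """Differential receiver: decide from the difference P - N only."""
    return 1 if (p - n) > 0 else 0


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]

single_ended_errors = 0
differential_errors = 0
for bit in bits:
    p, n = drive(bit)
    noise = rng.uniform(-1.0, 1.0)           # same interference on BOTH wires
    p_noisy, n_noisy = p + noise, n + noise
    # A single-ended receiver compares one wire against ground (0 V).
    single_ended_errors += (1 if p_noisy > 0 else 0) != bit
    differential_errors += receive(p_noisy, n_noisy) != bit

print("single-ended errors:", single_ended_errors)
print("differential errors:", differential_errors)
```

Reading one wire against ground gets many bits wrong, while the subtraction removes the shared noise entirely and leaves a difference of 2 × SWING.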
    single-ended (1 wire vs ground)       differential (2 wires, look at the gap)

    V ┐   ┌───┐   ┌──                     P ───┐   ┌───┐   ┌──   (TX+)
      │   │   │   │                       N    │   │   │   │
      └───┘   └───┘                         ───┘   └───┘   └──   (TX-, the mirror)

    noise rides in                        the gap (P − N) is what the RX reads;
    on top of the signal                  noise that hits BOTH cancels out

Where did the clock go?
The old parallel bus shipped a clock signal alongside the data — a separate wire that ticked, telling the receiver 'sample now.' That worked because the data wires and the clock wire were the same length, so they stayed aligned. But at multi-gigabit speeds, a clock wire would suffer exactly the skew problem we just escaped: by the time the clock edge arrived, it would no longer line up with the data it was supposed to time.
So modern SerDes does something clever: it throws the clock wire away and hides the clock *inside* the data. The transmitter ensures the stream has enough 0-to-1 and 1-to-0 transitions (using coding tricks we'll meet later), and the receiver watches those transitions to figure out the exact rhythm of the incoming bits — then regenerates its own clock locked to that rhythm. This is clock-and-data recovery, and it's the reason the diagram earlier showed a *recovered* clock at the receiver, conjured from the data itself.
- Transmit: the serializer clocks bits out one at a time, fast, onto the differential pair — and deliberately keeps the data 'busy' with transitions so a clock is recoverable.
- Travel: the bits race across the channel — a board trace, a connector, maybe a cable — getting attenuated and smeared along the way.
- Recover: the receiver studies the arriving transitions, locks a local clock to their rhythm, and uses that clock to decide each bit.
- Reassemble: the deserializer collects the recovered bits and fans them back out into the original parallel word.
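The recover step can be sketched as a toy oversampling receiver. This is a simplified illustration under stated assumptions, not how production clock-and-data recovery is built (that uses phase-locked loops or phase interpolators and has to cope with noise and jitter). The names `OSR`, `transmit` and `recover` are hypothetical, and the channel here is perfectly clean.

```python
from collections import Counter

OSR = 8  # assumed number of receiver samples per bit


def transmit(bits, phase):
    """Hold each bit for OSR samples, then drop `phase` leading samples
    to mimic a stream arriving with unknown timing."""
    samples = [b for b in bits for _ in range(OSR)]
    return samples[phase:]


def recover(samples):
    """Find the bit boundaries from the transitions, then sample mid-bit."""
    # Each transition marks the first sample of a new bit.
    edges = [i % OSR for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    if not edges:
        raise ValueError("no transitions: nothing to lock the clock to")
    boundary = Counter(edges).most_common(1)[0][0]
    first = (boundary + OSR // 2) % OSR  # half a bit after a boundary
    return [samples[i] for i in range(first, len(samples), OSR)]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
received = transmit(bits, phase=3)
print(recover(received) == bits)
```

Feed `recover` a stream that never changes, such as `[1] * 64`, and it raises an error: with no transitions there is no rhythm to lock to, which is why the transmitter keeps the data busy. In this toy, larger phase offsets can also cost the first, partially received bit.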
The price of going fast: a preview of the obstacles
Trading width for speed solves skew and pins, but it hands you a new bill, payable at the receiver. When a bit period shrinks from 30 nanoseconds (the old PCI bus) to under 20 *pico*seconds (1/56 Gbps is about 18 ps on a modern 56 Gbps lane), the copper itself stops cooperating. A board trace that looked like a perfect wire at low speed now behaves like a lossy, blurring filter. The crisp square pulses you launched arrive rounded, slumped, and overlapping into their neighbours.
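A crude way to see the smearing is to push square pulses through a first-order low-pass filter. That filter is only a stand-in chosen for simplicity: a real trace has frequency-dependent loss that a single pole does not capture, and `alpha` and `SAMPLES_PER_BIT` are arbitrary illustrative values.

```python
def lowpass(samples, alpha):
    """First-order low-pass: each output moves a fraction `alpha` toward the input."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


SAMPLES_PER_BIT = 4
bits = [0, 1, 0, 0, 1, 1, 1, 0]

# Ideal transmitted waveform: each bit held flat for SAMPLES_PER_BIT samples.
tx = [float(b) for b in bits for _ in range(SAMPLES_PER_BIT)]
rx = lowpass(tx, alpha=0.3)

# Level the received waveform has reached by the end of each bit.
end_of_bit = [rx[(k + 1) * SAMPLES_PER_BIT - 1] for k in range(len(bits))]
for b, level in zip(bits, end_of_bit):
    print(b, f"{level:.3f}")
```

The isolated 1 never reaches the level that the last 1 in a run of three does, and the 0 that follows it still carries leftover energy. How high a bit gets depends on the bits before it, which is what "overlapping into their neighbours" means in practice.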
That is the agenda for the rest of this track, and each obstacle gets its own rung. The channel robs the signal of energy and smears it. Equalization is the correction that un-smears it: circuits that boost the frequencies the channel attenuated, both before launching (at the transmitter) and after arriving (at the receiver). Clock recovery extracts the hidden timing. And the eye diagram is the X-ray we use to see, at a glance, whether any of it worked, that is, whether there is still a clean open 'eye' in the middle of all that blur where the receiver can safely decide a 1 from a 0.
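As a taste of equalization, the sketch below smears a bit pattern with a toy first-order low-pass "channel" and then undoes it with a two-tap filter that is its exact inverse. Both the channel model and the `alpha` value are assumptions made for illustration; a real equalizer faces a channel it only knows approximately, and boosting high frequencies also boosts noise, which this toy leaves out.

```python
def lowpass(samples, alpha):
    """Toy channel: y[n] = (1 - alpha) * y[n-1] + alpha * x[n]."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def equalize(samples, alpha):
    """Two-tap FIR that inverts the toy channel:
    x[n] = (y[n] - (1 - alpha) * y[n-1]) / alpha."""
    out, prev = [], 0.0
    for y in samples:
        out.append((y - (1.0 - alpha) * prev) / alpha)
        prev = y
    return out


def slice_bits(samples, threshold=0.5):
    """Decide each sample as 1 or 0 against a fixed threshold."""
    return [1 if s > threshold else 0 for s in samples]


ALPHA = 0.3                       # assumed channel severity
bits = [0, 1, 0, 0, 1, 1, 1, 0]   # one sample per bit, for simplicity
rx = lowpass([float(b) for b in bits], ALPHA)
eq = equalize(rx, ALPHA)

print("without equalization:", slice_bits(rx))
print("with equalization:   ", slice_bits(eq))
```

Without the equalizer the isolated 1 never crosses the decision threshold and is read as a 0; with it, the original pattern comes back.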
It's worth pausing on just how astonishing this is. Inside the serializer and deserializer, switching transistors flip a differential pair billions of times per second, the resulting wave survives a journey through imperfect copper that mangles it almost beyond recognition, and yet the receiver — using nothing but the data's own transitions for timing — rebuilds the original word with an error rate of perhaps one bit in a quadrillion. Every link you use, from the USB cable on your desk to the fabric inside a data-centre switch, is quietly performing this feat right now.