Squeezing More Bits: PAM4 and Link Training

The wall at the end of the runway

Imagine you are throwing tennis balls down a long, narrow corridor to a friend. If you throw slowly, each ball arrives clean and your friend catches it easily. Throw faster, and the corridor starts to matter — balls clip the walls, leave little echoes, and start to merge into one another. Eventually you are throwing so fast that your friend can no longer tell where one ball ends and the next begins. A copper trace on a printed circuit board is exactly that corridor, and for thirty years the entire industry's strategy for going faster was simply to throw the balls quicker — to double the baud rate, the number of symbols sent per second.

That strategy worked beautifully until physics sent the bill. A copper channel does not attenuate all frequencies equally. High frequencies, which is to say sharp, fast edges, get hammered far harder than slow ones, thanks to the skin effect and dielectric loss. The faster you signal, the more energy lives at those punished high frequencies. Worse, the loss in decibels keeps climbing with both frequency and distance: skin-effect loss grows roughly with the square root of frequency, dielectric loss grows roughly linearly with it, and both scale with trace length, so doubling the rate can come close to doubling the loss in decibels. By the time you reach 56 GBaud, a realistic backplane channel can swallow 30 to 40 dB of insertion loss. That is a factor of roughly 30 to 100 in voltage, or a thousand to ten thousand in power. The signal arriving at the receiver is a smeared, ghostly shadow.
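To get a feel for what those decibel figures mean, here is a minimal sketch that converts a loss in dB back into plain ratios and evaluates a toy trace-loss model. The function names and the two loss coefficients are illustrative assumptions, not measured values for any real board.

```python
import math


def db_to_voltage_ratio(loss_db: float) -> float:
    """Amplitude ratio for a loss in dB (20*log10(ratio) = dB)."""
    return 10 ** (loss_db / 20)


def db_to_power_ratio(loss_db: float) -> float:
    """Power ratio for a loss in dB (10*log10(ratio) = dB)."""
    return 10 ** (loss_db / 10)


def toy_trace_loss_db(f_ghz: float, skin_coeff: float = 2.0,
                      dielectric_coeff: float = 0.7) -> float:
    """Toy model: a skin-effect term ~ sqrt(f) plus a dielectric term ~ f.

    Both coefficients are made up for illustration only.
    """
    return skin_coeff * math.sqrt(f_ghz) + dielectric_coeff * f_ghz


for loss_db in (30, 40):
    print(f"{loss_db} dB: amplitude / {db_to_voltage_ratio(loss_db):.1f}, "
          f"power / {db_to_power_ratio(loss_db):.0f}")

# How much the toy loss grows when the Nyquist frequency doubles.
growth = toy_trace_loss_db(28.0) / toy_trace_loss_db(14.0)
print(f"toy loss grows by x{growth:.2f} in dB from 14 GHz to 28 GHz")
```

In this toy model the growth factor lands between the pure square-root case (about 1.41) and the pure linear case (2.0); where a real channel falls depends on its materials, length, and connectors.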

Two bits in one pulse: the PAM4 idea

If you cannot send symbols faster, send *more information per symbol*. That is the whole idea of PAM4: Pulse Amplitude Modulation with 4 levels. Classic serial links are binary: the wire is either high (a 1) or low (a 0). That is two levels, called NRZ (non-return-to-zero), and each symbol carries exactly one bit. PAM4 instead defines four distinct voltage levels. With four levels you can encode two bits per symbol, using the pairs 00, 01, 11, 10 (a Gray-coded order, so neighboring levels differ in only one bit), and you carry twice the data at the same symbol rate. A 56 GBaud PAM4 link moves 112 gigabits per second of data while its Nyquist frequency stays at 28 GHz, rather than the 56 GHz that 112 GBaud NRZ would need. You sidestep the loss wall not by fighting the channel, but by asking less of it.

  NRZ (1 bit/symbol)              PAM4 (2 bits/symbol)

  V                               V
  hi --___    ___    ___          L3 --___          ___   = 10
        |    |   |    |           L2 --   \___   ___/      = 11
  lo ---'    '---'   '---         L1 --       \_/          = 01
       1   0   1    0             L0 --                   = 00

  one threshold, ONE eye          three thresholds, THREE eyes
  full swing per decision         swing split into thirds

NRZ slices the swing once; PAM4 slices it into three, stacking three small eyes where NRZ had one big one.
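The two-bits-per-symbol mapping is easy to sketch in code. The example below assumes the Gray-coded assignment that the pair order 00, 01, 11, 10 suggests, and uses normalized levels -3, -1, +1, +3; the table and function names are illustrative, not taken from any particular standard's text.

```python
# Assumed Gray-coded mapping from bit pairs to normalized PAM4 levels.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
PAM4_TO_BITS = {level: pair for pair, level in GRAY_PAM4.items()}


def bits_to_pam4(bits):
    """Pack a flat list of bits into PAM4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_to_bits(levels):
    """Unpack PAM4 levels back into a flat list of bits."""
    bits = []
    for level in levels:
        bits.extend(PAM4_TO_BITS[level])
    return bits


bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = bits_to_pam4(bits)
print(symbols)                # four symbols for eight bits
print(pam4_to_bits(symbols))  # recovers the original bits
```

The point of Gray coding is that the most likely mistake, confusing a level with its immediate neighbor, corrupts only one of the two bits.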

There is no free lunch, of course, and the bill arrives in the eye diagram. An NRZ signal has one decision threshold sitting in the middle of the swing, and so it draws one eye — one open region where the receiver samples to decide 1 versus 0. PAM4 must distinguish four levels, which means three thresholds and three vertically stacked eyes. Crucially, all three eyes share the *same total voltage swing* as the single NRZ eye. So each PAM4 eye is only about one-third as tall. In one stroke you have given away roughly 9.5 dB of vertical margin — that is 20·log₁₀(3) — before a single noise electron has shown up.
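The same arithmetic generalizes to any number of levels. As a quick sketch, assuming equally spaced levels and a fixed total swing (the function name is just for illustration):

```python
import math


def eye_height_penalty_db(num_levels: int) -> float:
    """Vertical-margin penalty versus NRZ for equally spaced levels.

    With a fixed total swing, M levels leave M - 1 stacked eyes,
    each 1 / (M - 1) as tall as the single NRZ eye.
    """
    return 20 * math.log10(num_levels - 1)


for m in (2, 4, 8):
    print(f"PAM{m}: {eye_height_penalty_db(m):.2f} dB")
```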

New ways to be wrong

Squeezing the eyes does more than shrink margin: it invents whole new error mechanisms that NRZ never had to worry about. The first is plain arithmetic: with three eyes you have three thresholds the signal can slip across instead of one, each guarded by only a third of the swing, so for a given amount of noise the raw symbol error rate is far higher. PAM4 systems routinely operate at a pre-correction bit error rate (BER) around 1e-4 to 1e-6, astronomically worse than the 1e-12 to 1e-15 a healthy NRZ link delivers, and lean on strong forward error correction (FEC, such as Reed–Solomon RS(544,514)) to claw the effective BER back down to 1e-15 or better as seen by the layers above. Without FEC, PAM4 simply does not close.
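To see why such a poor raw error rate is survivable, here is a rough sketch of the FEC arithmetic. RS(544,514) works on 10-bit symbols and can correct up to (544 - 514) / 2 = 15 bad symbols per codeword. The big assumption in this sketch is that bit errors are random and independent; real links see bursts (for example from DFE error propagation), so treat the result as an optimistic estimate rather than a prediction.

```python
import math

N_SYMBOLS = 544          # codeword length, in 10-bit symbols
K_SYMBOLS = 514          # message length, in 10-bit symbols
BITS_PER_SYMBOL = 10
T_CORRECTABLE = (N_SYMBOLS - K_SYMBOLS) // 2   # correctable symbols per codeword


def codeword_failure_prob(pre_fec_ber: float) -> float:
    """Probability that a codeword has more bad symbols than the code can fix.

    Assumes independent, random bit errors (an idealization).
    """
    # A symbol is bad if any one of its 10 bits is wrong.
    p_sym = 1 - (1 - pre_fec_ber) ** BITS_PER_SYMBOL
    # Sum the binomial tail directly so tiny probabilities are not lost
    # to rounding, as they would be with 1 - P(at most t errors).
    return sum(
        math.comb(N_SYMBOLS, k) * p_sym ** k * (1 - p_sym) ** (N_SYMBOLS - k)
        for k in range(T_CORRECTABLE + 1, N_SYMBOLS + 1)
    )


for ber in (1e-3, 1e-4, 1e-5):
    print(f"pre-FEC BER {ber:.0e}: codeword failure ~ {codeword_failure_prob(ber):.2e}")
```

The failure probability falls off a cliff as the raw BER improves, which is why a link that looks broken before correction can look flawless after it.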

The second mechanism is subtler and uniquely PAM4: level mismatch and nonlinearity. The four levels must be evenly spaced for the three eyes to be equally tall, but real transmitter drivers are not perfectly linear, so the gaps between levels can come out unequal. Engineers track this with a metric called RLM (the level separation mismatch ratio, often expanded as Ratio of Level Mismatch), where 1.0 is perfect and the standards demand something like ≥0.95. A driver that compresses the top of its range squeezes the upper eye shut while the lower eyes still look fine, and a measurement that only checks one eye will miss it.
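A simplified way to put a number on level mismatch is to compare the smallest gap between adjacent levels with the gap you would get from perfectly even spacing. The sketch below uses that simplified form; the standards spell out the exact measurement procedure in more detail, and the function name and example voltages here are purely illustrative.

```python
def level_mismatch_ratio(levels):
    """Simplified RLM: smallest adjacent-level gap over the ideal gap.

    Returns 1.0 for four evenly spaced levels and less than 1.0
    when any one eye is squeezed.
    """
    v0, v1, v2, v3 = sorted(levels)
    smallest_gap = min(v1 - v0, v2 - v1, v3 - v2)
    ideal_gap = (v3 - v0) / 3
    return smallest_gap / ideal_gap


ideal = [-3.0, -1.0, 1.0, 3.0]
compressed_top = [-3.0, -1.0, 1.0, 2.4]   # driver runs out of headroom at the top

print(level_mismatch_ratio(ideal))
print(level_mismatch_ratio(compressed_top))
```

Because the metric takes the minimum over all three gaps, it catches a squeezed upper eye even when the other two look healthy.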

Why the link has to train itself

Back in rung 4 you met the equalizer family: the feed-forward equalizer (FFE) and TX pre-emphasis that pre-distort the transmitted signal to cancel the channel's smearing, the continuous-time linear equalizer (CTLE) that boosts the punished high frequencies at the receiver, and the decision-feedback equalizer (DFE) that subtracts the trailing echoes of past symbols. Each of these has *tap settings* — knobs that say how hard to boost, how much to pre-distort, how much of each past symbol to subtract.
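As a reminder of what a tap setting actually does, here is a minimal sketch of a three-tap transmit FFE with pre-cursor, main, and post-cursor coefficients. The tap values are hypothetical, chosen only to make the effect visible, and the edge handling (holding the first and last symbol) is a simplification.

```python
def tx_ffe(symbols, pre=-0.05, main=0.75, post=-0.20):
    """Three-tap transmit FFE: out[n] = pre*x[n+1] + main*x[n] + post*x[n-1].

    Tap values are hypothetical. At the ends of the sequence the missing
    neighbor is taken to be the symbol itself.
    """
    out = []
    last = len(symbols) - 1
    for n, current in enumerate(symbols):
        nxt = symbols[n + 1] if n < last else current
        prev = symbols[n - 1] if n > 0 else current
        out.append(pre * nxt + main * current + post * prev)
    return out


step = [-3, -3, -3, 3, 3, 3]
shaped = tx_ffe(step)
print([round(v, 2) for v in shaped])
```

The first symbol after the transition comes out larger than the settled value that follows it. That overshoot is the pre-distortion: extra high-frequency energy spent up front so that the channel's low-pass smearing leaves something closer to a clean step at the far end.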

Here is the problem: there is no single correct setting. The ideal tap values depend entirely on *this specific channel* — the exact trace length, the connectors, the vias, the package, the cable, even the temperature. A motherboard with a CPU two inches from its SerDes partner needs gentle equalization; a 30-inch backplane through three connectors needs the equalizers cranked to the limit. The same chip ships into both. No factory setting can be right for every channel, and the channel is not even known until the system is built. So the link must measure its own channel at startup and tune itself — that is link training and adaptation.

Anatomy of a training handshake

When two PAM4 endpoints power up (say a PCIe 6.0 root complex and an endpoint, or two 112G Ethernet ports across a backplane), they do not begin by sending data. They run a choreographed startup sequence. First they establish a crude, robust link so the two ends can exchange control messages at all; this often means dropping to a slow, well-understood mode (a PCIe link first comes up at its lowest speed and then steps up; Ethernet's IEEE 802.3 Clauses 72, 136, and 162 define a dedicated training frame). Then the real work begins: the receiver evaluates its incoming eye and sends coefficient-adjustment requests back to the far transmitter, walking the FFE/pre-emphasis taps toward the values that open all three eyes widest.

1. Acquire lock. The receiver's clock-and-data recovery (CDR) locks onto the incoming edges, and a known training pattern (often a PRBS sequence) lets the receiver build a picture of the channel's response.
2. Adapt the receiver's own knobs. The CTLE peaking and the DFE taps are auto-tuned locally, typically by an LMS (least-mean-squares) loop that nudges each tap to minimize the error between the sampled level and its ideal target.
3. Negotiate the far transmitter. Over the back-channel, the receiver requests increment/decrement of the far FFE / pre-emphasis coefficients (pre-cursor, main cursor, post-cursor) and watches its own eye improve after each step.
4. Converge on a figure of merit. Both ends iterate until a quality metric, such as eye height and width or estimated BER, stops improving. The transmitter coefficients are then frozen.
5. Go to data mode. The link exits training, the FEC engine turns on, and real traffic flows. Many links then keep the receiver's adaptation running *quietly in the background* so it can track slow drift from temperature and voltage.
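The LMS idea from the receiver-adaptation step can be sketched in a few lines. This is a toy under loud assumptions: the channel is a made-up single post-cursor echo, the equalizer is a small linear feed-forward filter standing in for a real CTLE or DFE, and the training pattern is simply a seeded random PAM4 sequence that both ends are assumed to know. All names and constants are illustrative.

```python
import random

LEVELS = (-3, -1, 1, 3)


def channel(symbols, post_cursor=0.4):
    """Toy channel: each symbol leaks a fraction of itself into the next."""
    out, prev = [], 0.0
    for s in symbols:
        out.append(s + post_cursor * prev)
        prev = s
    return out


def equalize(rx, taps, n):
    """FIR equalizer output at sample n, skipping samples before the start."""
    return sum(w * rx[n - k] for k, w in enumerate(taps) if n - k >= 0)


def lms_train(rx, known, num_taps=3, mu=0.005):
    """Nudge each tap against the error between output and known symbol."""
    taps = [1.0] + [0.0] * (num_taps - 1)
    for n in range(len(rx)):
        err = equalize(rx, taps, n) - known[n]
        for k in range(num_taps):
            if n - k >= 0:
                taps[k] -= mu * err * rx[n - k]
    return taps


def symbol_errors(rx, sent, taps):
    """Count slicer mistakes: nearest PAM4 level versus what was sent."""
    def slicer(v):
        return min(LEVELS, key=lambda lvl: abs(v - lvl))
    return sum(slicer(equalize(rx, taps, n)) != sent[n] for n in range(len(rx)))


rng = random.Random(1)
training = [rng.choice(LEVELS) for _ in range(5000)]
taps = lms_train(channel(training), training)

payload = [rng.choice(LEVELS) for _ in range(2000)]
rx = channel(payload)
errors_before = symbol_errors(rx, payload, [1.0, 0.0, 0.0])
errors_after = symbol_errors(rx, payload, taps)
print([round(w, 3) for w in taps], errors_before, errors_after)
```

The trained taps settle near an approximate inverse of the toy channel, and the slicer that made frequent mistakes on the raw signal stops making them. A real receiver runs the same kind of loop in hardware, often continuing in decision-directed mode once training ends.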

  56 GBaud PAM4  (112 Gb/s)  -- a worked picture
  -----------------------------------------------
  symbol rate        : 56 GBaud      (Tsym ~ 17.9 ps)
  bits per symbol    : 2  -> 112 Gb/s raw line rate
  channel loss @ Nyq : ~30 dB  (Nyquist = 28 GHz)
  three eyes, each   : ~1/3 of swing  -> ~9.5 dB SNR hit
  pre-FEC BER        : ~1e-4 ......... (looks 'broken')
  RS(544,514) FEC    :        |  corrects bursts
  post-FEC BER       : <1e-15  <-- delivered to user
  link-training time : tens of ms, runs ONCE at startup

The headline numbers of a 112G PAM4 link — and why a 1e-4 raw error rate still ships a flawless connection.
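The headline figures follow from a few lines of arithmetic. The sketch below assumes the nominal 56 GBaud rate used in this section and treats the RS(544,514) parity as the only overhead, which is a simplification; the variable names are illustrative.

```python
BAUD = 56e9               # symbols per second (nominal figure from the text)
BITS_PER_SYMBOL = 2       # PAM4
FEC_N, FEC_K = 544, 514   # RS(544,514) codeword and message lengths

symbol_period_ps = 1e12 / BAUD
line_rate_gbps = BAUD * BITS_PER_SYMBOL / 1e9
nyquist_ghz = BAUD / 2 / 1e9
fec_code_rate = FEC_K / FEC_N
payload_gbps = line_rate_gbps * fec_code_rate   # ignores all other overhead

print(f"symbol period  : {symbol_period_ps:.1f} ps")
print(f"raw line rate  : {line_rate_gbps:.0f} Gb/s")
print(f"Nyquist        : {nyquist_ghz:.0f} GHz")
print(f"FEC code rate  : {fec_code_rate:.4f}")
print(f"after FEC      : {payload_gbps:.1f} Gb/s")
```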

Seeing it on the scope

If you ever get to probe a live PAM4 link on a high-end real-time scope, the picture is unforgettable. Where an NRZ link shows one crisp open eye, PAM4 shows three small eyes stacked like a tiny traffic light — and before training they may all be smeared shut, a vertical wash of noise with no openings at all. Then you watch the receiver's adaptation kick in: the eye diagram *blooms* open from the inside, the three apertures emerging one by one as the equalizers cancel the channel's smearing. It is the single most vivid demonstration in all of high-speed I/O that equalization is not optional decoration — it is the only reason the link works at all.

It is worth pausing on what this means for how we now think about a link. A 1990s serial link was a *signal* problem: design a clean driver, route a controlled-impedance trace, and the bits arrive. A modern 112G PAM4 link is a *system* problem with a feedback loop baked in. The transmitter, the channel, the receiver, the adaptation algorithm, and the FEC engine are co-designed; none of them works in isolation. The link is less a wire and more a tiny, self-calibrating communications system that happens to live inside a connector — a direct descendant of the same ideas of channel capacity and equalization that govern your phone's modem and a deep-space radio link.