One netlist, a million different chips
Imagine a bakery that follows one recipe perfectly, yet every loaf comes out a little different — the oven runs hot near the back, the flour's moisture drifts with the weather, the kitchen warms up through the afternoon. The recipe is fixed; the conditions are not. A modern fab is exactly this kind of bakery, only the loaves are billions of transistors and the tolerances are measured in atoms. Lithography never prints two gates identically; dopant atoms land by the laws of chance; the gate oxide is a few atomic layers thick and varies wafer to wafer. The chip you simulated at one nominal operating point is a fiction. What actually ships is a probability cloud.
Three knobs dominate how fast a finished transistor switches, and engineers bundle them into one acronym: PVT (Process, Voltage, Temperature). *Process* is how the silicon came out of the fab: were the transistors etched a touch fast or a touch slow? *Voltage* is the supply rail the chip happens to see: nominally 0.8 V on a leading node, but it droops under load and sags as a battery drains. *Temperature* is the junction temperature, anywhere from −40 °C in an automotive cold-start to 125 °C in a thermally throttled SoC. Each axis shifts delay, and they do not move independently: the same chip can be slow in one mode and fast in another.
Process corners: bracketing the silicon lottery
You cannot characterize every possible chip — there are too many. Instead the foundry hands you a handful of pre-built process corners: extreme but physically plausible combinations of how the NMOS and PMOS transistors turned out. The classic five are named by the speed of each transistor type. TT (typical-typical) is the nominal recipe. FF (fast-fast) is the lucky wafer where both N and P came out fast — high drive, low delay. SS (slow-slow) is the unlucky wafer — weak drive, long delay. SF and FS are the skewed corners where one type ran fast and the other slow, which matters enormously for anything that depends on the N/P balance, like ratioed logic or duty cycle.
         PMOS
    fast |  SF · · · · · · · · FF
         |    \              /
         |     \     TT     /        <- TT = nominal recipe
         |      \ (center) /            FF = both fast (fastest cells)
    slow |  SS · · · · · · · · FS       SS = both slow (slowest cells)
         +------------------------      SF/FS = N and P pull opposite ways
            slow     NMOS     fast         (first letter = NMOS, second = PMOS)
Each '·' is a corner the library is characterized at.
STA must close timing at EVERY relevant one.
Here is the single most useful intuition in this whole guide. Setup timing (does the data arrive before the clock edge captures it?) is hardest when logic is *slow*, so its worst case usually lives at the SS, high-temperature, low-voltage corner, where gates crawl. Hold timing (does the data stay stable just *after* the edge, rather than racing through and corrupting the same register?) is hardest when logic is *fast*, so its worst case usually lives at the FF, high-voltage corner, where a short path sprints through before the capture flop has safely latched the old value. One chip, two opposite enemies, and you must defeat both.
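To make the two enemies concrete, here is a minimal Python sketch of both checks on a single register-to-register path. Everything in it is an illustrative assumption rather than library data: the delay values, the corner scale factors, the 0.40 ns clock period, and the simplification that clock skew and the flop's setup/hold requirements stay constant across corners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One register-to-register path; all times in ns, delays given at TT."""
    data_delay: float  # launch clock-to-Q plus combinational logic delay
    skew: float        # capture clock arrival minus launch clock arrival
    setup: float       # capture flop setup requirement
    hold: float        # capture flop hold requirement


def setup_slack(p: Path, period: float, scale: float) -> float:
    """Required time at the NEXT clock edge minus the data arrival time."""
    arrival = scale * p.data_delay
    required = period + p.skew - p.setup
    return required - arrival


def hold_slack(p: Path, scale: float) -> float:
    """Data arrival time minus the hold requirement at the SAME clock edge."""
    arrival = scale * p.data_delay
    required = p.skew + p.hold
    return arrival - required


# Hypothetical delay multipliers relative to TT (real ones come from the .lib files).
CORNER_SCALE = {"SS": 1.30, "TT": 1.00, "FF": 0.75}
PERIOD = 0.40  # ns, hypothetical clock period

long_path = Path(data_delay=0.32, skew=0.02, setup=0.03, hold=0.02)
short_path = Path(data_delay=0.06, skew=0.03, setup=0.03, hold=0.02)

for corner, scale in CORNER_SCALE.items():
    print(f"{corner}: setup slack (long path) = {setup_slack(long_path, PERIOD, scale):+.3f} ns, "
          f"hold slack (short path) = {hold_slack(short_path, scale):+.3f} ns")
```

With these made-up numbers the long path fails setup only at SS and the short path fails hold only at FF, which is the pattern the paragraph above describes.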
The MCMM matrix: every corner times every mode
Corners answer *which silicon*. But a real chip also runs in many *modes* — and timing constraints change with the mode. A phone SoC might have a high-performance mode at 0.9 V and 3 GHz, a low-power mode at 0.6 V and 800 MHz, a sleep/retention mode, a scan-shift test mode that toggles every flop in a long chain, and a functional mode where only real logic clocks fly. Each mode has its own active clocks, its own clock periods, its own enabled paths and false-path exceptions. A path that is critical in test mode may not even exist in functional mode.
Cross the list of corners with the list of modes and you get the multi-corner multi-mode (MCMM) matrix: a grid of *scenarios*, each a specific (corner, mode) pair the chip must satisfy. Sign-off means closing both setup and hold in every scenario that is physically reachable. A 4-mode, 8-corner design has 32 possible pairs; even after pruning the ones that cannot be worst case, it can easily keep a dozen-plus active scenarios, and large SoCs run dozens. This is why timing closure is a campaign, not a single run.
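Building and pruning the grid can be sketched in a few lines of Python. The corner and mode names, and the pruning rule in `checks_for`, are hypothetical choices made for illustration; a real project derives them from the foundry's corner list and the design's SDC files.

```python
from itertools import product
from typing import NamedTuple


class Scenario(NamedTuple):
    corner: str
    mode: str
    checks: tuple[str, ...]  # which of "setup" / "hold" are signed off here


# Hypothetical names for illustration only.
CORNERS = ["SS_0.72V_m40C", "SS_0.72V_125C", "TT_0.80V_25C", "FF_0.88V_m40C"]
MODES = ["func_hi", "func_lo", "scan_shift"]


def checks_for(corner: str, mode: str) -> tuple[str, ...]:
    """Assumed pruning rule: setup at slow corners, hold at the fast corner."""
    if corner.startswith("SS"):
        # Assumption: scan shift is slow enough that setup is not signed off at SS.
        return () if mode == "scan_shift" else ("setup",)
    if corner.startswith("FF"):
        return ("hold",)
    return ()  # TT kept as a reference point, not as a sign-off scenario


def build_scenarios() -> list[Scenario]:
    scenarios = []
    for corner, mode in product(CORNERS, MODES):
        checks = checks_for(corner, mode)
        if checks:  # drop pruned (corner, mode) pairs
            scenarios.append(Scenario(corner, mode, checks))
    return scenarios


scenarios = build_scenarios()
print(f"{len(CORNERS) * len(MODES)} possible pairs, {len(scenarios)} sign-off scenarios")
for s in scenarios:
    print(f"  {s.mode:<10} @ {s.corner:<14} -> {', '.join(s.checks)}")
```

Keeping the pruning rule in one small function means the rationale for every dropped pair lives in one reviewable place.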
                              CORNER (process, voltage, temperature)
                  SS/0.72V/-40C  SS/0.72V/125C   TT/0.80V/25C   FF/0.88V/-40C
                +--------------+---------------+--------------+---------------+
    FUNC HI     | setup        | setup ***     | (ref)        | hold ***      |
    (3GHz)      |              |               |              |               |
                +--------------+---------------+--------------+---------------+
    FUNC LO     | setup        | setup         |              | hold          |
    (0.8GHz)    |              |               |              |               |
                +--------------+---------------+--------------+---------------+
    SCAN SHIFT  |              |               |              | hold ***      |
    (test)      |              |               |              |               |
                +--------------+---------------+--------------+---------------+
Each non-empty cell = one SIGN-OFF SCENARIO that must pass.
*** = where the worst setup / worst hold violations typically appear.
Setup hunts the SLOW corners; hold hunts the FAST corner.
On-chip variation: pessimism inside a single die
Corners capture variation *between* chips — your die versus mine. But two transistors millimetres apart on the *same* die also differ: random dopant fluctuation, local lithography wobble, IR-drop gradients across the power grid, even a thermal hot-spot under a busy block. This local mismatch is on-chip variation (OCV). The cruel twist is that for a single timing check, OCV can push the *launch* path and the *capture* path in opposite directions — making the data path slow while the clock-to-the-capture-flop runs fast, the worst of both worlds for setup.
STA models OCV with timing derating: it multiplies cell and net delays by a derate factor so the launch and capture paths are pulled apart on purpose. For a setup check the tool *slows* the launch (data) path with a late-derate above 1.0 and *speeds* the capture (clock) path with an early-derate below 1.0; for a hold check it does the reverse. The numbers come from the library's variation data and the process node: a late-derate of, say, 1.05 and an early-derate of 0.95 are a typical starting point, with a wider spread on aggressive nodes.
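The effect of derating is easy to see numerically. The sketch below is a simplified model, not any tool's algorithm: it applies one flat late factor and one flat early factor to whole path segments, and every delay value and parameter name in it is a made-up assumption.

```python
def derated_setup_slack(launch_clk, data, capture_clk, period, setup,
                        late=1.05, early=0.95):
    """Setup: launch clock + data path are made LATE, capture clock is made EARLY."""
    arrival = late * (launch_clk + data)
    required = period + early * capture_clk - setup
    return required - arrival


def derated_hold_slack(launch_clk, data, capture_clk, hold,
                       late=1.05, early=0.95):
    """Hold: launch clock + data path are made EARLY, capture clock is made LATE."""
    arrival = early * (launch_clk + data)
    required = late * capture_clk + hold
    return arrival - required


# Hypothetical delays in ns: both flops sit 0.20 ns down the clock tree.
CLK = 0.20
PERIOD = 1 / 3.0  # 3 GHz clock

for label, late, early in [("no derate", 1.0, 1.0), ("5% OCV   ", 1.05, 0.95)]:
    s = derated_setup_slack(CLK, 0.25, CLK, PERIOD, setup=0.03, late=late, early=early)
    h = derated_hold_slack(CLK, 0.04, CLK, hold=0.02, late=late, early=early)
    print(f"{label}: setup slack = {s:+.4f} ns, hold slack = {h:+.4f} ns")
```

In this toy case the 5% split shrinks the setup slack and flips a passing hold check to failing, even though neither path changed. Much of that loss comes from derating the full clock latency on both sides, including whatever portion launch and capture share.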
SETUP check with OCV derating (the pessimistic split):
                  launch path --[ data: DERATE LATE x1.05 -> made SLOWER ]--> D
                  |
    common clock -+                                                           FF
    from PLL      |                                                           ^
                  +--[ capture clk: DERATE EARLY x0.95 -> FASTER ]------------+
data arrives later + clock arrives sooner = worst-case squeeze
BUT: the common part of the clock tree (from the PLL to the branch point) is SHARED by launch & capture.
Derating it both ways at once is DOUBLE-COUNTING. The fix:
--> Common Path Pessimism Removal (CPPR / CRPR):
credit back the over-pessimism on the common segment.
Putting it together: a sign-off walkthrough
Let's trace how all this lands in practice. The flow is mechanical once you internalize the two enemies: chase setup at the slow corners, chase hold at the fast corner, and let derating and CPPR keep the pessimism honest. The same machinery you learned for a single ideal check now runs once per scenario, and the most negative slack across the whole MCMM grid is the number that gates tape-out.
- Build the scenario list. Enumerate the (corner, mode) pairs that can be the worst case; prune the rest with a documented rationale. Load the matching .lib for each corner and the SDC constraints for each mode.
- Run setup at the slow corners. Use the late-derate on data paths and the early-derate on capture clocks. A failing setup check is a setup violation — fix it by upsizing cells, restructuring logic, or slightly relaxing the clock period.
- Run hold at the fast corner. Now the derates swap roles: the early-derate goes on the launch clock and data path, the late-derate on the capture clock. A failing hold check is a hold violation, usually fixed by inserting buffers or delay cells on the short path so data cannot race the clock.
- Apply CPPR. Let the tool refund the pessimism double-counted on the common clock segment, so you do not over-fix paths that were never really failing.
- Take the worst slack across ALL scenarios. Tape-out is gated by the most negative slack anywhere in the grid — for both setup and hold. One green scenario means nothing if another is red.
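The last two steps above reduce to simple bookkeeping, sketched here in Python. The scenario labels, slack values and common-clock delays are invented for illustration, and the credit formula assumes one flat late/early derate pair applied to the shared clock segment; a production STA tool computes the credit per path from the actual early and late delays of that segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    scenario: str      # "<mode>@<corner>" label (hypothetical names)
    check: str         # "setup" or "hold"
    raw_slack: float   # ns, with OCV derates applied but before CPPR
    common_clk: float  # ns, delay of the clock segment shared by launch and capture


def cppr_credit(common_clk: float, late: float = 1.05, early: float = 0.95) -> float:
    """Pessimism introduced by derating the shared segment both late and early."""
    return (late - early) * common_clk


def signoff_slack(r: CheckResult) -> float:
    """Slack after refunding the double-counted pessimism."""
    return r.raw_slack + cppr_credit(r.common_clk)


def worst_by_check(results):
    """Return {check: (worst slack, scenario)} across the whole MCMM grid."""
    worst = {}
    for r in results:
        slack = signoff_slack(r)
        if r.check not in worst or slack < worst[r.check][0]:
            worst[r.check] = (slack, r.scenario)
    return worst


results = [
    CheckResult("func_hi@SS_0.72V_125C", "setup", -0.012, 0.15),
    CheckResult("func_hi@SS_0.72V_m40C", "setup", +0.020, 0.15),
    CheckResult("func_lo@SS_0.72V_125C", "setup", +0.400, 0.15),
    CheckResult("func_hi@FF_0.88V_m40C", "hold", -0.002, 0.15),
    CheckResult("scan_shift@FF_0.88V_m40C", "hold", -0.030, 0.05),
]

for check, (slack, scenario) in worst_by_check(results).items():
    verdict = "PASS" if slack >= 0 else "FAIL"
    print(f"worst {check:<5}: {slack:+.3f} ns in {scenario} -> {verdict}")
```

In this made-up data set CPPR rescues two checks that looked red before the credit, but the scan-shift hold path shares too little clock tree to be refunded enough, so that one scenario still blocks tape-out.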
Notice what changed and what did not. The *equations* of a timing check — arrival time, required time, slack — are exactly the ones from earlier rungs. What multiplied is the *number of times you run them*: once per scenario, each with its own delays, its own derates, its own active constraints. Mastering that bookkeeping is the difference between a chip that works on the bench and one that works in every customer's hand. The next rung builds directly on this: with PVT, MCMM and OCV in hand, you are ready for the full variation-aware sign-off and the judgment calls — which corners to trust, which pessimism to remove — that separate a closing design from a stalled one.