Signing Off Across Corners: PVT, OCV and Multi-Corner Multi-Mode

One netlist, a million different chips

Imagine a bakery that follows one recipe perfectly, yet every loaf comes out a little different — the oven runs hot near the back, the flour's moisture drifts with the weather, the kitchen warms up through the afternoon. The recipe is fixed; the conditions are not. A modern fab is exactly this kind of bakery, only the loaves are billions of transistors and the tolerances are measured in atoms. Lithography never prints two gates identically; dopant atoms land by the laws of chance; the gate oxide is a few atomic layers thick and varies wafer to wafer. The chip you simulated at one nominal operating point is a fiction. What actually ships is a probability cloud.

Three knobs dominate how fast a finished transistor switches, and engineers bundle them into one acronym: PVT (Process, Voltage, Temperature). *Process* is how the silicon came out of the fab: were the transistors etched a touch fast or a touch slow? *Voltage* is the supply rail the chip happens to see, nominally 0.8 V on a leading node, but it droops under load and sags as a battery drains. *Temperature* is the junction temperature, anywhere from −40 °C in an automotive cold-start to 125 °C in a thermally throttled SoC. Each axis shifts delay, and their effects interact: the same chip can be slow under one operating condition and fast under another.

Process corners: bracketing the silicon lottery

You cannot characterize every possible chip — there are too many. Instead the foundry hands you a handful of pre-built process corners: extreme but physically plausible combinations of how the NMOS and PMOS transistors turned out. The classic five are named by the speed of each transistor type. TT (typical-typical) is the nominal recipe. FF (fast-fast) is the lucky wafer where both N and P came out fast — high drive, low delay. SS (slow-slow) is the unlucky wafer — weak drive, long delay. SF and FS are the skewed corners where one type ran fast and the other slow, which matters enormously for anything that depends on the N/P balance, like ratioed logic or duty cycle.

PMOS
  fast |  SF  · · · · · · · ·  FF
       |   \                  /
       |    \      TT        /        <- TT = nominal recipe
       |     \   (center)   /            FF = both fast  (fastest cells)
  slow |  SS  · · · · · · · ·  FS        SS = both slow  (slowest cells)
       +------------------------         SF/FS = N and P pull opposite ways
         slow      NMOS      fast

        Each labelled point (SS, SF, FS, FF, TT) is a corner the library is characterized at.
        STA must close timing at EVERY relevant one.

The N-vs-P process plane. Real wafers scatter inside this box; the corners bracket the extremes the foundry guarantees you will not exceed.

Here is the single most useful intuition in this whole guide. Setup timing (does the data arrive before the clock edge captures it?) is hardest when logic is *slow*, so its worst case usually lives at the SS, high-temperature, low-voltage corner, where gates crawl. Hold timing (does the data stay stable just *after* the edge, rather than new data racing through and corrupting the value being captured?) is hardest when logic is *fast*, so its worst case usually lives at the FF, high-voltage corner, where a short path sprints through before the capturing flop has safely latched. One chip, two opposite enemies, and you must defeat both.
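To see the two enemies pull in opposite directions, here is a minimal sketch under a deliberately crude assumption: each corner scales every cell delay by a single factor, while a fixed clock-uncertainty margin does not scale. Every number below (scale factors, path delays, setup/hold times, margins) is an illustrative placeholder, not library data.

```python
# Toy model (an assumption): one delay scale factor per process corner.
# Real libraries characterize every cell separately at each corner.
CORNER_SCALE = {"SS": 1.30, "TT": 1.00, "FF": 0.75}

CLOCK_PERIOD_NS = 1.00
SETUP_UNCERTAINTY_NS = 0.05   # fixed margin, does not scale with the corner
HOLD_UNCERTAINTY_NS = 0.035   # fixed margin, does not scale with the corner
LONG_PATH_NS = 0.70           # nominal delay of a long, setup-critical path
SHORT_PATH_NS = 0.06          # nominal delay of a short, hold-critical path
SETUP_TIME_NS = 0.05          # nominal flop setup time
HOLD_TIME_NS = 0.02           # nominal flop hold time


def setup_slack(scale):
    """Slack = required - arrival; data must beat the NEXT capture edge."""
    arrival = LONG_PATH_NS * scale
    required = CLOCK_PERIOD_NS - SETUP_UNCERTAINTY_NS - SETUP_TIME_NS * scale
    return required - arrival


def hold_slack(scale):
    """Slack = arrival - required; data must outlast the SAME capture edge."""
    arrival = SHORT_PATH_NS * scale
    required = HOLD_TIME_NS * scale + HOLD_UNCERTAINTY_NS
    return arrival - required


def worst_corner(slack_fn):
    """Return the corner name with the most negative slack for this check."""
    return min(CORNER_SCALE, key=lambda name: slack_fn(CORNER_SCALE[name]))


if __name__ == "__main__":
    for name, scale in CORNER_SCALE.items():
        print(f"{name}: setup {setup_slack(scale):+.3f} ns, "
              f"hold {hold_slack(scale):+.3f} ns")
    print("worst setup corner:", worst_corner(setup_slack))
    print("worst hold corner:", worst_corner(hold_slack))
```

With these placeholder numbers the long path fails setup only at SS and the short path fails hold only at FF, which is the pattern the paragraph above describes. A real flow gets its delays from per-corner libraries rather than a single scale factor.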

The MCMM matrix: every corner times every mode

Corners answer *which silicon*. But a real chip also runs in many *modes* — and timing constraints change with the mode. A phone SoC might have a high-performance mode at 0.9 V and 3 GHz, a low-power mode at 0.6 V and 800 MHz, a sleep/retention mode, a scan-shift test mode that toggles every flop in a long chain, and a functional mode where only real logic clocks fly. Each mode has its own active clocks, its own clock periods, its own enabled paths and false-path exceptions. A path that is critical in test mode may not even exist in functional mode.

Cross the list of corners with the list of modes and you get the multi-corner multi-mode (MCMM) matrix: a grid of *scenarios*, each a specific (corner, mode) pair the chip must satisfy. Sign-off means closing both setup and hold in every scenario that is physically reachable. A 4-mode, 8-corner design has 32 candidate pairs; even after pruning the ones that cannot be worst-case, it can easily keep a dozen-plus active scenarios, and large SoCs run dozens. This is why timing closure is a campaign, not a single run.

                  CORNER (process, voltage, temperature)
             SS/0.72V/-40C   SS/0.72V/125C   TT/0.80V/25C   FF/0.88V/-40C
           +--------------+---------------+--------------+---------------+
 FUNC HI    |   setup      |   setup ***   |    (ref)     |   hold ***    |
  (3GHz)    |              |               |              |               |
           +--------------+---------------+--------------+---------------+
 FUNC LO    |   setup      |   setup       |              |   hold        |
 (0.8GHz)   |              |               |              |               |
           +--------------+---------------+--------------+---------------+
 SCAN SHIFT |              |               |              |   hold ***    |
 (test)     |              |               |              |               |
           +--------------+---------------+--------------+---------------+

   Each non-empty cell = one SIGN-OFF SCENARIO that must pass.
   ***  = where the worst setup / worst hold violations typically appear.
   Setup hunts the SLOW corners; hold hunts the FAST corner.

A trimmed MCMM matrix. Tools build each scenario once and propagate timing through it; you must clean every non-empty cell, and the starred ones are where violations usually show up first.
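The trimmed matrix can be rebuilt in a few lines. This is a sketch only: the corner and mode names are copied from the diagram, but the pruning rule in `checks_for` is a hypothetical reading of which cells are filled in, and the TT column is treated as a reference point rather than a sign-off scenario.

```python
from itertools import product

# Corner and mode names taken from the matrix above.
CORNERS = ["SS/0.72V/-40C", "SS/0.72V/125C", "TT/0.80V/25C", "FF/0.88V/-40C"]
MODES = ["FUNC_HI", "FUNC_LO", "SCAN_SHIFT"]


def checks_for(corner, mode):
    """Return the checks signed off in one (corner, mode) cell.

    Hypothetical pruning rule that mirrors the diagram:
    slow corners carry setup, the fast corner carries hold,
    scan shift is only checked for hold, TT is reference only.
    """
    process = corner.split("/")[0]
    if process == "SS":
        return set() if mode == "SCAN_SHIFT" else {"setup"}
    if process == "FF":
        return {"hold"}
    return set()


# Full cross product, then keep only the cells that carry a check.
all_cells = {(c, m): checks_for(c, m) for c, m in product(CORNERS, MODES)}
active_scenarios = {cell: checks for cell, checks in all_cells.items() if checks}

if __name__ == "__main__":
    print(f"{len(all_cells)} candidate cells, "
          f"{len(active_scenarios)} active scenarios")
    for (corner, mode), checks in active_scenarios.items():
        print(f"{mode:<11} @ {corner:<14} -> {', '.join(sorted(checks))}")
```

In a real project the pruning rule is a documented engineering decision, not a three-line function, but the shape is the same: enumerate every pair, then justify each one you drop.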

On-chip variation: pessimism inside a single die

Corners capture variation *between* chips — your die versus mine. But two transistors millimetres apart on the *same* die also differ: random dopant fluctuation, local lithography wobble, IR-drop gradients across the power grid, even a thermal hot-spot under a busy block. This local mismatch is on-chip variation (OCV). The cruel twist is that for a single timing check, OCV can push the *launch* path and the *capture* path in opposite directions — making the data path slow while the clock-to-the-capture-flop runs fast, the worst of both worlds for setup.

STA models OCV with timing derating: it multiplies cell and net delays by a derate factor so the launch and capture paths are pulled apart on purpose. For a setup check the tool *slows* the launch (data) path with a late-derate above 1.0 and *speeds* the capture (clock) path with an early-derate below 1.0; for a hold check it does the reverse. The numbers come from the library's variation data and the process node: a late-derate of, say, 1.05 and an early-derate of 0.95 (plus and minus 5%) are a typical starting point, larger on aggressive nodes.
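The derating rule is easy to state as arithmetic. The sketch below applies the 1.05 / 0.95 factors from the paragraph above to a single launch/capture pair. The function names, the example delays, and the choice to leave the flop's own setup and hold times underated are all assumptions made for illustration.

```python
LATE_DERATE = 1.05   # example value from the text: makes a path slower
EARLY_DERATE = 0.95  # example value from the text: makes a path faster


def setup_slack_ocv(launch_clk, data, capture_clk, period, t_setup,
                    late=LATE_DERATE, early=EARLY_DERATE):
    """Setup: launch side derated late, capture clock derated early."""
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


def hold_slack_ocv(launch_clk, data, capture_clk, t_hold,
                   late=LATE_DERATE, early=EARLY_DERATE):
    """Hold: the signs flip, launch side early, capture clock late."""
    arrival = (launch_clk + data) * early
    required = capture_clk * late + t_hold
    return arrival - required


if __name__ == "__main__":
    # Illustrative delays in ns (placeholders, not from any library).
    nominal = setup_slack_ocv(0.40, 0.50, 0.40, 1.00, 0.05, late=1.0, early=1.0)
    derated = setup_slack_ocv(0.40, 0.50, 0.40, 1.00, 0.05)
    print(f"setup slack: nominal {nominal:.3f} ns, derated {derated:.3f} ns")

    nominal = hold_slack_ocv(0.40, 0.08, 0.40, 0.03, late=1.0, early=1.0)
    derated = hold_slack_ocv(0.40, 0.08, 0.40, 0.03)
    print(f"hold slack:  nominal {nominal:.3f} ns, derated {derated:.3f} ns")
```

Both checks lose slack once the derates are applied, which is the point: the tool assumes the two paths disagree in the most harmful direction.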

SETUP check with OCV derating (the pessimistic split):

   launch path  --[ data: DERATE LATE x1.05  -> made SLOWER ]-->  D
                                                                  |
   common clock -+                                                FF
   from PLL      |                                                 ^
                 +--[ capture clk: DERATE EARLY x0.95 -> FASTER ]-+

   data arrives later  +  clock arrives sooner  =  worst-case squeeze

BUT: the common clock segment (from the PLL to the branch point) is SHARED by launch & capture.
Derating it both ways at once is DOUBLE-COUNTING. The fix:

   --> Common Path Pessimism Removal (CPPR / CRPR):
       credit back the over-pessimism on the common segment.

OCV pulls launch and capture apart for worst case, then CPPR refunds the impossible pessimism on the shared clock path.
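The refund can be worked out by hand. In this sketch the clock path is split into a common segment plus separate launch and capture branches, and the CPPR credit is taken as the common delay times the gap between the late and early derates. That simple formula, the helper names, and the delays are assumptions for illustration; production tools compute the credit from the actual early and late arrival times at the branch point.

```python
LATE_DERATE = 1.05
EARLY_DERATE = 0.95


def cppr_credit(common_clk, late=LATE_DERATE, early=EARLY_DERATE):
    """Pessimism from derating the shared clock segment both ways at once."""
    return common_clk * (late - early)


def setup_slack_raw(common_clk, launch_branch, capture_branch, data,
                    period, t_setup, late=LATE_DERATE, early=EARLY_DERATE):
    """Derated setup slack BEFORE any pessimism removal."""
    arrival = (common_clk + launch_branch + data) * late
    required = period + (common_clk + capture_branch) * early - t_setup
    return required - arrival


def setup_slack_cppr(common_clk, launch_branch, capture_branch, data,
                     period, t_setup):
    """Derated setup slack with the common-path credit refunded."""
    raw = setup_slack_raw(common_clk, launch_branch, capture_branch,
                          data, period, t_setup)
    return raw + cppr_credit(common_clk)


if __name__ == "__main__":
    # Illustrative delays in ns (placeholders).
    args = dict(common_clk=0.30, launch_branch=0.10, capture_branch=0.10,
                data=0.50, period=1.00, t_setup=0.05)
    print(f"raw slack:    {setup_slack_raw(**args):.3f} ns")
    print(f"CPPR credit:  {cppr_credit(args['common_clk']):.3f} ns")
    print(f"after CPPR:   {setup_slack_cppr(**args):.3f} ns")
```

The credit grows with the length of the shared segment, so paths whose launch and capture flops branch late in the clock tree recover the most slack.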

Putting it together: a sign-off walkthrough

Let's trace how all this lands in practice. The flow is mechanical once you internalize the two enemies: chase setup at the slow corners, chase hold at the fast corner, and let derating and CPPR keep the pessimism honest. The same machinery you learned for a single ideal check now runs once per scenario, and the most negative slack across the whole MCMM grid is the number that gates tape-out.

1. Build the scenario list. Enumerate the (corner, mode) pairs that can be the worst case; prune the rest with a documented rationale. Load the matching .lib for each corner and the SDC constraints for each mode.
2. Run setup at the slow corners. Use the late-derate on data paths and the early-derate on capture clocks. A failing setup check is a setup violation: fix it by upsizing cells, restructuring logic, or slightly relaxing the clock period.
3. Run hold at the fast corner. Now the derate signs flip. A failing hold check is a hold violation, usually fixed by inserting buffers/delay cells on the short path so data cannot race the clock.
4. Apply CPPR. Let the tool refund the pessimism double-counted on the common clock segment, so you do not over-fix paths that were never really failing.
5. Take the worst slack across ALL scenarios. Tape-out is gated by the most negative slack anywhere in the grid, for both setup and hold. One green scenario means nothing if another is red.
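The final step of the walkthrough is plain bookkeeping, sketched here. `ScenarioResult`, `worst_per_check`, and `tapeout_clean` are hypothetical names, and the slack values are made-up placeholders standing in for whatever your STA tool reports per scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    corner: str
    mode: str
    check: str             # "setup" or "hold"
    worst_slack_ns: float  # most negative slack the tool reported


def worst_per_check(results):
    """Return the single worst scenario for each check type."""
    worst = {}
    for r in results:
        current = worst.get(r.check)
        if current is None or r.worst_slack_ns < current.worst_slack_ns:
            worst[r.check] = r
    return worst


def tapeout_clean(results):
    """Sign-off passes only if NO scenario has negative slack."""
    return all(r.worst_slack_ns >= 0.0 for r in results)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        ScenarioResult("SS/0.72V/125C", "FUNC_HI", "setup", -0.012),
        ScenarioResult("SS/0.72V/-40C", "FUNC_HI", "setup", 0.020),
        ScenarioResult("SS/0.72V/125C", "FUNC_LO", "setup", 0.310),
        ScenarioResult("FF/0.88V/-40C", "FUNC_HI", "hold", 0.004),
        ScenarioResult("FF/0.88V/-40C", "SCAN_SHIFT", "hold", 0.001),
    ]
    for check, r in worst_per_check(results).items():
        print(f"worst {check}: {r.worst_slack_ns:+.3f} ns "
              f"in {r.mode} @ {r.corner}")
    print("tape-out clean:", tapeout_clean(results))
```

Four of the five placeholder scenarios are green, yet the design is not clean, because the gate is the minimum across the grid and not the average.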

Notice what changed and what did not. The *equations* of a timing check — arrival time, required time, slack — are exactly the ones from earlier rungs. What multiplied is the *number of times you run them*: once per scenario, each with its own delays, its own derates, its own active constraints. Mastering that bookkeeping is the difference between a chip that works on the bench and one that works in every customer's hand. The next rung builds directly on this: with PVT, MCMM and OCV in hand, you are ready for the full variation-aware sign-off and the judgment calls — which corners to trust, which pessimism to remove — that separate a closing design from a stalled one.