Closing Timing: ECOs, Signoff Corners and Tape-Out

The last mile is the hardest mile

Climbers have a grim saying: the summit is only halfway, because the descent kills you. Chip design has its own version. By the time place-and-route hands you a layout, the design *works*: the logic is right, the floorplan is sane, and the overwhelming majority of paths already meet timing. And yet the remainder, a few thousand stubborn paths out of tens of millions, can swallow weeks. This final stretch is timing closure: the disciplined, iterative grind of driving every violation to zero across every operating condition, then proving it with a golden tool so convincingly that the design is allowed to leave the building.

Everything earlier in this ladder pointed here. Static timing analysis gave you the launch-and-capture model; slack turned every path into a single number that is positive (passing) or negative (failing); setup and hold told you the two deadlines a flip-flop imposes; clock skew, derating and on-chip variation told you why margins are never as fat as they look. Closure is where you *spend* all of that knowledge. The currency is the timing ECO — a small, surgical change — and the receipt is signoff.

The ECO toolbox: four moves and when to use them

The engineering change order (ECO) is a term borrowed from the days of paper engineering drawings: a formal, documented, *minimal* change to a finished design. A timing ECO is the IC version. You do not re-synthesize and re-route the whole block, because that would scramble the paths that already pass and could introduce brand-new violations. Instead you reach in with tweezers and change a handful of cells. There are four classic moves, and a good engineer knows the cost of each.

Resize (upsize) a cell. A path is too slow because a gate is driving a heavy load weakly. Swap it for a higher-drive version of the same standard cell (an X2 buffer becomes an X4). It drives faster, fixing the setup violation, but it burns more power, takes more area, and presents a larger input capacitance to the gate driving it, which can push the violation one stage upstream.
Swap Vt (threshold voltage). A multi-Vt library offers the same logic cell in slow-but-low-leakage (HVT), balanced (SVT), and fast-but-leaky (LVT) flavours. Swapping a setup-critical cell from HVT to LVT can claw back tens of picoseconds without changing the footprint at all: the ECO is nearly free in area and placement, but it raises leakage power. For hold fixes you do the reverse: swap to a *slower* HVT cell to add delay.
Buffer insertion / restructuring. A long wire has too much resistance and capacitance; insert a buffer (or two) to break the RC into shorter segments and restore the slew. For hold violations the same trick adds delay to a too-fast data path. Inserting cells needs free placement sites nearby — if the area is congested, the buffer lands far away and may not help, or may even hurt.
Reroute / layer promotion. If the wire itself is the culprit, ask the router to re-route that net on a wider, thicker upper metal layer with lower resistance, or detour it away from aggressor nets to cut crosstalk. The most invasive move — it can disturb neighbours — so it is usually a last resort for a genuinely wire-dominated path.

Path R12 (failing setup, slack = -85 ps)
  reg_a/Q -> u_and3 -> u_buf1 -> long_net_n47 -> u_xor2 -> reg_b/D

Fix option               delta slack   power    area      placement risk
----------------------   -----------   ------   -------   ----------------
upsize u_xor2 X2->X4     +40 ps        +18 uW   +1 row    low
swap u_and3 SVT->LVT     +55 ps        +leak    0         none (in-place)
buffer net_n47 (x1)      +70 ps        +leak    +1 cell   needs free site
reroute net_n47 M5->M7   +95 ps        ~0       0         disturbs n44,n51

Chosen ECO: swap u_and3 to LVT (+55) AND upsize u_xor2 (+40)
  => predicted slack = -85 + 55 + 40 = +10 ps  (PASS, margin thin)

A real closure decision: each candidate fix trades slack against power, area and disruption. The engineer often combines two cheap fixes instead of one expensive one.
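
The selection logic behind that table can be sketched in a few lines of Python. This is a minimal sketch, not a tool flow. The slack gains are the predicted values from the table, while the integer cost ranks are illustrative assumptions standing in for power, area and placement risk, and the function name `cheapest_fix` is hypothetical. It also assumes that slack gains add linearly, which real fixes on the same path only approximate.

```python
from itertools import combinations

# Candidate fixes for path R12: (name, predicted slack gain in ps, cost rank).
# Gains come from the table; the cost ranks (0 = cheapest) are assumed for illustration.
CANDIDATES = [
    ("upsize u_xor2 X2->X4", 40, 1),
    ("swap u_and3 SVT->LVT", 55, 0),
    ("buffer net_n47", 70, 2),
    ("reroute net_n47 M5->M7", 95, 3),
]


def cheapest_fix(slack_ps, candidates, max_fixes=2):
    """Return (names, predicted_slack) for the lowest-cost combination of up to
    max_fixes fixes that brings slack to >= 0, or None if no combination does."""
    best_key, best = None, None
    for k in range(1, max_fixes + 1):
        for combo in combinations(candidates, k):
            # Assumes gains are additive, which is only an approximation.
            predicted = slack_ps + sum(gain for _, gain, _ in combo)
            if predicted < 0:
                continue
            # Prefer lower total cost, then fewer cells touched, then more margin.
            key = (sum(cost for _, _, cost in combo), k, -predicted)
            if best_key is None or key < best_key:
                best_key = key
                best = ([name for name, _, _ in combo], predicted)
    return best
```

Calling `cheapest_fix(-85, CANDIDATES)` picks the Vt swap plus the upsize with a predicted slack of +10 ps, the same pair chosen by hand above. Under these assumed costs, the single reroute reaches the same slack at a higher disruption cost.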

Why one timing report is never enough: MCMM

Here is the trap that snares beginners. You make an ECO, the report turns green, you celebrate — and the next morning the chip fails timing. Why? Because a chip does not live at one temperature, one voltage, or one fabrication outcome. It must work in a phone left in a hot car, a server in a chilled rack, a part from the slow corner of the wafer, another from the fast corner. Each combination is a corner, and the design must pass them *all*.

This is multi-corner multi-mode analysis, or MCMM. A *corner* captures the physical conditions: process (slow/typical/fast silicon), voltage (for example 0.72 V to 0.88 V), temperature (for example −40 °C to 125 °C), plus parasitic RC extraction sets. A *mode* captures the functional state: full-speed mission mode, low-power mode, scan-test mode, sleep mode. Multiply them together and a serious SoC routinely signs off across hundreds of scenarios, each its own full STA run.
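
To see how quickly the scenario count grows, here is a small sketch that enumerates the full cross product. The axis values are assumptions for illustration: only the voltage and temperature endpoints and the four modes come from the text, while the mid-range points and the RC extraction names are placeholders. Real projects usually prune this list to the scenarios that can actually dominate.

```python
from itertools import product

# Illustrative axis values; real ones come from the foundry and the constraint files.
PROCESS = ["slow", "typical", "fast"]
VOLTAGE_V = [0.72, 0.80, 0.88]          # 0.80 is an assumed nominal point
TEMPERATURE_C = [-40, 25, 125]          # 25 is an assumed nominal point
RC_EXTRACTION = ["cworst", "rcworst", "typical", "cbest", "rcbest"]  # placeholder names
MODES = ["mission", "low_power", "scan", "sleep"]


def build_scenarios():
    """Return every (corner x mode) combination as a dict, before any pruning."""
    return [
        {"process": p, "voltage": v, "temp": t, "rc": rc, "mode": m}
        for p, v, t, rc, m in product(
            PROCESS, VOLTAGE_V, TEMPERATURE_C, RC_EXTRACTION, MODES
        )
    ]


scenarios = build_scenarios()
print(len(scenarios))  # 3 * 3 * 3 * 5 * 4 = 540
```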

          slow-V-hot      typ-typ      fast-V-cold
          (setup-worst)              (hold-worst)
         +------------+ +----------+ +-------------+
func     | check SETUP| |  margin  | | check HOLD  |
mode     | slow wins  | |          | | fast wins   |
         +------------+ +----------+ +-------------+
scan     | check SETUP| |   ...    | | check HOLD  |
mode     +------------+ +----------+ +-------------+

Golden rule of thumb (not absolute):
  SETUP is worst in the SLOW corner (low V, high T, slow process)
  HOLD  is worst in the FAST corner (high V, low T, fast process)
An ECO that fixes setup in the slow corner may CREATE a hold
violation in the fast corner -> you must re-check BOTH after every fix.

MCMM is a grid: every (corner × mode) cell must be green at once. The ECO loop is closed only when no cell anywhere is red.
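
The "every cell green at once" rule is easy to express as code. The sketch below assumes the worst slack per scenario has already been collected into a dictionary. The key layout, the function name `failing_cells` and the numbers are all hypothetical.

```python
def failing_cells(grid):
    """grid maps (corner, mode, check) -> worst slack in ps for that scenario.
    Returns the failing cells sorted worst first; an empty list means closed."""
    return sorted(
        ((cell, slack) for cell, slack in grid.items() if slack < 0),
        key=lambda item: item[1],
    )


# Hypothetical worst slacks after a setup ECO: the slow corner is now clean,
# but the faster cell has opened a hold violation in the fast corner.
grid_after_eco = {
    ("slow", "func", "setup"): 10.0,
    ("slow", "scan", "setup"): 42.0,
    ("fast", "func", "hold"): -6.0,
    ("fast", "scan", "hold"): 3.0,
}

red = failing_cells(grid_after_eco)
closed = not red  # closure requires an empty failure list
```

Here `red` holds the single fast-corner hold cell, so `closed` is `False` even though every setup scenario passes.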

This is also why timing derating matters at closure. On-chip variation means two identical cells microns apart run at slightly different speeds, so signoff scales delays to model the worst alignment. For a setup check it multiplies the launch clock and data path delays by (1 + derate) and the capture clock path delays by (1 − derate). For a hold check the scaling flips: the launch side is made early and the capture side late. A path that passes nominally can fail once derating is applied, so you close timing on the *derated, corner-specific* numbers, never the optimistic nominal ones.
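
A short numerical sketch shows how derating can flip a passing path into a failing one. It uses a single flat derate factor and the usual launch-late, capture-early scaling for setup, with the mirror image for hold. The function names and all numbers are illustrative assumptions, and the model deliberately ignores refinements such as removing pessimism on the clock segment shared by the launch and capture paths.

```python
def setup_slack(period, launch_clk, data, capture_clk, t_setup, derate=0.0):
    """Setup slack in ps: launch clock and data are late, capture clock is early."""
    arrival = (launch_clk + data) * (1.0 + derate)
    required = period + capture_clk * (1.0 - derate) - t_setup
    return required - arrival


def hold_slack(launch_clk, data, capture_clk, t_hold, derate=0.0):
    """Hold slack in ps: launch clock and data are early, capture clock is late."""
    arrival = (launch_clk + data) * (1.0 - derate)
    required = capture_clk * (1.0 + derate) + t_hold
    return arrival - required


# Illustrative delays in ps, not taken from any real design.
nominal = setup_slack(1000, 800, 900, 800, 50)              # +50 ps, passes
derated = setup_slack(1000, 800, 900, 800, 50, derate=0.05)  # about -75 ps, fails
```

With a 5% derate, the same path goes from +50 ps to roughly -75 ps. Long clock paths make the effect worse because both clock branches are scaled in opposite directions.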

Signoff: the golden word before the chip leaves

Throughout implementation, the place-and-route tool runs its *own* timing engine: fast, approximate, good enough to steer optimization. But you would never bet a $30-million mask set on it. Timing signoff is the final, authoritative verdict, produced by a separate golden signoff STA tool (Synopsys PrimeTime is the usual example) that designers, the foundry and management all agree to trust. Its delay models, parasitic handling and corner library are the reference of record.

Why a separate tool at all? Independence. The implementation engine and the signoff engine are built by different teams (sometimes different companies) with different algorithms, so if both agree a path is clean, that agreement is real evidence — not one tool grading its own homework. The catch is correlation: the two engines must agree closely, or you get the maddening situation where P&R reports a path closed but signoff reports it failing. Closing that gap — tightening derates, matching extraction, aligning libraries — is half the battle in the final weeks.
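
One way to track correlation is to compare the two engines path by path and flag the disagreements. The sketch below assumes both slack reports have already been parsed into dictionaries keyed by endpoint. The function name `miscorrelated_paths`, the data and the 30 ps default tolerance are illustrative assumptions, not values from any tool.

```python
def miscorrelated_paths(pnr_slack, signoff_slack, tol_ps=30.0):
    """Return (path, pnr - signoff) for paths whose two slack values differ by
    more than tol_ps, largest disagreement first."""
    shared = pnr_slack.keys() & signoff_slack.keys()
    deltas = ((p, pnr_slack[p] - signoff_slack[p]) for p in shared)
    return sorted(
        (item for item in deltas if abs(item[1]) > tol_ps),
        key=lambda item: abs(item[1]),
        reverse=True,
    )


# Hypothetical slacks in ps, keyed by path endpoint.
pnr = {"reg_b/D": 12.0, "reg_c/D": 40.0, "reg_d/D": 5.0}
signoff = {"reg_b/D": -25.0, "reg_c/D": 31.0, "reg_d/D": -60.0}

flagged = miscorrelated_paths(pnr, signoff)
```

A positive delta means P&R is more optimistic than signoff, which is the dangerous direction: here `reg_d/D` and `reg_b/D` both look closed in implementation and fail at signoff.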

Crucially, signoff is not only setup and hold slack. A clean signoff is a *bundle* of green reports: setup met in all corners, hold met in all corners, every clock-domain crossing handled, crosstalk-induced delay folded into the timing, no max-transition or max-capacitance design-rule violations, and clock skew and uncertainty all accounted for. Only when that whole bundle is clean across the full MCMM set does the timing checklist get its tick.

Tape-out: the point of no return

When every signoff gate is green, meaning timing plus its siblings (physical verification with DRC/LVS, power, electromigration/IR-drop, and DFT coverage), the design is ready for tape-out. The name is usually explained as a fossil from the 1970s, when the final design was literally a reel of magnetic tape carried to the fab. Today it means shipping the GDSII (or OASIS) layout database to the foundry to make photomasks. Those masks cost millions and take weeks to make, and once silicon is in the oven there is no `Ctrl-Z`. A timing bug that escapes signoff becomes a respin: months of schedule and a fortune in mask costs.

That is the whole reason signoff is so heavy. The gate exists because the cost of being wrong is catastrophic and irreversible. So tape-out is gated by a formal checklist review where every owner stands up and says, in effect, "my domain is clean and I will sign my name to it." Timing closure is one signature on that page — but historically one of the most contested, because it touches frequency, which touches the product's headline spec.

TAPE-OUT READINESS CHECKLIST  (timing owner section)
[x] Setup slack >= 0 in ALL setup corners & modes (MCMM)
[x] Hold  slack >= 0 in ALL hold corners & modes (MCMM)
[x] Signoff STA = golden tool (PrimeTime), libraries frozen v3.2
[x] P&R <-> signoff correlation within 30 ps on worst paths
[x] Crosstalk / SI delay enabled; derates applied per foundry deck
[x] All clock-domain crossings constrained & verified
[x] No max_tran / max_cap / max_fanout DRV violations
[x] All ECOs re-run clean; no functional ECO outstanding
[x] False/multicycle paths reviewed & justified (no over-relaxing)
--> TIMING SIGNED OFF.  Release GDSII to mask shop.

The timing owner's slice of the tape-out gate. Every box must be checked honestly — a hidden over-relaxed exception is how silent bugs reach silicon.

The engineer's checklist: closure as an art

If the rest of this ladder taught you the rules of timing, this rung is about *judgement* — because closure is a constrained optimization with no single right answer. You are trading four things that all matter and all push against each other: timing, power, area, and schedule. Speed up a path and you spend power and area; spend too long perfecting it and you blow the schedule, which in a competitive market can cost more than a slightly slower chip ever would.

Triage by worst slack. Fix the path with the worst negative slack (WNS) first, then re-run timing before touching the rest, because shared cells mean one fix often clears many paths. Track total negative slack (TNS) to see the whole field shrinking, not just the single worst path.
Reach for the cheapest fix that works. Vt swaps are nearly free in area and placement; upsizing costs power and area; buffer insertion and reroute are most disruptive. Climb that ladder only as far as you must.
Re-check every corner after every fix. A setup fix in the slow corner can break hold in the fast corner. Closure is not done when one report is green — it is done when the entire MCMM grid is green at once.
Trust the golden signoff tool, not the implementation estimate. Drive correlation tight early, freeze the signoff setup, and re-run signoff after the final ECO — never tape out on P&R's optimistic numbers.
Justify every exception in writing. A false path or multicycle constraint must be provably correct, not a convenience to silence a red line. The cost of being wrong is a respin.
Know when good enough is done. A path closed to +200 ps is not better than one closed to +5 ps if you spent two weeks and a hundred microwatts to gain a margin nobody needs. Closure is finished when every corner passes with sane margin — then you stop, and you tape out.
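
The two triage metrics from the first item of this checklist are simple to compute once endpoint slacks are in a list. This is a minimal sketch with hypothetical numbers. It assumes the convention that WNS is the minimum slack and that TNS sums only the negative slacks; tools differ in how they report WNS when nothing is failing.

```python
def wns_tns(slacks_ps):
    """Return (WNS, TNS, violating endpoint count) for a non-empty list of
    endpoint slacks in ps. WNS is the minimum slack; TNS sums negative slacks only."""
    negatives = [s for s in slacks_ps if s < 0]
    return min(slacks_ps), sum(negatives), len(negatives)


# Hypothetical endpoint slacks before and after one ECO on a shared cell.
before = [-85, -40, -12, 5, 30]
after = [10, -5, -12, 5, 30]

print(wns_tns(before))  # (-85, -137, 3)
print(wns_tns(after))   # (-12, -17, 2)
```

A single fix moves WNS from -85 ps to -12 ps, and TNS shrinks from -137 ps to -17 ps. That shrinking total is the "whole field" view the checklist asks you to track.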

That is the whole arc of this ladder, paid off. STA gave you the lens, slack gave you the score, setup/hold/skew/derate gave you the physics, and closure is where a human applies judgement to spend the fewest picojoules and the fewest days driving every number green. When the last corner clears and the tape-out checklist is signed, a few square millimeters of silicon are about to be committed to mask — running correctly across every condition it will ever meet, because you proved it would.