Slack and the Critical Path: Reading a Timing Report

From rules to numbers: arrival and required time

Imagine a courier who has to deliver a package across town before the recipient leaves for the airport. There are two clocks that matter: the time the package actually shows up at the door, and the deadline by which it had to arrive. If the courier beats the deadline, everyone is happy and there is some margin to spare. If the courier is late, the recipient is gone and the delivery fails. Digital timing is exactly this story, told millions of times per chip. The package is a logic value rippling down a timing path; the door is the data input of a flip-flop; and the deadline is set by the clock edge plus the flop's setup requirement.

We give these two clocks formal names. The [[ic-arrival-time|arrival time]] is when the data signal actually reaches a given point: the sum of every delay it picked up along the way, measured from the launching clock edge (the clock's trip to the launching flip-flop, the flop's clk-to-Q, then every gate and wire after it). The [[ic-required-time|required time]] is the latest moment that signal is allowed to arrive and still be captured safely. For a normal setup check the required time is essentially `(next clock edge) + (capture clock delay) − (setup time) − (clock uncertainty)`: the data must be settled and quiet a little before the capturing edge, not exactly on it.

Slack: the one number that decides everything

Once you have arrival and required time, the whole pass/fail question collapses into a single subtraction. [[slack|Slack]] equals required time minus arrival time. That is the entire definition, and it is worth burning into memory because it is the number you will quote in every design review for the rest of your career.

slack = required_time - arrival_time

  slack > 0   ->  PASS, with margin to spare   (data arrived early)
  slack = 0   ->  exactly on the edge          (zero margin)
  slack < 0   ->  VIOLATION                     (data arrived too late)

Example (setup check at a capturing flop):
  required_time = 2.000 ns   (edge 2.0 ns, setup already folded in)
  arrival_time  = 1.870 ns
  ----------------------------------------------
  slack         = 2.000 - 1.870 = +0.130 ns     ->  PASS (+130 ps margin)

Flip one delay so the data lands late:
  arrival_time  = 2.090 ns
  slack         = 2.000 - 2.090 = -0.090 ns     ->  FAIL (-90 ps, setup violation)

Slack in three lines: positive is margin, zero is the knife-edge, negative is a violation you must close.

Positive slack means the signal got there early — the path has margin you could spend on a higher clock frequency, a lower voltage, or smaller, cheaper cells. Zero slack means the path is exactly on the knife-edge: any added delay tips it into failure. Negative slack is a violation — specifically a [[ic-setup-violation|setup violation]] when it's a setup check — and the magnitude tells you exactly how much faster the path needs to become. A slack of −90 ps means you must shave at least 90 picoseconds off that path (or relax the clock by that much) before the chip can run at the target frequency.
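The three cases can be sketched in a few lines of Python. This is only an illustration of the subtraction, not how any STA tool is implemented; the function names `setup_slack` and `classify` and the tolerance `tol_ns` are made up for this example.

```python
def setup_slack(required_time_ns: float, arrival_time_ns: float) -> float:
    """Slack = required time minus arrival time (both in ns)."""
    return required_time_ns - arrival_time_ns


def classify(slack_ns: float, tol_ns: float = 1e-9) -> str:
    """Map a slack value to the three cases: margin, knife-edge, violation.

    tol_ns is an illustrative tolerance so that floating-point noise
    around zero is not reported as a pass or a violation.
    """
    if slack_ns > tol_ns:
        return "PASS"
    if slack_ns < -tol_ns:
        return "VIOLATION"
    return "ZERO MARGIN"


# Same required time as the worked example, three different arrivals.
for arrival in (1.870, 2.000, 2.090):
    s = setup_slack(2.000, arrival)
    print(f"arrival={arrival:.3f} ns  slack={s * 1000:+.0f} ps  {classify(s)}")
```

The first and last arrivals reproduce the +130 ps and −90 ps cases from the worked example; the middle one is the zero-margin knife-edge.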

The critical path is just the worst slack

A real block has not one path but millions of them, each with its own slack. The [[critical-path|critical path]] is simply the path with the *smallest* (most negative, or least positive) slack in the whole design. It is the bottleneck — the single chain of logic that is closest to failing, and therefore the one that decides how fast the entire chip is allowed to run. Strengthen everything else and nothing changes; the critical path alone sets the speed limit, exactly like the slowest car on a one-lane road dictates the pace of every car behind it.

This is why timing closure is a game of *prioritisation*, not perfectionism. The tool sorts every endpoint by slack and hands you the worst offenders first. You fix the critical path, the second-worst path becomes the new critical path, and you repeat. The relationship to frequency is direct: if your worst setup slack is +130 ps at a 2.0 ns clock period (500 MHz), you have 130 ps of headroom and could in principle push the period down to about 1.87 ns before slack hits zero. If the worst slack is −90 ps, you are *over* budget and the chip will not meet 500 MHz until that path is fixed or the clock is slowed.
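A rough sketch of that bookkeeping, in Python. The endpoint names and slack values are hypothetical (only the +130 ps and −90 ps figures come from the text), and the period estimate assumes path delays and margins stay the same when the clock period changes, which is a simplification.

```python
def worst_slack(slacks_ns: dict[str, float]) -> tuple[str, float]:
    """Return (endpoint, slack) for the smallest slack: the critical path."""
    endpoint = min(slacks_ns, key=slacks_ns.get)
    return endpoint, slacks_ns[endpoint]


def min_period_ns(period_ns: float, worst_slack_ns: float) -> float:
    """Period at which the worst path would sit at exactly zero slack.

    Assumes delays and margins do not change with the clock period.
    """
    return period_ns - worst_slack_ns


# Hypothetical endpoint slacks (ns) at a 2.0 ns clock period.
slacks = {
    "u_alu/result_reg[7]/D": 0.130,
    "u_fpu/acc_reg[3]/D": 0.410,
    "u_ctrl/state_reg[0]/D": 0.275,
}

endpoint, wns = worst_slack(slacks)
t_min = min_period_ns(2.0, wns)
print(f"critical endpoint: {endpoint}, slack {wns:+.3f} ns")
print(f"min period ~ {t_min:.3f} ns (~{1000.0 / t_min:.0f} MHz)")
```

Feeding in a worst slack of −0.090 ns instead gives a minimum period of about 2.09 ns, which is the "over budget" case: the design only closes if the clock is slowed or the path is fixed.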

Reading a real STA report, line by line

All of this is computed by [[static-timing-analysis|static timing analysis]] (STA), a method that checks every path without ever simulating a single test vector: it adds up the delays along each path and compares the total against the deadline. Its primary output is the timing report, and once you can read one fluently you can debug almost any timing problem. Below is a trimmed-but-realistic setup report for one path. It has two halves: the data arrival path (how the data actually propagates) and the data required path (the deadline, built from the clock). Read it top to bottom.

Startpoint: u_ctrl/state_reg[2]  (rising clk, launched by CLK)
Endpoint:   u_alu/result_reg[7]  (rising clk, captured by CLK)
Path Group: CLK
Path Type:  max  (setup)

  Point                                   Incr   Path     Type
  -------------------------------------------------------------
  clock CLK (rise edge)                   0.000   0.000
  clock source latency                    0.000   0.000
  clock network delay (propagated)        0.182   0.182          <- clock path
  u_ctrl/state_reg[2]/CK (DFFX1)          0.000   0.182   r
  u_ctrl/state_reg[2]/Q  (DFFX1)          0.094   0.276   r      <- clk-to-Q
  net: state[2]  (fanout=3)               0.041   0.317   r      <- net delay
  u_and0/Y  (AND2X2)                      0.063   0.380   r      <- cell delay
  net: n12      (fanout=1)                0.018   0.398   r
  u_add1/CO (FADDX1)                      0.121   0.519   r
  net: carry    (fanout=2)                0.029   0.548   r
  u_mux2/Y  (MUX2X1)                      0.072   0.620   r
  net: result_pre[7] (fanout=1)           0.022   0.642   r
  u_alu/result_reg[7]/D (DFFX1)           0.000   0.642   r
  data arrival time                               0.642          <= ARRIVAL
  -------------------------------------------------------------
  clock CLK (rise edge)                   2.000   2.000
  clock source latency                    0.000   2.000
  clock network delay (propagated)        0.205   2.205          <- capture clk
  clock uncertainty                      -0.050   2.155
  u_alu/result_reg[7]/CK (DFFX1)          0.000   2.155   r
  library setup time (DFFX1)             -0.061   2.094          <- setup
  data required time                              2.094          <= REQUIRED
  -------------------------------------------------------------
  data required time                              2.094
  data arrival time                              -0.642
  -------------------------------------------------------------
  slack (MET)                                     1.452          <= SLACK

A setup path report. The top half is the data arrival path; the bottom half is the clock-derived required path; the final line is required − arrival = slack.

Read the Incr column as "how much delay this single element adds" and the Path column as "running total so far." The data path starts at the launch clock edge (time 0), waits for the clock to fan through the network (`clock network delay`, 0.182 ns) to reach the launching flop, then pays the clk-to-Q of `state_reg[2]` (0.094 ns) — the time for the flop's output to actually change after its clock edge. From there the signal alternates between cell delays (gates: the AND, the full-adder's carry, the MUX) and net delays (the wires connecting them, which on modern nodes are a large and growing share of the total). The running total when it reaches `result_reg[7]/D` is the arrival time: 0.642 ns.

The bottom half builds the deadline. It starts at the *next* clock edge (2.000 ns; this is a one-cycle path with a 2 ns period), adds the time for the clock to reach the *capturing* flop (0.205 ns), then subtracts two safety margins: clock uncertainty (0.050 ns, covering jitter and skew estimation) and the flop's library setup time (0.061 ns, the window before the edge during which the data must already be stable). The result is the required time: 2.000 + 0.205 − 0.050 − 0.061 = 2.094 ns. The final subtraction, 2.094 − 0.642, gives slack = +1.452 ns, and the tag `(MET)` confirms the check passes with comfortable margin.
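To check that you are reading the columns correctly, it can help to rebuild the Path column from the Incr column yourself. The sketch below does that in Python with the Incr values copied from the report; the row labels are shortened by hand and the helper `running_total` is an illustrative name, not part of any tool.

```python
# Incr values (ns) copied from the arrival half of the report.
arrival_rows = [
    ("clock network delay (launch)", 0.182),
    ("state_reg[2] clk-to-Q", 0.094),
    ("net state[2]", 0.041),
    ("u_and0 AND2X2", 0.063),
    ("net n12", 0.018),
    ("u_add1 FADDX1", 0.121),
    ("net carry", 0.029),
    ("u_mux2 MUX2X1", 0.072),
    ("net result_pre[7]", 0.022),
]

# Incr values (ns) copied from the required half of the report.
required_rows = [
    ("clock CLK (next rise edge)", 2.000),
    ("clock network delay (capture)", 0.205),
    ("clock uncertainty", -0.050),
    ("library setup time", -0.061),
]


def running_total(rows):
    """Print Incr and the accumulated Path value; return the final total."""
    total = 0.0
    for name, incr in rows:
        total += incr
        print(f"{name:<32}{incr:>8.3f}{total:>8.3f}")
    return total


arrival = running_total(arrival_rows)
required = running_total(required_rows)
slack = required - arrival
print(f"slack = {required:.3f} - {arrival:.3f} = {slack:+.3f} ns")
```

If the totals you compute by hand disagree with the report's Path column, the usual cause is a missed row (often the clock network delay) rather than an arithmetic slip.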

From report to fix: closing a failing path

Suppose that same report came back with `slack (VIOLATED) -0.115` instead. Negative slack is not a mystery — it is a budget you have overspent by 115 ps, and the path report itself is the itemised bill telling you where the money went. Closing it is a methodical loop, always starting from the worst path:

1. Find the worst path. Sort all endpoints by slack and open the report for the single most-negative one. This is your current critical path; ignore everything else for now.
2. Read the bill. Scan the data-path Incr column for the biggest contributors. Is it one slow gate? A long high-fanout net? An overly deep logic chain (too many gate levels between the two flops)?
3. Apply the matching fix. Slow gate → upsize the cell or pick a faster Vt flavour. Long net → improve placement or insert buffers. Deep logic → restructure the logic or retime (move a register) so fewer levels sit between flops.
4. Re-run STA and check the new slack. A fix that turns −0.115 into +0.010 closes this path, but it may have lengthened a neighbour. The new worst path becomes your next target.
5. Repeat until the worst slack is ≥ 0 across all path groups (and all corners and modes). That is timing closure.
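The "read the bill" step can be sketched as a simple sort over the data-path increments. The Incr values below come from the report shown earlier; the `cell`/`net` labels are added by hand for illustration (a real report does not print them this way), and the suggested fixes are just the pairings listed in the steps above, not a tool's recommendation.

```python
# (name, kind, incr_ns) for the data path in the report; "kind" is a
# hand-added annotation used only to pick a matching fix.
data_path = [
    ("u_ctrl/state_reg[2]/Q", "cell", 0.094),
    ("state[2]", "net", 0.041),
    ("u_and0/Y", "cell", 0.063),
    ("n12", "net", 0.018),
    ("u_add1/CO", "cell", 0.121),
    ("carry", "net", 0.029),
    ("u_mux2/Y", "cell", 0.072),
    ("result_pre[7]", "net", 0.022),
]

SUGGESTED_FIX = {
    "cell": "upsize the cell or use a faster Vt flavour",
    "net": "improve placement or insert buffers",
}


def biggest_contributors(path, top_n=3):
    """Sort path elements by delay contribution, largest first."""
    return sorted(path, key=lambda row: row[2], reverse=True)[:top_n]


total = sum(incr for _, _, incr in data_path)
for name, kind, incr in biggest_contributors(data_path):
    share = 100.0 * incr / total
    print(f"{name:<24}{kind:<6}{incr:.3f} ns  {share:4.1f}%  -> {SUGGESTED_FIX[kind]}")
```

On this path the full-adder carry output is the largest single item, so it would be the first place to look. A path where no single row dominates but there are many rows points toward the "deep logic" case instead, which this sketch does not try to detect.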

One more lever sits outside the data path entirely: the [[timing-constraint|timing constraint]] itself. Much of the required half of the report flows from constraints you wrote: the clock period, its uncertainty, input/output delays. If a path fails by a hair and the constraint was needlessly pessimistic (say you double-counted margin that's already in the uncertainty), correcting the *constraint* rather than changing the *silicon* is the cheapest fix of all. But never relax a real constraint just to make a report green: that only hides a violation that silicon will happily expose at speed.