Making It Fast Enough: Timing & Constraints

The clock sets the budget

Every synchronous chip marches to a clock — a square wave that ticks, say, a billion times a second. Think of each active edge as the starting gun and the deadline at once: on one edge, your flip-flops launch fresh data onto the wires; on the very next edge, the downstream flip-flops grab whatever has arrived. Everything the logic needs to do — every gate it passes through — has to finish inside one clock period. That window is your whole budget.

So the clock period sets the ceiling. A 1 GHz clock gives you a 1 nanosecond budget — 1000 picoseconds — per stage. Want to go faster? Shrink the period. But you can't shrink it below the time your slowest pile of logic actually needs to settle. The clock can only be as fast as your slowest path lets it be. That tension — push the clock up, but not past what the logic can finish — is the entire game of timing.
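
To make the budget arithmetic concrete, here is a minimal Python sketch that turns a clock frequency into a per-stage budget in picoseconds. The function name `period_ps` is invented for this illustration.

```python
def period_ps(freq_hz: float) -> float:
    """Return the clock period in picoseconds for a frequency given in hertz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e12 / freq_hz  # 1 second = 1e12 picoseconds


print(period_ps(1e9))    # 1 GHz   -> 1000.0
print(period_ps(2.5e9))  # 2.5 GHz -> 400.0
```

Raising the clock from 1 GHz to 2.5 GHz cuts the budget from 1000 ps to 400 ps, so the same logic has to settle in well under half the time.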

The critical path

Picture data flowing from one register to the next through a thicket of combinational logic — adders, muxes, stacks of gates. Each gate takes a few picoseconds to switch, and each wire takes time to charge. Add up the delay along one route from launch flop to capture flop and you get that path's total travel time. Your design can have millions of such paths.

The critical path is simply the slowest one: the path with the longest delay relative to the time available between its launch flop and its capture flop (in timing-tool terms, the path with the worst, smallest slack). It matters because it alone sets your speed limit: the clock period must be at least as long as that path needs, or its data won't arrive before the next edge captures it. The chip is only as fast as its slowest path, just as a convoy moves at the speed of its slowest truck.
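
As a rough sketch of "add up the delays, then find the slowest", the Python below sums per-element delays along a few paths and picks the worst one. The path names and delay numbers are invented, and it assumes every path shares the same clock period and flop requirements, so the longest delay is also the worst slack.

```python
# Hypothetical gate and wire delays (ps) along three register-to-register paths.
paths = {
    "add_chain": [45, 120, 120, 120, 30],
    "mux_select": [45, 60, 35],
    "bypass": [45, 20],
}


def path_delay(delays):
    """Total travel time of one path: the sum of its element delays."""
    return sum(delays)


def critical_path(paths):
    """Return (name, total_delay_ps) for the slowest path."""
    name = max(paths, key=lambda n: path_delay(paths[n]))
    return name, path_delay(paths[name])


print(critical_path(paths))
```

With these made-up numbers the adder chain dominates, so it alone decides how short the clock period can be.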

// Long combinational chain between two registers = a candidate critical path
always @(posedge clk) begin
  sum_q <= a + b + c + d;  // the adder tree here must settle within ONE period
end

Pile too much logic between two clocked registers and that stretch becomes your critical path. Splitting work across more pipeline stages shortens each hop.
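
The effect of pipelining on the budget can be sketched with a little arithmetic. The Python below is an illustration only: the 120 ps adder delay and the 40 ps and 30 ps flop overheads are invented, skew is ignored, and `min_period_ps` is a made-up helper name.

```python
def min_period_ps(stage_logic_ps, t_clk_to_q_ps=40, t_setup_ps=30):
    """Smallest clock period that fits the slowest stage (skew ignored)."""
    return t_clk_to_q_ps + max(stage_logic_ps) + t_setup_ps


ADDER_PS = 120  # hypothetical delay of one adder

# One stage: three adders back to back between two registers.
one_stage = [3 * ADDER_PS]
# Two stages: (a + b) and (c + d) in parallel, then one final add.
two_stage = [ADDER_PS, ADDER_PS]

print(min_period_ps(one_stage))
print(min_period_ps(two_stage))
```

The pipelined version allows a much shorter period, at the cost of one extra cycle of latency and an extra rank of registers.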

Setup/hold at scale -> STA

A flip-flop is fussy about *when* data arrives. The data must be stable for a sliver of time before the capturing clock edge (the setup requirement) and stay stable for a sliver after the edge (the hold requirement). Miss setup and the data arrived too late to be reliably captured; miss hold and the data changed too soon after the edge, before the flop had safely latched the value it was meant to capture. Either way the captured bit is unreliable, and the flop can go metastable. (See setup & hold time for the full picture.)
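
One way to picture the two requirements is as a keep-out window around the capturing edge. The Python sketch below classifies a single data transition against that window; the function name and the convention that a transition landing exactly on the window boundary counts as "ok" are assumptions made for illustration.

```python
def check_capture(data_change_ps, clock_edge_ps, t_setup_ps, t_hold_ps):
    """Classify one data transition relative to one capturing clock edge.

    Returns "ok", "setup violation", or "hold violation".
    """
    window_start = clock_edge_ps - t_setup_ps  # data must be stable from here
    window_end = clock_edge_ps + t_hold_ps     # ...until here
    if data_change_ps <= window_start or data_change_ps >= window_end:
        return "ok"
    if data_change_ps <= clock_edge_ps:
        return "setup violation"  # changed too late, just before the edge
    return "hold violation"       # changed too soon, just after the edge


# Capturing edge at 1000 ps, 30 ps setup, 20 ps hold (made-up numbers).
for t in (900, 980, 1010, 1020):
    print(t, check_capture(t, 1000, 30, 20))
```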

Now multiply that by millions of flops and millions of paths. You cannot possibly catch every violation by running simulations — you'd need to feed in every input combination at every moment, which is hopelessly impractical. This is where static timing analysis, or STA, changes the game.

STA checks timing without any stimulus at all. It doesn't run your design; it builds a graph of every flop-to-flop path, adds up the gate and wire delays along each, and compares that total against the setup and hold requirements at the destination. No test vectors, no waveforms, just arithmetic over the delay graph. Because it's exhaustive math rather than sampled simulation, STA checks every constrained path, every time, and never misses one because you forgot to test it. (Setup is usually tightest at the slow corner, where delays are longest, and hold at the fast corner, where delays are shortest; more on corners later.)
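
The "arithmetic over the delay graph" idea can be sketched as a longest-path walk over a directed acyclic graph. The Python below is a toy, not how a production timing tool is built: the node names and delays are invented, each edge carries a single fixed delay, and it only computes the latest arrival times used for setup checks.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def latest_arrivals(edges, start_times):
    """Latest arrival time (ps) at every node of a delay graph.

    edges: {(src, dst): delay_ps}
    start_times: {node: time_ps} for every node that has no predecessor.
    """
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    arrival = dict(start_times)
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        for src in preds[node]:
            t = arrival[src] + edges[(src, node)]
            arrival[node] = max(arrival.get(node, t), t)
    return arrival


# Two launch flops whose outputs appear 40 ps after the clock edge.
start = {"ff_a": 40, "ff_b": 40}
edges = {
    ("ff_a", "and1"): 30,
    ("ff_b", "and1"): 30,
    ("ff_b", "inv1"): 15,
    ("inv1", "or1"): 20,
    ("and1", "or1"): 25,
    ("or1", "ff_c"): 10,
}
print(latest_arrivals(edges, start)["ff_c"])
```

No input vectors appear anywhere: the walk visits every edge once and keeps the worst case at each node. Swapping `max` for `min` would give the earliest arrivals that a hold check cares about.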

Slack: pass or fail

Here's how STA turns into a single verdict per path. For a setup check it computes two numbers: the arrival time (when the signal actually shows up at the capture flop) and the required time (the latest it's *allowed* to show up and still meet setup). The difference is the path's slack:

slack = required − arrival

Positive slack means the signal arrived with time to spare: pass, and the number tells you how much margin you have. Zero slack means it landed exactly on the deadline, just barely met. Negative slack means it arrived too late: fail. A −50 ps slack says this path overshoots its budget by 50 picoseconds; the chip will not run reliably at this clock speed. The single worst (smallest) slack in the whole design is called the WNS, worst negative slack, and it's the number that tells you, at a glance, whether you've passed. (Hold checks use the same slack idea, but there a *fast* arrival is the danger, so the arithmetic flips: hold slack = arrival − required.)
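
Here is a small Python sketch of the slack bookkeeping, using the definitions just given. The function names and the sample numbers are made up for illustration.

```python
def setup_slack(arrival_ps, required_ps):
    """Setup: arriving late is the danger, so slack = required - arrival."""
    return required_ps - arrival_ps


def hold_slack(arrival_ps, required_ps):
    """Hold: arriving early is the danger, so slack = arrival - required."""
    return arrival_ps - required_ps


def verdict(slack_ps):
    """Turn a slack value into a pass/fail label."""
    if slack_ps > 0:
        return "pass"
    if slack_ps == 0:
        return "met exactly"
    return "fail"


def wns(slacks):
    """Worst (smallest) slack across a collection of paths."""
    return min(slacks)


# A path that must arrive by 930 ps but shows up at 980 ps.
s = setup_slack(arrival_ps=980, required_ps=930)
print(s, verdict(s))
```

The example path comes out 50 ps over budget, matching the −50 ps case described above.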

Now we can sketch the speed limit. Strip the setup budget down to its pieces and the fastest clock you can run is roughly:

f_max = 1 / (t_clk-to-q + t_logic + t_setup - t_skew)

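In words: maximum frequency is one over the critical path's minimum period. t_clk-to-q is the launch flop's clock-to-output delay, t_logic is the combinational delay, t_setup is the capture flop's setup requirement, and t_skew is the clock-arrival mismatch, measured here as capture-clock arrival minus launch-clock arrival. It is subtracted because a capture clock that arrives later (positive t_skew) hands the path extra time; a capture clock that arrives earlier makes t_skew negative and lengthens the minimum period. Shrink the delay terms and the ceiling rises.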
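
Plugging numbers into the formula shows how each term moves the ceiling. This Python sketch uses invented delays and takes skew as capture-clock arrival minus launch-clock arrival; `f_max_hz` is a made-up helper name.

```python
def f_max_hz(t_clk_to_q_ps, t_logic_ps, t_setup_ps, t_skew_ps=0.0):
    """Approximate maximum clock frequency in hertz; all delays in picoseconds."""
    min_period_ps = t_clk_to_q_ps + t_logic_ps + t_setup_ps - t_skew_ps
    if min_period_ps <= 0:
        raise ValueError("delays must leave a positive minimum period")
    return 1e12 / min_period_ps


# 50 ps clock-to-q, 800 ps of logic, 50 ps setup (hypothetical values).
print(f_max_hz(50, 800, 50) / 1e9, "GHz with no skew")
print(f_max_hz(50, 800, 50, t_skew_ps=100) / 1e9, "GHz with 100 ps of helpful skew")
```

With no skew the minimum period is 900 ps; 100 ps of late-capture skew trims it to 800 ps and lifts the ceiling accordingly.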

Clock skew & the real budget

So far we've pretended the clock edge hits every flop at the exact same instant. It doesn't. The clock travels across the chip on real wires through real buffers, so it reaches some flops a hair earlier and others a hair later. That arrival-time difference between the launch flop's clock and the capture flop's clock is clock skew — the `t_skew` term in the formula above.

Skew cuts both ways. If the capture clock arrives late relative to the launch clock, it generously hands your data a little extra time to arrive, so skew helps your setup check. But that same late capture edge steals from your hold margin, because the new data now has even longer to race toward a flop that hasn't latched yet. Skew that rescues setup can wreck hold. And hold violations are often the nastier ones: slowing the clock down doesn't fix them, since hold is a same-edge race that doesn't depend on the clock period.
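
The two-sided effect of skew is easy to see with numbers. The Python sketch below assumes skew is capture-clock arrival minus launch-clock arrival, uses invented delays, and ignores jitter and other uncertainty; the function names are hypothetical.

```python
def setup_slack_ps(period, t_clk_to_q, t_logic_max, t_setup, skew):
    """Setup slack: the capture edge is one period (plus skew) after launch."""
    return (period + skew) - (t_clk_to_q + t_logic_max) - t_setup


def hold_slack_ps(t_clk_to_q, t_logic_min, t_hold, skew):
    """Hold slack: a same-edge race, so the clock period does not appear."""
    return (t_clk_to_q + t_logic_min) - (t_hold + skew)


for skew in (0, 40):
    print(
        "skew", skew,
        "setup", setup_slack_ps(1000, 50, 920, 50, skew),
        "hold", hold_slack_ps(50, 10, 30, skew),
    )
```

With these numbers, 40 ps of late-capture skew turns a failing setup check into a passing one while turning a passing hold check into a failing one. Note that `hold_slack_ps` has no period argument at all, which is why a slower clock cannot repair it.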

Closing timing

"Closing timing" is the practitioner's phrase for the grind of driving every path's slack to zero or positive — no negative slack anywhere, setup and hold both clean, across all the operating corners (hot, cold, fast-silicon, slow-silicon, and the voltage extremes). Until timing is closed, the design should not tape out; a chip that fails timing is a chip that risks computing wrong answers at speed.

1. Run STA and find the worst negative slack (WNS) and the path that produces it, the one that fails by the most.
2. Attack that critical path: simplify the logic, swap in faster standard cells, resize gates, or add buffers to drive long wires.
3. If the logic genuinely can't fit in one period, split it across more pipeline stages so each hop does less work.
4. Re-run STA. Fixing the worst path usually promotes a *new* path to critical. Repeat.
5. Loop until WNS ≥ 0 for setup *and* hold across every corner. Now timing is closed.
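
The exit condition of that loop can be sketched as a check over per-corner slack lists. The Python below is illustrative only: the report layout, corner names, and slack values are invented and do not reflect any particular tool's output format.

```python
def worst_slacks(report):
    """Return {check: (worst_slack_ps, corner)} across all corners.

    report: {corner: {"setup": [slacks...], "hold": [slacks...]}}
    """
    worst = {}
    for corner, checks in report.items():
        for check, slacks in checks.items():
            candidate = (min(slacks), corner)
            if check not in worst or candidate < worst[check]:
                worst[check] = candidate
    return worst


def timing_closed(report):
    """True only when setup and hold WNS are non-negative at every corner."""
    return all(slack >= 0 for slack, _ in worst_slacks(report).values())


report = {
    "slow_hot": {"setup": [-50, 12, 80], "hold": [35, 60]},
    "fast_cold": {"setup": [140, 210], "hold": [-8, 22]},
}
print(worst_slacks(report))
print(timing_closed(report))
```

Here setup fails at the slow corner and hold fails at the fast corner, so the design is not closed even though each corner looks clean for one of the two checks.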

Zoom out and the loop is always the same: the clock hands you a budget, the critical path spends the most of it, STA audits every path against the deadline, and slack is the receipt. Make slack non-negative everywhere and the chip runs. That's timing closure in one breath.