From RTL to Gates: Logic Synthesis

What synthesis actually does

Your RTL is a *description of behavior*: "the output is high when these inputs agree," "this value updates on the clock edge." It says nothing about which transistors do the work. Logic synthesis is the translation step that turns that behavioral description into a concrete netlist of real gates — a parts list plus the wiring between them. Think of RTL as a recipe written in plain language ("fold until combined") and the netlist as the exact sequence of hand motions a specific cook will use.

Crucially, the tool does this against a target library: a fixed catalog of gates someone has already designed and characterized for your process node. Synthesis is not inventing transistors; it is shopping from a catalog and arranging what it buys. Three things go in (your RTL, a cell library, and your constraints, which tell the tool what "good" means) and one thing comes out (a gate-level netlist).

The standard-cell library

A standard-cell library is the catalog synthesis shops from. Each cell is a small, pre-designed logic gate (an inverter, a NAND, an AND-OR, a D flip-flop) that the foundry or a library vendor has laid out down to the transistor and characterized: for every cell, the library records how long it takes to switch, how much area it occupies, and how much power it burns. The cells all share a fixed height so they snap together in neat rows, like LEGO bricks of the same stud-height.

Here is the part that surprises newcomers: the same logical function exists in many variants. A library might hold a NAND2 in a dozen "drive strengths" — a tiny one that's slow but sips power and area, up to a beefy one that switches fast but eats both. Same truth table, different muscle. Synthesis picks not just *which* gate but *which size of that gate*, and that single choice is where most of the area/timing/power trade-off lives.
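To make the "same truth table, different muscle" idea tangible, here is a minimal Python sketch of a cell family and a selection rule. The cell names, delays, areas, and power figures are all invented for illustration, and real libraries model delay as a function of load and input slew rather than as one number per cell.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str         # library cell name (hypothetical naming scheme)
    delay_ns: float   # switching delay at one fixed load (made-up value)
    area_um2: float   # silicon area (made-up value)
    power_uw: float   # power (made-up value)

# One logical function (2-input NAND) in three drive strengths.
NAND2_FAMILY = [
    Cell("NAND2_X1", delay_ns=0.080, area_um2=0.8, power_uw=0.5),
    Cell("NAND2_X2", delay_ns=0.050, area_um2=1.1, power_uw=0.9),
    Cell("NAND2_X4", delay_ns=0.032, area_um2=1.9, power_uw=1.7),
]

def smallest_cell_meeting(family, max_delay_ns):
    """Return the lowest-area cell whose delay fits the budget, or None."""
    fits = [c for c in family if c.delay_ns <= max_delay_ns]
    return min(fits, key=lambda c: c.area_um2) if fits else None

print(smallest_cell_meeting(NAND2_FAMILY, 0.100).name)  # relaxed budget: NAND2_X1
print(smallest_cell_meeting(NAND2_FAMILY, 0.040).name)  # tight budget: NAND2_X4
```

With a relaxed budget the cheapest variant wins; tighten the budget and only the largest cell qualifies, at more than twice the area.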

RTL → optimized gate netlist

Let's make it concrete. Here is a scrap of RTL: a single register that loads `d` on every rising clock edge, plus one combinational output. It describes behavior (*what happens*) and never names a single gate.

module tiny (input clk, input a, input b, input d,
             output q, output y);
  reg q_r;
  always @(posedge clk) q_r <= d;   // a register: captures d on each rising clock edge
  assign q = q_r;
  assign y = ~(a & b);              // combinational: a NAND of a and b
endmodule

Before synthesis: behavioral RTL. One clocked register and one combinational output, with no gates named.

Now synthesis maps that onto real library cells. The clocked `reg` becomes a DFF standard cell; the `~(a & b)` collapses to a single NAND2 — exactly the gate the library already provides, so no inverter is wasted. The result is structural: every line now names a *physical part* and connects its pins.

module tiny (clk, a, b, d, q, y);
  input  clk, a, b, d;
  output q, y;
  DFF    u_q  (.CLK(clk), .D(d), .Q(q));   // chosen standard cell
  NAND2  u_y  (.A(a),     .B(b), .Y(y));   // chosen standard cell
endmodule

After synthesis: a gate-level netlist. Behavior is gone; only named cells and wired pins remain.
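One way to convince yourself that the two versions describe the same circuit is to model both and compare them. The Python sketch below does that for `tiny`. The behavior of the `NAND2` and `DFF` cells is an assumption based on their names, and the register is assumed to start at 0 (the RTL itself has no reset). Production flows use formal equivalence checking rather than a simulation like this.

```python
from itertools import product

def rtl_y(a, b):
    """Behavioral RTL: assign y = ~(a & b), on 1-bit values."""
    return ~(a & b) & 1

def nand2(a, b):
    """Assumed function of the library's NAND2 cell."""
    return 0 if (a == 1 and b == 1) else 1

class DFF:
    """Assumed D flip-flop cell: Q takes the value of D on a rising clock edge."""
    def __init__(self, q_init=0):
        self.q = q_init

    def posedge(self, d):
        self.q = d
        return self.q

def run_rtl(stimulus, q_init=0):
    """Model of: always @(posedge clk) q_r <= d; assign q = q_r;"""
    q_r, trace = q_init, []
    for d in stimulus:
        q_r = d
        trace.append(q_r)
    return trace

def run_gates(stimulus, q_init=0):
    """Model of the netlist: one DFF instance driving q."""
    ff = DFF(q_init)
    return [ff.posedge(d) for d in stimulus]

# Combinational part: compare every input combination.
comb_match = all(rtl_y(a, b) == nand2(a, b)
                 for a, b in product((0, 1), repeat=2))

# Sequential part: compare q after each clock edge for one stimulus.
stimulus = [1, 0, 0, 1, 1]
seq_match = run_rtl(stimulus) == run_gates(stimulus)

print("combinational match:", comb_match)
print("sequential match:", seq_match)
```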

Constraints steer the result

Without guidance, the tool has no idea whether you want this design *small* or *fast*, and it can't have both for free. Constraints are how you tell it. The single most important one is the clock period: you declare "this clock has a period of, say, 1 ns," and that one number becomes the budget every signal must fit inside as it travels from one register to the next between consecutive clock edges.

# A constraint (SDC timing format, which uses Tcl syntax, not Verilog): the clock budget
create_clock -name clk -period 1.0 [get_ports clk]   ;# 1.0 ns -> 1 GHz

A clock constraint. This single line sets the time budget synthesis must respect on every path.

Tighten that period and the tool *responds*: it reaches for bigger, faster cells, duplicates logic to shorten the longest path, and restructures math to save a level of gates, spending area and power to buy speed. Loosen the period and it does the reverse, swapping in small, low-power cells because it now has time to spare. The constraint doesn't just check the result; it actively shapes which cells get chosen. The breathing room left on a path after the tool is done is its slack: the time the signal was allowed minus the time it actually took. Positive slack means the path fits, negative slack means it fails, and the path with the least slack is the critical path.
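The slack arithmetic can be sketched in a few lines of Python. This is a simplified setup check that assumes an ideal clock (no skew or uncertainty). The path names, gate delays, clock-to-Q delay, and setup time are invented numbers, not values from any real library.

```python
def setup_slack_ns(period_ns, clk_to_q_ns, gate_delays_ns, setup_ns):
    """Slack = time allowed - time taken, for one register-to-register path."""
    arrival = clk_to_q_ns + sum(gate_delays_ns)   # when data reaches the next D pin
    required = period_ns - setup_ns               # latest moment it may arrive
    return required - arrival

# Hypothetical paths: name -> delays of the gates along the path, in ns.
PATHS = {
    "u_a/Q -> u_b/D": [0.20, 0.25, 0.30],
    "u_c/Q -> u_d/D": [0.15, 0.10],
}

def report(period_ns, clk_to_q_ns=0.10, setup_ns=0.05):
    """Return ({path: slack}, name of the critical path) for a clock period."""
    slacks = {name: setup_slack_ns(period_ns, clk_to_q_ns, delays, setup_ns)
              for name, delays in PATHS.items()}
    critical = min(slacks, key=slacks.get)        # least slack = critical path
    return slacks, critical

for period in (1.0, 0.8):
    slacks, critical = report(period)
    for name, slack in slacks.items():
        status = "MET" if slack >= 0 else "VIOLATED"
        print(f"period {period} ns  {name}  slack {slack:+.2f} ns  {status}")
    print(f"  critical path: {critical}")
```

At a 1.0 ns period both paths have positive slack; shrink the period to 0.8 ns and the three-gate path goes negative while the two-gate path still fits.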

Area / timing / power trade-offs

Every choice in synthesis pulls on three strings at once: area (silicon, which is money), timing (how fast the clock can run), and power (heat and battery). Picture a triangle — pull hard toward one corner and you slide away from the others. Want more speed? You buy bigger cells: area goes up, power goes up. Want to save power? You accept slower cells and a looser clock. There is no setting that wins all three; the tool's whole job is to find the best point inside the triangle that still meets your constraints.

This is why constraints matter so much: they tell the tool *where in the triangle you want to live*. A phone chip leans toward the power corner; a high-end CPU leans toward timing and pays in area and watts. The synthesizer optimizes hardest on whatever you've made tightest — give it an aggressive clock and it will happily spend area and power to hit it, whether or not you meant for it to.
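The pull between the corners of the triangle shows up even in a toy model. The Python sketch below sizes a chain of identical gates with a deliberately naive greedy rule: start small, then upsize one stage at a time until the delay fits. The sizes and their delay, area, and power numbers are made up, and real synthesis tools use far more sophisticated optimization than this.

```python
# Hypothetical drive strengths: (name, delay_ns, area_um2, power_uw)
SIZES = [
    ("X1", 0.080, 0.8, 0.5),
    ("X2", 0.050, 1.1, 0.9),
    ("X4", 0.032, 1.9, 1.7),
]

def size_chain(n_stages, budget_ns):
    """Greedy sizing of a gate chain; returns a summary dict, or None if the
    budget cannot be met even with the largest cells."""
    choice = [0] * n_stages                     # index into SIZES per stage

    def total(column):
        return sum(SIZES[i][column] for i in choice)

    while total(1) > budget_ns:
        # Upsize the stage that is currently smallest (first one on ties).
        stage = min(range(n_stages), key=lambda s: choice[s])
        if choice[stage] == len(SIZES) - 1:
            return None                         # every stage is already at max
        choice[stage] += 1

    return {
        "cells": [SIZES[i][0] for i in choice],
        "delay_ns": total(1),
        "area_um2": total(2),
        "power_uw": total(3),
    }

# Tighten the budget step by step and watch area and power climb.
for budget in (0.40, 0.25, 0.15, 0.10):
    print(budget, size_chain(4, budget))
```

Each tighter budget buys speed with larger cells, and the last budget is simply out of reach for this four-stage chain, which is the toy equivalent of a timing violation.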

Reading a netlist

Open a synthesized netlist and it looks alien at first — pages of cell instances with cryptic names. But it's just the catalog made literal. Read it in three passes and it opens right up.

1. Find the registers first. Every instance whose cell name starts with DFF (or FF, SDFF, and so on) is a piece of state: a flip-flop. Counting them tells you how much your design *remembers*, and they're the start and end points of most timing paths (the rest begin at an input port or end at an output port).
2. Read the cell name as function plus drive strength. NAND2_X4 means "two-input NAND, drive strength 4." The letters are the logic; the trailing number is muscle. High numbers clustered on one path are the tool shouting "this path was tight, so I spent big cells here."
3. Follow the wires between pins, not the order of the lines. A netlist is hardware, so its lines all exist at once; the `.A(...) .Y(...)` connections, not top-to-bottom order, tell you what feeds what. Trace a net from a register output, through the gates, to the next register input: that chain is one timing path.
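The three passes can be sketched as a short Python script. Everything about the sample netlist is hypothetical: the module, the instance names, the `_X<n>` drive-strength suffix, and the assumption that `Q` and `Y` are the only output pins. Real libraries use their own naming conventions, so treat the regular expressions as a starting point rather than a parser.

```python
import re

NETLIST = """\
module chain (clk, a, b, q);
  input  clk, a, b;
  output q;
  wire   n1, n2, n3;
  DFF_X1   u_r0 (.CLK(clk), .D(a),  .Q(n1));
  NAND2_X4 u_g0 (.A(n1),    .B(b),  .Y(n2));
  INV_X1   u_g1 (.A(n2),    .Y(n3));
  DFF_X1   u_r1 (.CLK(clk), .D(n3), .Q(q));
endmodule
"""

INSTANCE = re.compile(r"\s*(\w+)\s+(\w+)\s*\((\s*\..*)\)\s*;")  # cell inst (.pin(net), ...);
PIN = re.compile(r"\.(\w+)\s*\(\s*(\w+)\s*\)")                  # .pin(net)
OUTPUT_PINS = {"Q", "Y"}                 # assumption about this made-up library
REGISTER_PREFIXES = ("DFF", "SDFF", "FF")

def parse(netlist):
    """Return {instance: {"cell", "function", "drive", "pins"}}."""
    insts = {}
    for line in netlist.splitlines():
        m = INSTANCE.match(line)
        if not m:
            continue                      # module/port/wire lines are skipped
        cell, name, pin_text = m.groups()
        function, _, drive = cell.rpartition("_X")   # "NAND2_X4" -> "NAND2", "4"
        insts[name] = {"cell": cell, "function": function,
                       "drive": int(drive), "pins": dict(PIN.findall(pin_text))}
    return insts

def is_register(info):
    return info["function"].startswith(REGISTER_PREFIXES)

def trace_path(insts, start):
    """Follow wires from a register output until the next register is reached."""
    readers = {}                          # net -> instances reading it on an input pin
    for name, info in insts.items():
        for pin, net in info["pins"].items():
            if pin not in OUTPUT_PINS:
                readers.setdefault(net, []).append(name)
    path, current = [start], start
    while True:
        out_net = next(n for p, n in insts[current]["pins"].items()
                       if p in OUTPUT_PINS)
        fanout = readers.get(out_net, [])
        if not fanout:
            return path                   # net leaves the module (e.g. an output port)
        current = fanout[0]               # this sketch follows only the first branch
        path.append(current)
        if is_register(insts[current]):
            return path

insts = parse(NETLIST)
print("registers:", [n for n, i in insts.items() if is_register(i)])            # pass 1
print("drive strengths:", {n: i["drive"] for n, i in insts.items()})            # pass 2
print("path:", " -> ".join(trace_path(insts, "u_r0")))                          # pass 3
```

Note that the trace is driven entirely by pin connections; shuffling the instance lines in `NETLIST` would produce the same path.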