DVFS and UPF: Tuning Power at Runtime and Capturing the Intent

The V² lever, now under software control

Think of a runner on a treadmill. When the belt is slow, she can stroll and barely breaks a sweat; when it speeds up, she has to push harder and her heart rate climbs. A processor is the same — but it gets to choose its own treadmill speed, moment by moment. When you are reading email, the cores are nearly idle, so why run them flat-out? Dynamic voltage and frequency scaling, or DVFS, is the technique of lowering the supply voltage and clock frequency when the workload is light, and raising them again the instant a demanding task arrives.

Why bother dropping voltage at all? Why not just lower the clock? Because of the single most important equation in this whole track. From rung 2, dynamic power is P = α·C·V²·f. Frequency appears once, linearly, but voltage appears squared. Halving f alone halves power, but it also doubles the time each task takes, so the energy per task is unchanged: you finish slower for the same joules. The magic of DVFS is that a lower frequency lets you also lower the voltage, because gates that have more time per cycle can tolerate the slower switching of a reduced supply. The V² term then cuts the energy of every cycle, which a frequency change alone can never do.

Workload light → drop one operating point:

  Before:  V = 1.00 V,  f = 1.8 GHz
  After:   V = 0.80 V,  f = 1.2 GHz

  Dynamic power scales as V^2 * f:
    P_after / P_before = (0.80/1.00)^2 * (1.2/1.8)
                       = 0.64 * 0.667
                       = 0.43

  ~57% less switching power for ~33% less clock speed.

Dropping one V/f operating point: the squared voltage term does most of the saving.
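The same arithmetic, together with the energy-per-task argument, can be checked in a few lines of Python. This is a minimal sketch that assumes switching power only (leakage is ignored), unchanged α and C, and a fixed number of clock cycles per task; the function names are illustrative and do not come from any library.

```python
def dynamic_power_ratio(v_before, f_before, v_after, f_after):
    """Ratio P_after / P_before for P = alpha * C * V^2 * f (alpha, C unchanged)."""
    return (v_after / v_before) ** 2 * (f_after / f_before)


def energy_per_task_ratio(v_before, f_before, v_after, f_after):
    """Energy = power * time; time for a fixed cycle count scales as 1/f."""
    power = dynamic_power_ratio(v_before, f_before, v_after, f_after)
    time = f_before / f_after
    return power * time


# Frequency-only scaling: power halves, energy per task does not move.
freq_only_power = dynamic_power_ratio(1.00, 1.8, 1.00, 0.9)
freq_only_energy = energy_per_task_ratio(1.00, 1.8, 1.00, 0.9)

# The operating-point drop from the worked example (1.00 V / 1.8 GHz -> 0.80 V / 1.2 GHz).
dvfs_power = dynamic_power_ratio(1.00, 1.8, 0.80, 1.2)
dvfs_energy = energy_per_task_ratio(1.00, 1.8, 0.80, 1.2)

print(f"f only: power x{freq_only_power:.2f}, energy/task x{freq_only_energy:.2f}")
print(f"DVFS:   power x{dvfs_power:.2f}, energy/task x{dvfs_energy:.2f}")
```

Under these assumptions the frequency term cancels out of the energy per task, leaving only the V² factor: (0.80/1.00)² = 0.64 for the example drop.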

Operating points and the governor

A chip does not offer infinitely many V/f combinations. The design team characterises a small table of validated pairs called operating performance points (OPPs, or P-states in x86 land). Each one is a (voltage, frequency) tuple that has been silicon-proven to meet timing across the worst-case corner. A low-power smartphone SoC might expose six to twelve such points per cluster, from a deep-idle 0.6 V / 300 MHz crawl up to a 1.05 V / 3 GHz sprint.

Who decides which point to use? A piece of software called the governor, running in the operating system kernel (Linux's `cpufreq` subsystem is the classic example). The governor watches CPU utilisation over short windows and moves the chip up or down the OPP table — schedutil, for instance, reads the scheduler's load estimate and picks the lowest point that still finishes the work in time. A hardware power-management controller then sequences the actual change.
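A toy version of that selection step is sketched below. The OPP table is hypothetical (it reuses voltages and frequencies mentioned in this rung), and the 1.25 headroom factor is an assumption modelled on the margin that schedutil's frequency formula uses. The real governor also deals with rate limiting, per-policy limits, and other details that are left out here.

```python
# Hypothetical OPP table for one cluster: (frequency in MHz, voltage in V),
# sorted from slowest to fastest. Real tables come from silicon characterisation.
OPP_TABLE = [
    (300, 0.60),
    (600, 0.70),
    (1200, 0.80),
    (1800, 1.00),
]

HEADROOM = 1.25  # assumed margin so the chosen point is not run at 100% load


def pick_opp(utilisation, table=OPP_TABLE, headroom=HEADROOM):
    """Return the lowest OPP whose frequency covers the scaled demand.

    utilisation is the load estimate as a fraction (0.0 to 1.0) of the
    capacity available at the fastest operating point.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0.0 and 1.0")
    max_freq = table[-1][0]
    target = headroom * max_freq * utilisation
    for freq, volt in table:
        if freq >= target:
            return freq, volt
    return table[-1]  # demand exceeds every point: run flat-out


light = pick_opp(0.10)   # reading email
medium = pick_opp(0.50)
heavy = pick_opp(0.90)
```

With this table, a 10% load lands on the slowest point, a 50% load on the 1200 MHz point, and a 90% load on the fastest point, because the scaled demand exceeds every slower entry.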

Going up (more performance): raise the voltage first and wait for the on-chip regulator and power delivery network to settle, *then* raise the clock. Clocking fast at a voltage that has not yet risen would violate setup timing.
Going down (more savings): lower the clock first, *then* lower the voltage. Reduce demand before you reduce supply, never the reverse.
During the transition: a fast on-die DC-DC or LDO ramps the rail in microseconds; many designs briefly gate or stretch the clock so no edge arrives while the rail is mid-flight.
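The ordering rule can be written down as a small function. This is only a sketch of the sequencing logic described above: the step strings stand in for whatever regulator and clock-controller operations a real power-management controller performs, and the function name is hypothetical.

```python
def transition_steps(current, target):
    """Return the ordered steps for moving between two (freq_mhz, volt) points.

    Rule from the text: when speeding up, raise voltage before frequency;
    when slowing down, lower frequency before voltage.
    """
    cur_f, cur_v = current
    new_f, new_v = target
    set_volt = f"set rail to {new_v:.2f} V and wait for it to settle"
    set_freq = f"set clock to {new_f} MHz"

    steps = []
    if new_f > cur_f:
        # Going up: supply first, demand second.
        if new_v != cur_v:
            steps.append(set_volt)
        steps.append(set_freq)
    elif new_f < cur_f:
        # Going down: demand first, supply second.
        steps.append(set_freq)
        if new_v != cur_v:
            steps.append(set_volt)
    elif new_v != cur_v:
        # Same clock: only the rail moves.
        steps.append(set_volt)
    return steps


going_up = transition_steps((1200, 0.80), (1800, 1.00))
going_down = transition_steps((1800, 1.00), (1200, 0.80))
```

In both directions the chip never runs the faster clock on the lower voltage, which is the one combination that has not been validated.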

The problem: power structure that RTL cannot say

Here is the uncomfortable truth that has been hiding behind every rung so far. Your Verilog describes *logic* — what each gate computes — but it says nothing about *power*. There is no Verilog keyword for "this block sits on a switchable rail," no syntax for "clamp this output to 0 when the domain is off," no way to write "retain these flops through power-down." RTL is power-blind by design, and that was always a feature: the same RTL should run on a 5 V board or a 0.6 V mobile node.

But all the machinery from rungs 3 through 5 (the power switch cells that gate a domain, the isolation cells that clamp its dangling outputs, the retention registers that hold state through sleep, the level shifters between islands) is *physical power structure*. If you try to express it by hand-editing netlists, three different tools (synthesis, place-and-route, and verification) will each make their own assumptions, and they will disagree. That disagreement is exactly how a chip gets taped out with an isolation cell missing on one of ten thousand crossings and comes back dead.

UPF and CPF: the format that captures intent

Two standards grew up to express power intent. CPF (Common Power Format) came from Cadence; UPF (Unified Power Format) was standardised by Accellera and then ratified as IEEE 1801. Both are Tcl-based scripting languages: you do not draw the power architecture, you *program* a description of it. UPF is the one the industry has converged on, so it is the one to know; CPF lingers mostly in legacy flows.

A UPF file names the actors of your power architecture in a fixed vocabulary: `create_power_domain` carves the design into power domains; `create_supply_net` and `create_supply_port` define the rails; `create_power_switch` describes the header or footer switch that gates a domain's virtual supply; and `set_isolation` / `set_retention` declare the *strategy* — the rule that tells the tools to insert isolation and retention cells everywhere a rule's conditions are met, without you naming a single instance.

# --- A tiny UPF snippet (IEEE 1801) ---------------------------

# 1. Carve out a switchable domain for the modem block
create_power_domain PD_MODEM -elements {u_modem}

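# 2. Define the always-on and the switched (virtual) supplies,
#    then power the domain's logic from the switched rail
create_supply_net    VDD     -domain PD_MODEM
create_supply_net    VDD_SW  -domain PD_MODEM
create_supply_net    VSS     -domain PD_MODEM
set_domain_supply_net PD_MODEM \
    -primary_power_net  VDD_SW \
    -primary_ground_net VSS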

# 3. A header switch: VDD -> VDD_SW, gated by sleep_n
create_power_switch  sw_modem \
    -domain        PD_MODEM \
    -input_supply_port  {in  VDD}    \
    -output_supply_port {out VDD_SW} \
    -control_port  {ctrl sleep_n}    \
    -on_state      {ON  in {ctrl}}

# 4. Strategy: clamp every output of PD_MODEM to 0 while it is off
set_isolation  iso_modem \
    -domain        PD_MODEM \
    -isolation_power_net VDD \
    -clamp_value   0 \
    -applies_to    outputs

# 5. Strategy: retain the modem's state registers through power-down
set_retention  ret_modem \
    -domain        PD_MODEM \
    -retention_power_net VDD

One domain, one power switch, one isolation rule, one retention rule — declared, not drawn. Every downstream tool reads the same five facts.
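To make "declared, not drawn" concrete, the sketch below mimics in plain Python what a tool does with an isolation strategy: it walks the domain's boundary ports and applies the rule to every port that matches. The `Port` and `IsolationStrategy` classes and the port names are hypothetical stand-ins, not a real UPF API, and real tools resolve far more than this (supplies on each side of the crossing, cell location, cells that already exist).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "input" or "output", seen from the power domain


@dataclass(frozen=True)
class IsolationStrategy:
    name: str
    domain: str
    clamp_value: int
    applies_to: str  # "inputs", "outputs", or "both"

    def matches(self, port):
        """True when the rule's -applies_to filter covers this port."""
        if self.applies_to == "both":
            return True
        return self.applies_to == port.direction + "s"


def expand(strategy, boundary_ports):
    """List the isolation cells implied by one strategy, one per matching port."""
    return [
        f"{strategy.name}/{port.name}: clamp to {strategy.clamp_value}"
        for port in boundary_ports
        if strategy.matches(port)
    ]


# Hypothetical boundary of the modem domain from the snippet above.
modem_ports = [
    Port("cmd_in", "input"),
    Port("data_out", "output"),
    Port("irq_out", "output"),
]
iso_modem = IsolationStrategy("iso_modem", "PD_MODEM", clamp_value=0, applies_to="outputs")
cells = expand(iso_modem, modem_ports)
```

One rule yields one cell per output port, and adding a port to the block later changes the result without anyone editing the rule. That is the property hand-edited netlists lack.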

One golden file, every tool: the capstone

Here is why UPF is the capstone that ties the whole track together. The *same* file flows through the entire implementation chain. Synthesis reads it and inserts the right isolation, level-shifter, and retention cells into the netlist. Place-and-route reads it and builds the physical power grid, places the switch cells, and routes the always-on supply to the retention flops. And verification reads it to prove that nothing was lost in translation. One golden description, three consumers, zero room for the three of them to disagree.

That last consumer, UPF-driven verification, is the true frontier, because power bugs are invisible to ordinary logic simulation. A normal simulator assumes every gate is always powered, so it will happily simulate a powered-down block producing valid data. A low-power-aware simulator, fed the UPF, instead corrupts the state and outputs of an off domain to X and watches the X values propagate. If an X reaches a live flip-flop, your isolation strategy has a hole. Structural checkers add a second safety net: they statically prove that every domain crossing in the netlist actually has an isolation or level-shifter cell, exactly as the UPF demanded.

What a UPF-aware simulation catches that plain RTL sim never could:

  sleep_n  ____________            ____________________
               sleep  \__________/   wake

  VDD_SW   ~~~~~~~~~~~~\__________/~~~~~~~~~~~~  (rail collapses)

  modem_out  D7  D8  D9  X  X  X  X   D10  D11    <- X while OFF
                          |________|
  iso_out    D7  D8  D9 [ 0  0  0  0]  D10  D11    <- clamped 0
                          (isolation cell holds the boundary)

  If iso_out tracked the X instead of 0, the X would corrupt
  the downstream always-on logic -> a real silicon bug, caught
  in simulation because the UPF told the sim the domain was off.

Low-power-aware simulation injects X on a powered-down domain; the isolation strategy must clamp it before it poisons live logic.
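The waveform above can be reproduced with a toy model. The sketch below is not how a real power-aware simulator is built; it only illustrates the two behaviours the figure shows, corruption to X while the domain is off and clamping at the isolation cell. It assumes isolation is enabled exactly while the domain is off, and all function names are hypothetical.

```python
X = "X"  # unknown value, as injected by a power-aware simulator


def domain_output(value, powered):
    """Output of a block in a switchable domain: corrupted to X while off."""
    return value if powered else X


def isolation_cell(value, iso_enable, clamp_value=0):
    """Clamp the crossing while isolation is enabled, else pass the value."""
    return clamp_value if iso_enable else value


def simulate(data, powered, isolated=True):
    """Return (modem_out, iso_out) traces for per-cycle data and power state.

    With isolated=False the crossing is a plain wire, modelling a missing cell.
    """
    modem_out = [domain_output(d, p) for d, p in zip(data, powered)]
    iso_out = [
        isolation_cell(m, iso_enable=isolated and not p)
        for m, p in zip(modem_out, powered)
    ]
    return modem_out, iso_out


data = ["D7", "D8", "D9", None, None, None, None, "D10", "D11"]
powered = [True, True, True, False, False, False, False, True, True]

modem_out, good = simulate(data, powered, isolated=True)
_, bad = simulate(data, powered, isolated=False)

# Cycles in which an X escapes into the always-on side when the cell is missing.
leaks = [cycle for cycle, value in enumerate(bad) if value == X]
```

With the isolation cell in place, the always-on side sees 0 for the four off cycles; without it, `leaks` lists those same four cycles, which is the hole a UPF-aware simulation is meant to expose.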

Step back and see the whole staircase. Rung 1 named the two enemies, dynamic and leakage. Rung 2 gave you the V² lever. Rungs 3 to 5 built the parts — gating, isolation, retention, multiple rails. This rung pulled the lever at runtime with DVFS, and then captured every part in a single formal contract with UPF, verified end to end. That is the discipline of low-power design: physical insight at the bottom, a runtime control loop at the top, and a machine-checked description holding the two together.