Controllability & Observability: The Heart of Testability

Reaching a node you cannot touch

Imagine a vast factory with thousands of machines packed shoulder to shoulder, and you are the inspector. There is just one problem: the only way into the building is a handful of doors on the outside wall, and the only windows look out from that same wall. A machine in the dead centre of the floor might be broken, but you can neither feed it the parts that would make it misbehave, nor see the defective product it spits out — every later machine swallows that output and turns it into something else. That factory is a chip. The doors and windows are its pins. And the machine in the middle is a logic gate you must somehow test through everything stacked around it.

This is the central tension of manufacturing test, and it follows straight from rung 1. To test a node for a defect, you must do two things: set it to a known value, then check that it actually took that value. Both of those actions can only happen at the pins, because the pins are the only electrical contact the test machine has with the die. A leading-edge SoC might have ten million logic gates but only a few hundred functional pins, so on average tens of thousands of nodes have to share each doorway into the chip. The question 'how testable is this design?' really decomposes into two sharper questions asked of every single node.

Controllability: forcing a value from the edge

Controllability is the *input* half of the problem. A node deep in combinational logic is not a wire you can clip onto — its value is whatever the gates feeding it compute, which in turn depends on the gates feeding *them*, and so on back to the pins. To set a buried node to 1, you must find a combination of pin values that, after rippling through every intervening gate, leaves exactly a 1 sitting there. Sometimes that is easy. Sometimes the logic conspires against you, and no pin pattern can produce the value you need.

Engineers quantify this with a controllability score: informally, how hard it is to justify each value at a node from the pins. The classic SCOAP measure walks the circuit from the inputs forward, assigning every primary input a baseline cost of 1 for each value (you control it directly) and then accumulating cost gate by gate. For an AND gate, the cost of a 1 at the output is the sum of the costs of a 1 on every input, plus 1; the cost of a 0 is the cheapest input 0, plus 1. The deeper and wider the cone of logic that must cooperate, the higher the score, and the harder the node is to drive. A node whose 1-controllability is enormous while its 0-controllability is cheap is a warning sign: you can knock it down easily but almost never lift it up, so its stuck-at-0 fault, which can only be excited by driving the node to 1, will rarely be exercised.

  Primary inputs            buried node n                   output
  ----------------          -------------                   ------
  a ---|\
       | & )--- x ---|\
  b ---|/            | & )--- n --- ... (8 more gates) ---  y
  d -----------------|/
                                                          
  To set n = 1 you need  x = 1  AND  d = 1
     -> x = 1  needs  a = 1 AND b = 1   (the upper AND)
     -> d = 1  directly
  So n = 1  requires  a=1, b=1, d=1  ............  doable

  To set n = 0 you only need x = 0 OR d = 0 ......  many ways, easy
     -> n's 1-controllability is HARD, 0 is EASY  (asymmetric!)

A chain of AND gates makes producing a 1 expensive: every input in the cone must cooperate. Producing a 0 is cheap — any single 0 anywhere suffices. This asymmetry is exactly what ATPG tools flag as a low-controllability node.
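
The forward accumulation can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard SCOAP rules for AND gates (output 1-cost is the sum of the inputs' 1-costs plus 1; output 0-cost is the cheapest input 0-cost plus 1), applied to the cone x = a AND b, n = x AND d from the example above. The helper name `and_gate_cc`, the dictionary layout, and the extra eight-gate AND chain labelled `deep` are hypothetical, added only to show how the asymmetry grows with depth.

```python
# Minimal SCOAP-style combinational controllability, AND gates only.
# Each node maps to (CC0, CC1): cost to set it to 0 / to 1. Higher = harder.

def and_gate_cc(inputs):
    """Return (CC0, CC1) of an AND gate output from its inputs' (CC0, CC1)."""
    cc0 = min(c0 for c0, _ in inputs) + 1   # any single input at 0 is enough
    cc1 = sum(c1 for _, c1 in inputs) + 1   # every input must be 1
    return cc0, cc1

# Primary inputs are directly controllable: baseline cost 1 for either value.
cc = {name: (1, 1) for name in ("a", "b", "d")}

cc["x"] = and_gate_cc([cc["a"], cc["b"]])   # x = a AND b
cc["n"] = and_gate_cc([cc["x"], cc["d"]])   # n = x AND d

# Hypothetical extension: chain 8 more 2-input ANDs after n,
# each with a fresh primary input on its other pin.
node = cc["n"]
for _ in range(8):
    node = and_gate_cc([node, (1, 1)])
cc["deep"] = node

for name in ("x", "n", "deep"):
    print(name, cc[name])
# x (2, 3)
# n (2, 5)
# deep (2, 21)
```

The 0-cost stays flat at 2 however long the chain gets, while the 1-cost climbs by 2 per gate. That widening gap is the asymmetry described above.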

Observability: seeing the answer come back out

Even if you triumphantly force a buried node to the value you wanted, you have done nothing useful unless you can *see the result*. Observability is the *output* half: a node's effect must travel forward through every downstream gate and arrive, distinguishable, at a pin you can measure. The enemy here is logic that masks the node — that swallows its value and produces the same output whether the node is right or wrong. A faulty node you cannot observe is exactly as invisible as a node you cannot control.

The trick to making a node observable is sensitisation: you set up the *other* inputs of every gate along the path so that the path becomes a clear pipe — flipping the node under test flips the final output. For an AND gate, that means holding its other inputs at 1 (so it passes the node through unchanged); for an OR gate, holding the others at 0. Notice the cruel twist: those side inputs are themselves buried nodes that must be *controlled*. Observing one node forces you to control a whole supporting cast of others, and the deeper the node, the longer that cast.

  Sensitising a path so node n is OBSERVABLE at output y:

            n ----|\
                  | & )---p---|\
   side1 = 1 ----|/          | & )--- y
                  side2 = 1 -|/

  AND #1: hold side1 = 1  ->  p = n   (passes n through)
  AND #2: hold side2 = 1  ->  y = p = n

  Now:  n = 0  gives  y = 0
        n = 1  gives  y = 1     <-- y is a perfect mirror of n
  The path is SENSITISED: any fault flipping n is visible at y.

  But if side1 = 0 instead:  p = 0 always  ->  y is stuck,
  n is completely MASKED and its fault is unobservable.

To observe n, every gate on the path to y must be set to pass it through (AND side inputs = 1, OR side inputs = 0). A single masking side input anywhere breaks the chain and hides the fault — which is why observation is really a controllability problem in disguise.
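
The sensitisation condition is easy to check by brute force on the two-AND path in the diagram. The sketch below is illustrative only: the function names `y_of` and `is_sensitised` are invented here, and the test it applies (does flipping n flip y?) is the definition of a sensitised path used above.

```python
from itertools import product

def y_of(n, side1, side2):
    """Two-AND path from the diagram: p = n AND side1, y = p AND side2."""
    p = n & side1
    return p & side2

def is_sensitised(side1, side2):
    """True if flipping n flips y for these side-input values."""
    return y_of(0, side1, side2) != y_of(1, side1, side2)

# Try every combination of side inputs and keep those that expose n at y.
sensitising = [(s1, s2) for s1, s2 in product((0, 1), repeat=2)
               if is_sensitised(s1, s2)]
print(sensitising)  # [(1, 1)]
```

Of the four side-input combinations, only side1 = side2 = 1 lets n through; the other three mask it.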

A worked example: the fault that hides

Let's make this concrete with the smallest circuit that already shows the trap. We have a tiny block of combinational logic: two AND gates feeding a third, whose output is the internal node n, and a final AND gate that combines n with input e to produce the output y. We suspect n might be stuck. We want to test whether n can be driven to 1 and whether that 1 reaches the output. Watch how the requirements for controlling n and observing n start to fight each other.

  a --|\
      | & )--- g1 --|\
  b --|/            | & )--- n --|\
  c --|\           |            | & )--- y
      | & )--- g2 -|             |
  d --|/            e ----------|/

  GOAL: test node n for a 'stuck-at-0' fault.
  Step 1 - CONTROL n to 1 (to excite a stuck-at-0):
      n = g1 AND g2,  so need  g1 = 1 AND g2 = 1
      g1 = 1  ->  a = 1, b = 1
      g2 = 1  ->  c = 1, d = 1
      => inputs a=b=c=d=1 force n = 1.            (good)

  Step 2 - OBSERVE n at y through the last AND:
      y = n AND e,  so to pass n through we need  e = 1.
      With e = 1:  fault-free  y = n = 1
                   if n stuck-at-0  y = 0   <-- DIFFERENT, detected!

  Required test pattern:  a=b=c=d=e = 1, expect y = 1.
  If silicon returns y = 0, node n is stuck-at-0.

Here the stars align: one pattern (all inputs = 1) both drives n to 1 and sensitises the path to y, so the stuck-at-0 fault is detectable. Eleven gates deep this rarely happens by luck — it is exactly what an ATPG tool searches for, pattern by pattern.
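
What an ATPG tool has to establish here can be imitated by exhaustive fault simulation, which is feasible only because the circuit has five inputs. The sketch below is a toy, not how production tools search: it simulates the circuit with and without the stuck-at-0 on n and keeps every pattern where the two outputs differ. The function name `simulate` and its `n_stuck_at` parameter are illustrative.

```python
from itertools import product

def simulate(a, b, c, d, e, n_stuck_at=None):
    """Evaluate y for the worked example; optionally force node n to a stuck value."""
    g1 = a & b
    g2 = c & d
    n = g1 & g2
    if n_stuck_at is not None:
        n = n_stuck_at          # inject the fault on node n
    return n & e                # y = n AND e

# A pattern detects the fault if good and faulty circuits disagree at y.
detecting = [p for p in product((0, 1), repeat=5)
             if simulate(*p) != simulate(*p, n_stuck_at=0)]
print(detecting)  # [(1, 1, 1, 1, 1)]
```

Out of 32 possible input patterns, exactly one detects the fault, which matches the hand-derived test a=b=c=d=e=1.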

Now change one wire and the picture darkens. Suppose e is not a free primary input but is itself computed as e = NOT(a). Then the very pattern that controls n (a = 1) forces e = 0, which masks the last AND gate and pins y to 0 regardless of n. Controlling n and observing n now demand *opposite* values of a. No single pattern satisfies both, so the stuck-at-0 on n cannot be detected from the pins at all. In this toy circuit y is 0 for every input, so the fault is what ATPG tools call redundant: it changes nothing the tester can see. In a larger design the same conflict is less benign: a fault may be detectable only by a few rare patterns that a functional test never applies, so it can sit there, defective, and ship in working-looking silicon. This is not a contrived horror story; it is what naturally happens as logic gets reconvergent and deep.
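
A brute-force check makes the conflict concrete. The sketch below rewires the worked example so that e = NOT(a) and searches all sixteen remaining input patterns for one that exposes the stuck-at-0 on n. The function name `simulate_reconvergent` is illustrative, and exhaustive search is used only because the circuit is tiny.

```python
from itertools import product

def simulate_reconvergent(a, b, c, d, n_stuck_at=None):
    """Worked example with e derived from a: e = NOT(a), y = n AND e."""
    e = 1 - a                   # reconvergent fan-out of input a
    n = (a & b) & (c & d)
    if n_stuck_at is not None:
        n = n_stuck_at          # inject the fault on node n
    return n & e

patterns = list(product((0, 1), repeat=4))
detecting = [p for p in patterns
             if simulate_reconvergent(*p) != simulate_reconvergent(*p, n_stuck_at=0)]
good_outputs = {simulate_reconvergent(*p) for p in patterns}

print(detecting)     # []
print(good_outputs)  # {0}
```

No pattern detects the fault, and the fault-free output is 0 for every input: n = 1 needs a = 1, while passing n to y needs a = 0.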

Why sequential logic makes it dramatically worse

Everything so far assumed pure combinational logic, where a node's value depends only on the *current* pins. Real chips are full of sequential logic: flip-flops and registers that remember the past. A flip-flop's output is not a function of today's inputs; it is whatever was clocked into it on some earlier cycle. To control the value held in a buried register, you cannot just set the pins. You must drive the pins, pulse the clock, drive them again, pulse again, walking the machine through a *sequence* of states until it finally arrives where you want it. Controllability stops being 'find one pattern' and becomes 'find a path through time'.

Observability suffers the mirror-image curse. The effect of flipping a buried node may have to be clocked into a register, held, propagated to the next register on the following cycle, and only several cycles later finally appear at an output. To detect one fault you must control a *multi-cycle input sequence* and then observe a *multi-cycle output sequence* — and the test generator has to reason about the chip's behaviour over time, not in a single snapshot. The state space it must search explodes: a design with k flip-flops has up to 2^k states, and many of them may be unreachable or reachable only through long, fragile sequences.

  Combinational test of one fault:
      apply 1 input pattern  ->  read 1 output   (one cycle)

  Sequential test of one fault buried behind 3 registers:
      cycle 0:  set pins, pulse clk   (load reg A)
      cycle 1:  set pins, pulse clk   (reg A -> reg B, set up reg C)
      cycle 2:  set pins, pulse clk   (excite the fault into reg C)
      cycle 3:  set pins, pulse clk   (propagate toward output)
      cycle 4:  read output           (fault finally visible?)

  One fault -> a 5-cycle JUSTIFY + PROPAGATE sequence.
  Millions of faults x long sequences x up to 2^k states
      = ATPG run-time and fault coverage fall off a cliff.

A fault that took one combinational pattern now takes a carefully ordered multi-cycle sequence — and the tool must find one for every fault. As register depth grows, sequential ATPG run-time and achievable coverage collapse. This is the wall that motivates scan.
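
The multi-cycle cost shows up even in the simplest sequential structure. The sketch below models a hypothetical three-stage register pipeline (A feeds B feeds C, and only C drives an output pin), which is simpler than the schedule in the diagram above but has the same shape. The function name `run`, the all-zero reset state, and the choice of a stuck-at-0 on register B's output are all assumptions made for illustration.

```python
def run(din_sequence, b_stuck_at_0=False):
    """Clock a 3-stage pipeline A -> B -> C; return the output pin after each clock."""
    a = b = c = 0                   # assumed reset state
    outputs = []
    for din in din_sequence:
        a, b, c = din, a, b         # all registers update on the same clock edge
        if b_stuck_at_0:
            b = 0                   # fault: register B's output stuck at 0
        outputs.append(c)           # only C is visible at a pin
    return outputs

seq = [1, 0, 0]                     # one pin value per clock cycle
good = run(seq)
bad = run(seq, b_stuck_at_0=True)

# Index of the first clock cycle on which the fault becomes visible.
first_diff = next(i for i, (g, f) in enumerate(zip(good, bad)) if g != f)
print(good, bad, first_diff)  # [0, 0, 1] [0, 0, 0] 2
```

The 1 that excites the fault is applied on the first clock, but the difference only reaches the output on the third. Every extra register between the fault and a pin adds another cycle the test generator has to plan for.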

And feedback turns the screw one more time. A finite-state machine feeds its registers' outputs back into the logic that decides their next value. Now a node you want to control depends on a register whose value depends on that node — a loop in time. To reach a target state you may need a precise sequence of dozens of cycles, and some states may be reachable only from a power-on reset you must replay every time. Worse, a node hidden inside a tight feedback loop can be nearly uncontrollable and unobservable at once: you cannot set it without disturbing the loop, and you cannot watch it without the loop swallowing the evidence. Functional test coverage of such logic routinely collapses below what any product can ship with.
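
Reachability from reset can be explored with a breadth-first search over the state graph. The sketch below uses a made-up 3-bit state machine (count up to 5 and saturate when `inc` is 1, hold otherwise); `next_state` and `shortest_sequences` are hypothetical names, and real designs have far too many states for this explicit enumeration.

```python
from collections import deque

def next_state(state, inc):
    """Hypothetical 3-bit FSM: count up to 5 and saturate when inc=1, hold when inc=0."""
    return min(state + 1, 5) if inc else state

def shortest_sequences(reset_state=0):
    """BFS from reset: map each reachable state to a shortest input sequence reaching it."""
    seqs = {reset_state: []}
    queue = deque([reset_state])
    while queue:
        s = queue.popleft()
        for inc in (0, 1):
            t = next_state(s, inc)
            if t not in seqs:       # first visit is a shortest path in BFS
                seqs[t] = seqs[s] + [inc]
                queue.append(t)
    return seqs

seqs = shortest_sequences()
unreachable = sorted(set(range(8)) - set(seqs))
print(len(seqs[5]))   # 5
print(unreachable)    # [6, 7]
```

Three flip-flops give eight encodable states, but two of them can never be reached from reset, and the deepest reachable one needs five clock cycles of carefully chosen inputs.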