Non-Volatile Memory: Flash, Floating Gates & eFuses

The third pillar: memory that survives the dark

By now you have met two of the three great pillars of the memory hierarchy, and both are forgetful. SRAM (rung 2) holds a bit in a pair of cross-coupled inverters that fight to keep each other awake: fast and tireless, but the instant the supply drops, the fight ends and the bit is gone. DRAM (rung 4) stores a bit as charge on a tiny capacitor so leaky that every cell must be refreshed every few tens of milliseconds (typically about every 64 ms); cut the power and it forgets in milliseconds. Both are volatile: their memory is an ongoing activity, not a stored object. Stop feeding them energy and the memory stops existing.

The third pillar is different in kind. [[ic-nonvolatile-memory|Non-volatile memory (NVM)]] stores a bit as a *physical state* that persists with no power at all — a state you have to actively undo to change. The dream is a memory you can write once and trust to be there years later, through power cycles, reboots, and a dead battery. This is the layer that holds your boot code before any power supply has stabilised, the firmware that teaches a chip how to be itself, and the photos that outlive a thousand discharge cycles. Where SRAM and DRAM ask *how fast*, NVM asks a harder question: *how do you make a memory that refuses to forget?*

Three pillars of on-chip memory

  Type     Cell            Speed        Density   Power off?
  ------   -------------   ----------   -------   -------------------
  SRAM     6T flip-flop    fastest      low       FORGETS (microsec)
  DRAM     1T1C capacitor  fast         high      FORGETS (millisec)
  NVM      floating gate   slow write   highest   REMEMBERS for years
                                                   (~10 yr retention)

  Rule of thumb: each step down the table trades speed for the
  ability to remember longer with less (or no) power.

The three memory pillars side by side. NVM gives up write speed and gains the one thing the others can't offer: memory that outlives the power.

The floating gate: an electron trap inside a transistor

The trick that made non-volatile memory ubiquitous is one of the most elegant in all of semiconductor design, and it hides inside an ordinary-looking MOSFET. Recall the MOSFET from rung 2: a gate sits over a channel, and the gate voltage decides whether the transistor conducts. The level at which it switches on is the threshold voltage, Vt. Now do one strange thing — insert a *second*, completely isolated piece of conductor between the control gate and the channel, surrounded on every side by insulating oxide. It connects to nothing. It is electrically marooned. This is the floating gate.

Here is why an isolated, unconnected scrap of conductor (in practice, polysilicon) is the whole point. If you can somehow inject electrons onto the floating gate, they have nowhere to go: the oxide walls trap them like a fly in amber. And trapped negative charge does something measurable: it partly screens the control gate's field from the channel. To turn the transistor on now, the control gate must work harder to overcome that hidden negative charge. In other words, the threshold voltage shifts up. An empty floating gate gives a *low* Vt; a charged one gives a *high* Vt. The two Vt states are stable, distinct, and survive power-off because nothing is pulling the charge back out. You read the bit by checking which Vt the cell has.

Floating-gate cell: a MOSFET with a hidden charge trap

         control gate  (the wordline you drive)
        =================
        ~~~ oxide ~~~~~~~
        [ FLOATING GATE ]  <- electrons trapped here (or not)
        ~~~ tunnel oxide ~
     n+ [    channel    ] n+
        -----------------
        p-substrate

   Floating gate EMPTY   ->  low  Vt  ->  reads as '1' (erased)
   Floating gate CHARGED ->  high Vt  ->  reads as '0' (programmed)

   Read = apply a gate voltage BETWEEN the two Vt's:
      cell with low Vt  -> turns on  -> current flows -> '1'
      cell with high Vt -> stays off -> no current    -> '0'

The floating gate sits between the control gate and the channel, sealed in oxide. Trapped electrons raise Vt; reading just asks 'did the cell turn on at this gate voltage?'
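
The read rule in the diagram can be sketched in a few lines of Python. This is a toy model, not a device simulation: the class name `FloatingGateCell` and all three voltages are assumptions, chosen only so that the read voltage falls between the two thresholds.

```python
from dataclasses import dataclass

# All voltages are illustrative assumptions, not values from any datasheet.
VT_ERASED = 1.0      # volts: threshold with an empty floating gate
VT_PROGRAMMED = 5.0  # volts: threshold with electrons trapped on the gate
V_READ = 3.0         # volts: read voltage chosen between the two thresholds


@dataclass
class FloatingGateCell:
    charged: bool = False  # True when electrons are trapped on the floating gate

    @property
    def vt(self) -> float:
        """Trapped charge screens the control gate, so Vt is higher."""
        return VT_PROGRAMMED if self.charged else VT_ERASED

    def read(self, v_gate: float = V_READ) -> int:
        """Return 1 if the cell conducts at v_gate, else 0."""
        return 1 if v_gate > self.vt else 0


erased = FloatingGateCell(charged=False)
programmed = FloatingGateCell(charged=True)
print(erased.read(), programmed.read())  # prints "1 0"
```

Nothing in `read` changes the cell's state, which mirrors the point of the diagram: reading only asks whether the cell turned on.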

Writing with quantum tunnelling — and why it wears out

So how do electrons get onto a gate that is sealed inside an insulator with no wire to it? You cannot just route them in; that's the whole point of the trap. Instead, flash uses physics that only shows up under extreme fields. The workhorse is Fowler–Nordheim tunnelling: put a large enough voltage across the thin tunnel oxide (a few nanometres) and the field becomes so steep that electrons quantum-mechanically tunnel *straight through* the insulator that is supposed to stop them. The same oxide that traps electrons for years at normal voltages becomes briefly passable under ~15–20 V of stress. NOR flash often programs with a related mechanism, hot-carrier injection, where electrons accelerated along the channel gain enough energy to jump the barrier onto the gate.
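
To get a feel for why the same oxide is a wall at read voltages and a doorway at program voltages, the sketch below estimates the field across the tunnel oxide as voltage divided by thickness. The oxide thickness, the coupling ratio (the fraction of the control-gate voltage that actually lands on the floating gate) and the onset field are all assumed round numbers for illustration, and the model ignores any charge already stored on the gate.

```python
# Illustrative assumptions: none of these numbers come from a datasheet.
TUNNEL_OXIDE_NM = 8.0      # assumed tunnel-oxide thickness
COUPLING_RATIO = 0.6       # assumed share of control-gate voltage on the floating gate
FN_ONSET_MV_PER_CM = 10.0  # assumed rough field at which tunnelling becomes significant


def oxide_field_mv_per_cm(v_control_gate: float) -> float:
    """Estimate the tunnel-oxide field in MV/cm with the channel held at 0 V."""
    v_across_oxide = COUPLING_RATIO * v_control_gate
    thickness_cm = TUNNEL_OXIDE_NM * 1e-7  # 1 nm = 1e-7 cm
    return v_across_oxide / thickness_cm / 1e6


for label, volts in [("read", 3.0), ("program", 18.0)]:
    field = oxide_field_mv_per_cm(volts)
    verdict = "tunnels" if field >= FN_ONSET_MV_PER_CM else "stays trapped"
    print(f"{label:8s} {volts:5.1f} V -> {field:5.2f} MV/cm ({verdict})")
```

With these assumed values, a 3 V read puts roughly 2 MV/cm across the oxide, while an 18 V program pulse puts well over 10 MV/cm across it, which is the regime where tunnelling takes over.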

Erasing reverses it: flip the field and tunnel the electrons back *out* of the floating gate. Notice the asymmetry that names the technology. Erase works only on big chunks at once: the erase field is applied across a whole region of cells, so flash is erased a 'block' or 'sector' at a time, not bit by bit. That is where the name comes from: an early engineer thought erasing a whole block in one electrical pulse resembled a camera flash. Programming is finer-grained (a page at a time in NAND, down to a byte or word in NOR), but it only moves bits in one direction, from 1 (erased) to 0 (programmed). To turn a programmed 0 back into a 1, you must first erase the entire block it lives in. This block-erase granularity shapes everything about how flash is used.
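
The program/erase asymmetry is easy to model. In the sketch below an erased bit reads as 1 and programming can only pull bits down to 0, matching the '1' (erased) and '0' (programmed) convention in the cell diagram. The class `FlashBlock` and its tiny page sizes are hypothetical, and real parts add further restrictions on reprogramming a page that this toy ignores.

```python
class FlashBlock:
    """Toy model of one erase block: programming clears bits, erase sets them."""

    def __init__(self, pages: int = 4, page_bytes: int = 4):
        self.page_bytes = page_bytes
        # Erased flash reads as all 1s (0xFF in every byte).
        self.pages = [bytearray(b"\xff" * page_bytes) for _ in range(pages)]

    def program(self, page: int, data: bytes) -> None:
        """Program one page. Bits only move 1 -> 0, so the result is old AND new."""
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        self.pages[page] = bytearray(o & n for o, n in zip(self.pages[page], data))

    def erase(self) -> None:
        """Erase resets every page in the block to all 1s at once."""
        for p in self.pages:
            p[:] = b"\xff" * self.page_bytes

    def read(self, page: int) -> bytes:
        return bytes(self.pages[page])


block = FlashBlock()
block.program(0, b"\x0f\x0f\x0f\x0f")
block.program(0, b"\xf0\xf0\xf0\xf0")  # tries to set cleared bits back to 1
print(block.read(0).hex())              # prints "00000000": bits were only cleared
block.erase()                           # wipes every page, not just page 0
block.program(0, b"\xf0\xf0\xf0\xf0")
print(block.read(0).hex())              # prints "f0f0f0f0"
```

The second `program` call does not store the new data; it can only clear more bits. Getting the intended value onto the page costs an erase of the whole block, including any neighbouring pages that held good data.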

And here is the catch that neither SRAM nor DRAM has: every program/erase cycle damages the oxide. Each time electrons are slammed through the tunnel oxide, a few get stuck inside the insulator itself and a few defects form in the oxide. The trapped charge slowly shifts the cell's Vt window and the defects make the oxide leakier, until one day the cell can no longer hold a reliable, distinguishable 1 and 0. This is wear-out, and it gives every flash a finite endurance, measured in program/erase (P/E) cycles. SLC flash might survive ~100,000 cycles; dense consumer TLC/QLC flash often only ~1,000–3,000. There is no equivalent limit for SRAM or DRAM; flash literally ages with use.
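
A back-of-envelope calculation shows what a P/E rating means in practice. The sketch below assumes writes are spread perfectly evenly across every block, so it gives an optimistic ceiling, not a prediction. The function name and the example drive (1,000 GB, 50 GB written per day) are hypothetical.

```python
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   writes_gb_per_day: float) -> float:
    """Years until every cell has used its rated P/E cycles.

    Assumes perfectly even wear, so this is an upper bound.
    """
    total_writable_gb = capacity_gb * pe_cycles
    return total_writable_gb / writes_gb_per_day / 365.0


# Hypothetical 1,000 GB drive written at 50 GB per day.
for name, cycles in [("SLC-class", 100_000), ("TLC-class", 3_000), ("QLC-class", 1_000)]:
    years = lifetime_years(1_000, cycles, 50)
    print(f"{name:10s} {cycles:>7,} cycles -> about {years:,.0f} years")
```

Even the 1,000-cycle case works out to about 55 years at this write rate. Real drives fall short of this ceiling because writes are never spread perfectly evenly, but the calculation shows why a low cycle count can still be acceptable for bulk storage.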

NAND vs NOR: two ways to wire the same cell

The floating-gate cell is the same in both, but how you *wire millions of them together* splits flash into two species with opposite personalities. The names come from how the cells connect to the bitline. In NOR flash, every cell hangs in parallel directly between a bitline and ground, like the inputs of a NOR gate. That means any single cell can be addressed and read on its own, so a CPU can fetch instructions straight out of NOR — true random access, byte by byte, with very fast reads. The price is area: all those parallel contacts are bulky, so NOR is low-density and expensive per bit. NOR is where boot code and small firmware live, the memory you execute in place (XIP).

In NAND flash, cells are wired in *series*: long strings of dozens to hundreds of cells connected drain-to-source like the pull-down chain in a NAND gate, sharing far fewer contacts. That packs cells unbelievably tightly, which is why NAND is the densest, cheapest solid-state memory humanity makes and the reason a fingernail-sized chip holds a terabyte. The catch is the series chain: to read one cell you must turn *all the others in its string fully on* so current can pass through them, then sense the target. You cannot read a lone byte; NAND works in pages (read/program) and blocks (erase). It is slow to access randomly and needs heavy ECC, but for bulk solid-state storage (SSDs, phones, USB drives) nothing beats it on cost per bit.

Same cell, opposite wiring

  NOR (parallel)            NAND (series string)
  ------------------        --------------------------
  BL --+--[cell]-- GND      BL --[c]-[c]-[c]-...-[c]-- GND
       +--[cell]-- GND          select one, drive the
       +--[cell]-- GND          rest fully ON to read

  NOR                       NAND
   + fast random read        + ~10x denser / cheaper bit
   + execute-in-place        + best for bulk storage
   - low density, costly     - page/block access only
   use: boot/firmware/code   - needs heavy ECC
                             use: SSDs, phones, USB

NOR puts cells in parallel for fast byte-level reads; NAND strings them in series for maximum density. The cell is identical — the topology decides the use case.
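
The series-string read can be sketched as a loop over the cells: the target gets the read voltage, every other cell gets a higher pass voltage, and the string conducts only if every cell in the chain turns on. The function name and the voltages `V_READ` and `V_PASS` are assumptions for illustration.

```python
from typing import Sequence

V_READ = 3.0  # assumed: sits between the erased and programmed Vt
V_PASS = 7.0  # assumed: above every Vt, so unselected cells always conduct


def read_nand_string(cell_vts: Sequence[float], target: int) -> int:
    """Read one cell in a series string; return 1 if the whole string conducts."""
    for i, vt in enumerate(cell_vts):
        v_gate = V_READ if i == target else V_PASS
        if v_gate <= vt:
            return 0  # one cell that stays off breaks the series path
    return 1


# Assumed thresholds: 1.0 V for an erased cell, 5.0 V for a programmed one.
string = [1.0, 5.0, 1.0, 5.0]
print([read_nand_string(string, i) for i in range(len(string))])  # prints [1, 0, 1, 0]
```

Because `V_PASS` is above both thresholds, the unselected cells never block the current, so the result depends only on the target cell. That is the whole trick of the series read, and also why every read disturbs a full string of neighbours.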

Two more facts complete the picture. To chase ever-higher density, modern NAND stopped storing one bit per cell. By dividing the Vt window into several distinguishable levels, MLC stores 2 bits/cell, TLC 3, and QLC 4 — squeezing more data into each floating gate at the cost of tighter margins, slower writes and lower endurance. And to keep growing without shrinking the cell further, the industry went vertical: 3D NAND stacks the strings upward, today well past 200 layers, so capacity grows by building tall rather than fitting more onto a flat process node. NVM is the one place where 'going 3D' became the mainstream answer years before the rest of the chip.
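
The cost of multi-bit cells follows from simple arithmetic: storing n bits needs 2^n distinguishable Vt levels, all squeezed into the same window. The sketch below assumes a 6 V usable window, an illustrative figure only, and evenly spaced levels.

```python
VT_WINDOW_V = 6.0  # assumed usable Vt window; real devices differ


def levels_and_spacing(bits_per_cell: int, window_v: float = VT_WINDOW_V):
    """Return (number of Vt levels, spacing between adjacent levels in volts)."""
    levels = 2 ** bits_per_cell
    spacing = window_v / (levels - 1)  # levels spread evenly across the window
    return levels, spacing


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    levels, spacing = levels_and_spacing(bits)
    print(f"{name}: {bits} bit(s) -> {levels:2d} levels, {spacing:.2f} V apart")
```

Each extra bit doubles the number of levels, so the gap between neighbouring levels shrinks much faster than the capacity grows. That shrinking gap is where the tighter margins, slower writes and lower endurance come from.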

eFuse: burning a permanent fact into silicon

Not every non-volatile bit needs to be rewritten a million times. Sometimes a chip needs to remember exactly one thing, exactly once, for the rest of its life: its serial number, a security key, which spare row to swap in, or the precise trim that calibrates an analog block. For these, full flash is overkill — flash needs special high-voltage charge pumps and extra mask steps that a plain logic process doesn't have. The lightweight answer is the [[ic-efuse|eFuse]]: a one-time-programmable (OTP) element that you build from the materials already on every chip.

The mechanism is brutally simple. An eFuse is a thin strip of conductor (polysilicon or metal) sitting unblown as a low-resistance link, reading as a logic 0 (or 1, by convention). To program it, you force a large current through it for a controlled pulse. The strip heats and electromigration physically rearranges the atoms, opening the link into a high-resistance gap: it is, quite literally, blown like a household fuse. There is no charge to leak away and no oxide to wear: the *geometry of the conductor itself* now encodes the bit, permanently. You can read a fuse any number of times, but you can write it exactly once. There is no erase. It is the most non-volatile memory of all because the change is structural.

Serial numbers & chip ID — burn a unique number into every die at test, so each part is individually traceable and addressable.
Redundancy repair — after manufacturing test finds a bad memory row or column, fuses record which spare to swap in. This is the bridge to memory repair: fuses make the repair *permanent*.
Analog trim — laser-free calibration: blow fuses to nudge a reference voltage or oscillator frequency to its exact target after the chip is built and measured.
Feature & security config — lock device options, disable a debug port forever, or set keys that can be read but never rewritten.
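
The write-once behaviour behind all of these uses can be modelled as a register whose bits can be set but never cleared. The class `EFuseBank` is hypothetical, and it assumes the convention that a blown fuse reads as 1.

```python
class EFuseBank:
    """Toy one-time-programmable bank: fuses can be blown but never restored."""

    def __init__(self, width: int = 8):
        self.width = width
        self._blown = 0  # bit i set means fuse i has been blown

    def blow(self, value: int) -> None:
        """Blow the fuse for every 1 bit in value; blown fuses stay blown."""
        if value < 0 or value >> self.width:
            raise ValueError("value does not fit in the bank")
        self._blown |= value  # OR can only add blown fuses, never remove them

    def read(self) -> int:
        """Reading is non-destructive and can be repeated indefinitely."""
        return self._blown


bank = EFuseBank(width=4)
bank.blow(0b0101)
bank.blow(0b0011)                  # cannot un-blow bit 2; only adds bit 1
print(format(bank.read(), "04b"))  # prints "0111"
```

There is deliberately no `erase` method. A second write can only blow more fuses, which is why values such as a chip ID or a repair map are programmed in a single controlled step at test.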

Beyond flash: the frontier of emerging NVM

Flash has one stubborn weakness it can never fully shake: writing means shoving electrons through an oxide, which is slow, power-hungry, and wears the cell out. For decades researchers have hunted for an NVM that writes as fast and effortlessly as SRAM yet remembers like flash — the long-promised 'universal memory'. Several contenders have left the lab and reached real silicon, and the common thread is that they store the bit in a *new physical state*, not on a floating gate.

MRAM (magnetic, esp. STT-MRAM) — stores the bit in the magnetic orientation of a tiny layer. Fast, effectively unlimited endurance, and increasingly used as an embedded flash replacement on advanced nodes. The leading on-chip NVM contender.
ReRAM / memristor — the bit is the resistance of an oxide filament that forms or ruptures. Simple, dense, and a favourite for in-memory computing and AI accelerators.
PCM (phase-change) — switches a chalcogenide material between an amorphous (high-R) and crystalline (low-R) phase, the same physics as a rewritable DVD, now on chips.
FeRAM (ferroelectric) — stores the bit in the polarisation of a ferroelectric layer; very low write energy and high endurance, used where power and speed matter more than density.

None has dethroned flash for bulk storage — NAND's cost per bit is brutally hard to beat — but the action is at the embedded scale: small NVM blocks built right alongside logic on the same die. As classic flash gets harder to integrate below ~28 nm, MRAM and ReRAM are quietly stepping in as the embedded NVM of choice on advanced semiconductor processes, storing firmware and weights next to the cores that use them. The floating gate was the answer for forty years; the next forty may belong to memory that stores a bit as magnetism, resistance, or phase.