DRAM: One Transistor, One Capacitor, and the Refresh Treadmill

Six transistors versus one-and-a-capacitor

In rung 3 you met the SRAM bitcell: six transistors wired into two cross-coupled inverters that hold a bit by sheer stubbornness. Power them and they latch — one inverter drives the other, the other drives back, and the loop locks 0 or 1 in place forever. It is fast, it never forgets while powered, and it never needs maintenance. There is just one problem: six transistors is a lot of silicon for one bit. When you want to store eight gigabytes, that overhead becomes the whole game.

DRAM makes a radical bargain. Throw away five of the six transistors and the whole idea of a self-sustaining loop. Keep one transistor as a gate and add one [[capacitor|capacitor]] as the actual memory. The bit is no longer a stubborn argument between inverters; it is simply *whether that capacitor is charged or empty.* A full bucket of charge is a 1, an empty bucket is a 0. The transistor is just a valve that connects the bucket to the outside world when you want to read or write. That is the entire 1T1C cell — one transistor, one capacitor — and it is perhaps the most-manufactured circuit in human history.

SRAM 6T CELL  vs  DRAM 1T1C CELL

  SRAM bitcell (6 transistors)        DRAM cell (1T + 1C)
  ----------------------------        --------------------
        WL                                    WL (wordline)
         |                                     |
     +---+---+                                 |
     |       |                          [access transistor]
   |inv| <-> |inv|   (cross-coupled)           |
     |       |                            node ===  C  (storage
     +---+---+                                 |       capacitor)
      |     |                                 GND
     BL    BL_bar                          |
                                          BL (bitline)
   holds bit by feedback;            bit = charge on C;
   static, never decays,             leaks away -> must be
   ~6T of area per bit               REFRESHED; ~1T of area

  Area per bit: SRAM ~ 6 devices   |   DRAM ~ 1 device + 1 cap
  -> DRAM packs roughly 4-8x more bits in the same silicon.

The structural swap: SRAM's six-transistor latch versus DRAM's lone transistor guarding a single storage capacitor.

The leaky bucket: why DRAM must run a refresh treadmill

Here is the catch that gives DRAM its name — the D stands for *dynamic.* A storage capacitor is not a perfect bucket. The charge you deposit slowly seeps away: a little through the access transistor's tiny leakage even when it is switched off, a little through the junction into the silicon, a little to the surroundings. The capacitance is minuscule — only a few tens of femtofarads, about 25 fF in a modern cell — and the stored charge is a vanishingly small puddle of electrons. Within a few milliseconds a freshly written 1 has leaked down toward the voltage of a 0, and the bit is on the verge of being forgotten.

An SRAM cell never has this problem — its feedback loop continuously redrives the bit, so it holds forever (as long as the power is on). DRAM has no loop and no one redriving it, so it forgets unless someone keeps reminding it. That reminder is [[ic-refresh|refresh]]: the memory controller marches through every row of the array on a fixed schedule — typically every row at least once every 64 milliseconds (32 ms in hot conditions) — reads each cell, and writes the value straight back, topping up the buckets that have started to drain. Refresh is the treadmill DRAM can never step off.

ONE CELL FORGETTING, AND BEING REMINDED

  V on storage node
   Vdd |1*                          1*           1*
       | \                          /\           /
       |  \   leaks down            / \         /
       |   \  (~ms)               /   \       /
  Vdd/2|----\-------- sense ------/-----\-----/----  decision
       |     \       threshold   /       \   /        line
       |      \                 /         \ /
     0 |_______*_______________*___________*___________ t
               ^               ^           ^
            refresh!        refresh!     refresh!
          (read + write-back tops the bucket back to Vdd)

  Every cell must be refreshed before its charge drifts past
  the sense threshold (~Vdd/2).  Miss the window -> bit lost.
  Refresh budget: ALL rows, at least once per 64 ms.

A stored 1 decays toward the sense threshold within milliseconds; each refresh reads the cell and writes the full value back, restarting the slide.

Destructive readout: reading a DRAM cell erases it

Now for the strangest fact about DRAM, and the place where rung 3's bitline and sense amplifier return in a new role. To read a cell, the controller raises its wordline, turning on the access transistor and connecting the tiny storage capacitor to the long bitline that runs the height of the array. But that bitline is huge compared to the cell — its parasitic capacitance is *tens of times larger* than the storage cap. When the valve opens, the cell's little puddle of charge spills into that vast bitline and shares itself out across both. The act of reading drains the bucket. Whatever was stored is now spread thin and almost gone — this is destructive readout.

How can you read anything from a signal that destroys itself? The trick — the same trick from rung 3 — is to read differentially and to read before the dust settles. Just before opening the cell, the bitline is precharged to exactly half the supply, Vdd/2, sitting perfectly on the fence between 0 and 1. Then the wordline opens. If the cell held a 1, its charge nudges the bitline a *hair* above Vdd/2; if it held a 0, the cell pulls the bitline a *hair* below. The shift is tiny — perhaps a hundred millivolts, sometimes far less. A neighbouring 'dummy' bitline is left untouched at Vdd/2 as a reference. The whole question of what the cell held now lives in which of these two lines is slightly higher.

The sense amplifier's resurrection: amplify, then write back

A hundred-millivolt difference is not a digital answer; it is a rumour. The job of turning that rumour into a hard 0 or 1 belongs to the [[ic-sense-amplifier|sense amplifier]] — and here it does double duty that has no equivalent in SRAM. The sense amp is itself a pair of cross-coupled inverters straddling the two bitlines, a latch much like the SRAM cell you already know. Once the tiny difference appears, the controller fires the sense amp. Positive feedback takes over: the line that was a hair higher gets yanked all the way up to Vdd, the line that was a hair lower gets slammed down to 0. The rumour snaps into a full-rail, unmistakable digital value.

And here is the elegant part that closes the destructive-readout problem. While the sense amp is driving the bitline to a full, clean rail, the access transistor is *still open* — its wordline is still high. So the very same full-strength signal the sense amp drives onto the bitline flows straight back through the open valve and recharges the storage capacitor to a crisp Vdd (for a 1) or a clean 0. The read didn't just retrieve the bit; it restored it. Every single read is automatically a refresh of that one cell. The sense amplifier is part oracle, part defibrillator.

Precharge the bitline and its reference partner to Vdd/2 and let them settle — both lines balanced on the fence.
Raise the [[ic-wordline|wordline]]: the access transistor opens, the storage capacitor shares its charge with the bitline, nudging it a few tens of millivolts above or below Vdd/2 (this drains the cell — destructive readout).
Fire the [[ic-sense-amplifier|sense amplifier]]: its cross-coupled latch amplifies the tiny difference by positive feedback, driving one line fully to Vdd and the other fully to 0.
Write-back happens for free: because the wordline is still open, the full restored value flows back into the storage capacitor, healing the cell that the read just drained.
Lower the wordline and precharge again, ready for the next access. The bit is back in its bucket, full strength, as if nothing happened.

Folding the capacitor: trenches, stacks, and HBM

There is a quiet contradiction baked into the storage capacitor. Scaling wants every feature smaller — but a *smaller* capacitor stores *less* charge, and we just saw that too little charge makes the sense signal vanish into noise. To keep enough capacitance (roughly the ~25 fF a sense amp needs) while shrinking the footprint, DRAM makers do something almost sculptural: they fold the capacitor into the third dimension. Capacitance grows with plate area, so instead of a flat plate they build a tall, narrow structure that hides a large surface in a tiny footprint.

Two great families emerged. A trench capacitor digs a deep, narrow hole straight down into the silicon and lines its walls — imagine a well shaft whose inner surface is the capacitor plate. A stacked capacitor does the opposite, growing a tall cup or cylinder *upward* above the access transistor. Modern stacked caps have astonishing aspect ratios — pillars perhaps fifty times taller than they are wide — so the storage cell looks less like a flat circuit and more like a forest of microscopic chimneys standing over the transistors. This vertical trick is what let DRAM keep doubling density even as the surface footprint of each bit shrank toward atomic scales.

HIDING A BIG CAPACITOR IN A TINY FOOTPRINT

  TRENCH cap (dig DOWN)         STACKED cap (build UP)
  ---------------------         ----------------------
   transistor                      _____  tall cup /
     |  surface                   |     | cylinder
  ===|========                    |  C  |  (high
     |  | | |                     |     |   aspect
     |  | | |  deep narrow        |_____|   ratio)
     |  | | |  well; walls          | |
     |  |_|_|  = capacitor          | |  access transistor
     |          plates           ===|=|=========
   silicon                        surface

  Both pack ~25 fF of capacitance into a footprint only
  a few tens of nm wide -- by using VERTICAL surface area.
  Aspect ratios of stacked caps can exceed ~50:1.

Trench (down into silicon) and stacked (up above the transistor) capacitors both win area by going vertical.

Folding the *cell* upward solves density inside one chip — but it does nothing for the second great DRAM problem: getting all those bits *off* the chip fast enough. A wide flat row buffer is useless if it must dribble out through a thin pipe of package pins. The frontier answer is to fold the *system* upward too. [[high-bandwidth-memory|High-bandwidth memory]] (HBM) stacks a tower of DRAM dies — four, eight, twelve high — and threads thousands of vertical wires called through-silicon vias (TSVs) straight up through the stack. The stack sits millimetres from the processor on a silicon interposer, replacing a narrow 64-bit channel with a colossal interface a thousand or more bits wide.

DRAM's place on the seesaw

Step back and the whole memory hierarchy from rung 1 falls into place as a set of trades around one seesaw: density versus speed. Registers and SRAM caches sit at the fast, expensive top — six transistors a bit, no refresh, single-nanosecond access, but only kilobytes-to-megabytes because every bit is costly. DRAM sits in the broad, affordable middle — one transistor and a capacitor a bit, so gigabytes are cheap, at the price of refresh, destructive readout, and a random access latency of tens of nanoseconds instead of single-digit. Below it, slower still, sits flash and disk. Each layer trades a little speed for a lot of capacity.

Why is DRAM's *random* access so much slower than SRAM's, beyond just the leakage? Three reasons all trace back to the 1T1C cell. First, the signal is feeble — that millivolt charge-sharing swing takes real time for the sense amplifier to resolve, where SRAM's full-strength latch flips almost instantly. Second, access is row-grained — you must open a whole page (and possibly close a previous one, writing it back) before you can pick a column. Third, refresh occasionally collides with your request, stalling it. None of these afflict SRAM. DRAM accepts all three because the prize — packing gigabytes onto a thumbnail — is worth it.