Going Vertical: 3D & Packaging

From one die to a stack

Every guide in this track so far has been about making the *transistors* better — smaller, faster, packed tighter, fed power from the back. But there is a second, older problem that shrinking transistors does nothing to fix: the further apart two things sit on a chip, the longer and slower the wire between them. You met this as interconnect scaling on the rungs below — as transistors shrank, the wires connecting them did *not* get proportionally faster, and on a large die a signal can spend more time crawling along metal than it spends doing anything useful. A big flat chip is, in a real sense, mostly *commute*.
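
The claim that wires get slower with distance can be made concrete with a rough model. The sketch below treats an unrepeated wire as a distributed RC line, whose delay is approximately 0.38 times its total resistance times its total capacitance. Both grow with length, so delay grows with the square of length. The constants `R_PER_MM` and `C_PER_MM` are illustrative assumptions, not data for any real process, and the sketch ignores the repeaters that real designs insert to break up long wires.

```python
# Sketch: delay of an unrepeated on-chip wire, modelled as a distributed RC line.
# The per-millimetre resistance and capacitance below are illustrative
# assumptions, not figures for any real process.

R_PER_MM = 1_000.0    # ohms per mm of wire (assumed)
C_PER_MM = 0.2e-12    # farads per mm of wire (assumed)


def wire_delay_s(length_mm: float) -> float:
    """Approximate 50% delay of a distributed RC line: 0.38 * R_total * C_total."""
    r_total = R_PER_MM * length_mm
    c_total = C_PER_MM * length_mm
    return 0.38 * r_total * c_total


if __name__ == "__main__":
    for length in (0.1, 1.0, 10.0):
        print(f"{length:5.1f} mm -> {wire_delay_s(length) * 1e12:10.2f} ps")
```

Whatever values are chosen for the constants, a wire ten times longer comes out a hundred times slower in this model, which is why distance on a large die is so costly.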

The other problem is money. The bigger you print a single die, the more likely it is that one of the inevitable manufacturing defects lands somewhere on it, and a single fatal defect throws away the *whole* die. Under the simplest defect models, yield falls roughly exponentially as area grows. Doubling a die's area therefore more than halves the number of good dies per wafer: you get half as many candidates, and each one is less likely to work. So a giant monolithic chip is doubly punished: its far-flung wires are slow, and its size makes it expensive to manufacture without scrapping a painful fraction of every wafer.

The way out is to stop thinking of a chip as one slab of silicon. Cut the big design into smaller pieces — chiplets — and you get two gifts at once. Each small piece yields far better than one large die, and you can test each one *before* assembly so you only ever stack the good ones (the known-good die idea). Then the question becomes: how do you wire those pieces back together tightly enough that they behave like one chip? The answer is no longer drawn on the die. It is built in the package — and that is why packaging stopped being an afterthought and became a frontier of its own.
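
A rough way to see both gifts at once is the simple Poisson yield model, in which the chance that a die has no fatal defect is `exp(-area * defect_density)`. The sketch below is a minimal illustration under that assumption. The defect density, the 800 mm² design size, and the function names are all hypothetical, and the model ignores assembly losses and the extra area that die-to-die interfaces consume.

```python
import math

# Sketch: the simple Poisson yield model, yield = exp(-area * defect_density).
# The defect density and die sizes are illustrative assumptions.

DEFECTS_PER_CM2 = 0.1   # assumed fatal-defect density


def die_yield(area_mm2: float, defects_per_cm2: float = DEFECTS_PER_CM2) -> float:
    """Fraction of dies of the given area expected to have zero fatal defects."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_system_mm2(total_area_mm2: float, pieces: int) -> float:
    """Average silicon printed per working system when the design is cut into
    equal pieces that are each tested before assembly (known-good die).
    Ignores assembly losses and the area die-to-die interfaces need."""
    piece_area = total_area_mm2 / pieces
    return pieces * piece_area / die_yield(piece_area)


if __name__ == "__main__":
    for pieces in (1, 4, 8):
        used = silicon_per_good_system_mm2(800.0, pieces)
        print(f"{pieces} piece(s): {used:7.1f} mm^2 of silicon per good system")
```

Under these assumptions the monolithic die wastes more than half of the silicon printed for it, while the same design cut into smaller tested pieces wastes far less, because a defect now scraps only one small piece instead of the whole design.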

2.5D: the silicon interposer

The gentlest way to put chiplets back together is to lay them side by side — not stacked, but neighbours — on a shared slab that carries the wires between them. That slab is an interposer, and when it is itself made of silicon it can hold wiring almost as fine as the wiring inside a chip. The chiplets sit on top; the interposer underneath routes thousands of connections between them and then funnels everything down to the package below. Because the dies are laid out flat in *two* dimensions but joined through a third layer beneath them, the industry calls this 2.5D — more than flat, not quite a true stack.

Why bother, if the chiplets are still side by side? Because the interposer's wires are dramatically shorter and denser than anything you could route across a normal circuit board. Two chiplets a few millimetres apart on a silicon interposer can talk to each other with the kind of wide, fast, low-power connection that used to require them to be on the *same* die. This is the trick that lets high-bandwidth memory sit right beside a GPU and feed it through thousands of parallel wires — a direct answer to the memory wall, where the processor starves not for lack of compute but for lack of bandwidth to its memory.
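
The arithmetic behind "thousands of parallel wires" is simple enough to sketch: peak bandwidth is the number of data wires times the rate of each wire. The example below is illustrative only. The 1024-wire, 2 Gb/s-per-wire case matches the commonly quoted interface of a single HBM2 stack, while the narrow link is a hypothetical board-level connection chosen for contrast.

```python
# Sketch: aggregate bandwidth of a parallel die-to-die link.
# The wire counts and per-wire rates are illustrative assumptions.


def link_bandwidth_gbytes_per_s(data_wires: int, gbits_per_s_per_wire: float) -> float:
    """Peak bandwidth in gigabytes per second: wires * rate per wire / 8 bits."""
    return data_wires * gbits_per_s_per_wire / 8.0


if __name__ == "__main__":
    # Many slow wires, as an interposer allows.
    wide_slow = link_bandwidth_gbytes_per_s(1024, 2.0)
    # Few fast wires, as a circuit board forces (hypothetical link).
    narrow_fast = link_bandwidth_gbytes_per_s(32, 16.0)
    print(f"wide interposer link: {wide_slow:.0f} GB/s")
    print(f"narrow board link:    {narrow_fast:.0f} GB/s")
```

The wide link wins even though each of its wires runs eight times slower, and slower wires are also cheaper to drive, which is where the low-power part of the bargain comes from.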

3D: through-silicon vias

If side-by-side is good, on-top-of is better — provided you can get signals through a die from one face to the other. A chip is built on a wafer of solid silicon hundreds of microns thick; its wiring all lives on one face. To stack a second die directly above and connect them, you need a vertical wire that drills straight down *through* the body of the silicon. That wire is a through-silicon via, or TSV: a tiny pillar of metal, etched and filled clean through a thinned die, turning what used to be a dead slab of substrate into a vertical highway.

This is real 3D integration. Now the distance between two stacked dies is no longer millimetres across an interposer — it is microns, straight up. That collapse in distance is the whole point: shorter wires mean less delay and far less energy spent shoving signals around, which is the most direct possible attack on the interconnect problem. High-bandwidth memory itself is a 3D stack — several DRAM dies piled up and laced together with TSVs — which is then placed on the interposer beside the processor. The two ideas compose: a 3D-stacked memory cube sitting in a 2.5D arrangement next to the logic.
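
The energy side of that argument can be sketched with one formula: charging a wire once draws roughly C times V squared from the supply, and a wire's capacitance grows with its length. The constants below are illustrative assumptions rather than measured values, and the sketch leaves out the capacitance of the TSVs and bond pads themselves.

```python
# Sketch: switching energy of a wire as a function of its length.
# Both constants below are illustrative assumptions.

C_PER_MM_F = 0.2e-12   # farads per mm of wire (assumed)
SUPPLY_V = 0.8         # signalling voltage in volts (assumed)


def wire_energy_j(length_mm: float) -> float:
    """Energy drawn from the supply to charge the wire capacitance once."""
    return C_PER_MM_F * length_mm * SUPPLY_V ** 2


if __name__ == "__main__":
    across_interposer = wire_energy_j(5.0)    # a few millimetres sideways
    straight_up = wire_energy_j(0.05)         # tens of microns vertically
    print(f"5 mm lateral wire:   {across_interposer * 1e15:8.1f} fJ per transition")
    print(f"50 um vertical hop:  {straight_up * 1e15:8.1f} fJ per transition")
```

In this model energy scales directly with length, so replacing a 5 mm lateral route with a 50 micron vertical hop cuts the energy per transition a hundredfold.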

  2.5D — chiplets side by side on an interposer
  ┌──────┐        ┌──────┐        ┌──────┐
  │ logic│        │ HBM  │        │  I/O │   <- chiplets
  └──┬┬──┘        └──┬┬──┘        └──┬┬──┘
  ===||==============||==============||===   <- micro-bumps
  ┌──────────────────────────────────────┐
  │   silicon interposer (fine wiring)   │
  └──────────────────┬┬──────────────────┘
  =================  ||  =================   <- C4 bumps
  ┌──────────────────────────────────────┐
  │          package substrate           │
  └──────────────────────────────────────┘

  3D — dies stacked, joined through their bodies
  ┌──────────────────────────┐  die 2
  │  ▓ ▓ ▓  metal layers     │
  │  │ │ │  TSVs through Si  │
  ═══╪═╪═╪════════════════════  <- bond interface
  ┌──────────────────────────┐  die 1
  │  ▓ ▓ ▓  metal layers     │
  └────────────┬┬────────────┘
  ┌──────────────────────────┐
  │    package substrate     │
  └──────────────────────────┘

2.5D spreads chiplets across an interposer and connects them with short, dense wiring; 3D stacks dies directly and threads signals up through the silicon with TSVs. They are routinely combined — a 3D memory stack placed in a 2.5D layout beside the logic.

Hybrid bonding: copper to copper

How do you actually join two stacked dies? The traditional answer is *bumps*: little balls of solder dotted across the face of a die that melt and fuse to matching pads on the die above. Bumps work, but they are coarse. Each one needs room, so you can only fit so many per square millimetre — and the gap they create between dies adds distance and resistance right where you were trying to eliminate it. Bumps were fine when a stack needed thousands of connections. They are hopeless when it needs millions.

Hybrid bonding throws the solder away entirely. Instead, the two die faces are polished mirror-flat, with bare copper pads embedded in insulator, and pressed directly together. Two bonds form at once, which is where the name comes from: insulator fuses to insulator across the faces, and the copper pads on one face fuse atom-to-atom with the copper pads on the other, copper to copper, no bump in between. With no solder ball to leave room for, pad pitches shrink from tens of microns down toward a single micron, and because the number of pads per unit area goes as one over the pitch squared, the connections can be packed orders of magnitude tighter. The two dies stop behaving like separate chips wired together and start behaving like one continuous piece of silicon that happens to have a seam.
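
To see why a modest-sounding change in pitch matters so much, note that pads sit on a two-dimensional grid, so the count per unit area goes as one over the pitch squared. The sketch below assumes a plain square grid. The specific pitches are illustrative round numbers, not the specification of any particular process.

```python
# Sketch: connection density on a square grid of pads.
# The pitches used in the example are illustrative round numbers.


def connections_per_mm2(pitch_um: float) -> float:
    """Pads per square millimetre on a square grid with the given pitch."""
    pads_per_mm = 1000.0 / pitch_um
    return pads_per_mm ** 2


if __name__ == "__main__":
    for label, pitch_um in (("solder bumps", 40.0), ("fine bonding", 10.0), ("toward", 1.0)):
        print(f"{label:12s} at {pitch_um:5.1f} um pitch: {connections_per_mm2(pitch_um):>12,.0f} per mm^2")
```

Going from a 40 micron pitch to a 1 micron pitch shrinks the pitch by a factor of 40 but raises the density by a factor of 1,600, from hundreds of connections per square millimetre to a million.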

Thermal: the price of stacking

Stacking is not free, and the bill comes due as heat. A flat chip has one big face pressed against a heat sink, and every transistor has a more or less straight path to that cool surface. Stack a second die on top and the bottom die is now buried — its heat has to climb up through the die above it, or down through a forest of TSVs and bonds, before it can escape. The hottest, busiest logic is exactly what you most want to stack tightly, and it is exactly what suffers most when its heat has nowhere easy to go.

You have met this tension before, in a different costume. On the lower rungs, the end of Dennard scaling meant power density stopped holding steady as transistors shrank and began to climb instead, which is why so much of a modern chip has to stay dark at any one moment: the dark-silicon problem. Stacking pours fuel on that fire. You are now packing power-hungry silicon into a *volume* instead of spreading it over an *area*, so watts per cubic millimetre climb even as watts per transistor fall, and every layer's heat still has to leave through the same footprint. The reason you typically see memory stacked on memory, but logic kept to a single hot layer, is precisely this: memory runs cool, logic runs hot, and a stack can only shed so much.
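
A one-dimensional thermal model makes the penalty visible. The sketch below assumes the heat sink sits on top of the stack and that all heat leaves upward, so each interface carries the heat of every die beneath it. The thermal resistances, powers, and function names are hypothetical values chosen for illustration, and a real package also sheds some heat downward and sideways.

```python
# Sketch: a one-dimensional thermal model of a die stack with the heat sink
# on top. Every number here is an illustrative assumption.


def heat_flux_w_per_mm2(power_per_die_w: float, footprint_mm2: float, layers: int) -> float:
    """Heat that must leave through the top face of the stack, per unit area."""
    return power_per_die_w * layers / footprint_mm2


def bottom_die_temp_c(
    ambient_c: float,
    power_per_die_w: float,
    layers: int,
    r_sink_c_per_w: float = 0.2,    # heat sink to ambient (assumed)
    r_layer_c_per_w: float = 0.1,   # one die-to-die interface (assumed)
) -> float:
    """Temperature of the buried bottom die, assuming all heat exits upward."""
    # The heat sink carries the power of every die in the stack.
    temp = ambient_c + power_per_die_w * layers * r_sink_c_per_w
    # Walking down from the top die, each interface carries the heat of
    # all the dies beneath it.
    for dies_below in range(layers - 1, 0, -1):
        temp += dies_below * power_per_die_w * r_layer_c_per_w
    return temp


if __name__ == "__main__":
    for layers in (1, 2, 4):
        flux = heat_flux_w_per_mm2(50.0, 100.0, layers)
        temp = bottom_die_temp_c(40.0, 50.0, layers)
        print(f"{layers} layer(s): {flux:.1f} W/mm^2, bottom die near {temp:.0f} C")
```

The temperature rise of the bottom die grows faster than the layer count, because each added die both raises the total heat through the sink and adds another resistance in the buried die's escape path. Swapping in a low-power value for the upper layers shows why cool memory stacks far more comfortably than hot logic.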

The packaging renaissance

Step back and notice what has happened. For most of the history of Moore's Law, the package was a humble box — its job was to protect the die and connect it to the board, nothing clever. Performance came from the die alone. That era is closing. With monolithic dies hitting the limits of interconnect delay, yield, and the memory wall all at once, the package has been promoted: it is now where a large part of a system's performance is actually *won*.

For the chiplet idea to flourish, the pieces have to be able to talk to each other regardless of who made them, which means the die-to-die link needs a shared standard, the way USB standardised plugging things into a computer. UCIe (Universal Chiplet Interconnect Express) is exactly that: an open standard for how one chiplet speaks to another across an interposer or a bond, so a company can mix its own logic with someone else's memory or I/O chiplet and trust that the seams will hold. Standardising the *interface between* dies is what turns one company's clever trick into a whole industry's building blocks.

So the frontier of scaling has, quietly, become three-dimensional and modular. Where the early rungs of this track pushed *down* into the transistor — FinFET, gate-all-around, the future CFET, all squeezing more out of a single plane of silicon — packaging pushes *up* and *out*, composing many optimised pieces into one system. Neither replaces the other. The chips ahead will be both: the best transistors we can print, cut into known-good chiplets, and bonded into stacks that the package, not the die, holds together. The capstone guide that closes this track ties these threads — device, design, and package — into one picture of where computing goes next.