High Bandwidth Memory, or HBM, is a stack of DRAM dies mounted right next to a processor on a silicon interposer and connected by thousands of fine wires. That wide connection gives HBM far more bandwidth than the DDR memory in a laptop, which is why nearly every AI accelerator — including NVIDIA's flagship GPUs — relies on it to feed model weights to the compute cores.

Why HBM is in short supply

HBM has become the gating resource of the entire AI buildout. Through 2026, demand has grown to several times its 2023 level, while the advanced-packaging capacity needed to assemble HBM stacks has not kept pace: major suppliers have reported their packaging lines booked to capacity, and industry analysts project sharply rising conventional DRAM prices as manufacturers prioritize high-margin HBM. A gigabyte of HBM also consumes substantially more wafer area than a gigabyte of standard DRAM, so every HBM stack diverts capacity that would otherwise make ordinary memory. The widely reported conclusion is that the squeeze will persist for years, not quarters.

Why HBM is fundamentally limited

Even setting supply aside, HBM has a ceiling that more stacks cannot lift. The memory still lives beside the processor, not inside it, so every weight a model needs must travel across the package to reach the math. An eight-stack HBM3e package delivers on the order of 8 TB/s. That is impressive, but at batch size one, generating each token means streaming the whole model once, so bandwidth becomes the hard limit on inference speed: tokens per second cannot exceed memory bandwidth divided by the size of the weights. Widening the pipe each generation helps at the margin but never removes the round trip, which is where most of the energy goes.
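The batch-size-one limit can be made concrete with a little arithmetic. The Python sketch below is a rough upper bound, not a benchmark: it assumes a dense model whose weights are each read exactly once per generated token, and it ignores KV-cache traffic, compute time, and any overlap between the two. The function name and the bytes-per-parameter values are illustrative assumptions; the 8 TB/s figure comes from the text, and the 80-billion-parameter size is the example model used elsewhere in this article.

```python
def max_tokens_per_second(bandwidth_bytes_per_s, n_params, bytes_per_param):
    """Upper bound on batch-size-one decode rate when every weight is
    streamed from memory once per generated token."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


HBM_BANDWIDTH = 8 * 10**12  # ~8 TB/s, eight-stack HBM3e package (from the text)
N_PARAMS = 80 * 10**9       # 80-billion-parameter model (example size)

# Bytes per parameter is an assumption: 2 for 16-bit weights, 1 for 8-bit.
for bytes_per_param in (2, 1):
    bound = max_tokens_per_second(HBM_BANDWIDTH, N_PARAMS, bytes_per_param)
    print(f"{bytes_per_param} byte(s)/param: at most {bound:.0f} tokens/s")
```

Under these assumptions the ceiling is 50 tokens per second with 16-bit weights and 100 with 8-bit weights, no matter how much compute sits beside the memory.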

Figure: Bandwidth versus capacity positioning map, plotting on-die capacity (horizontal axis) against weight bandwidth (vertical axis). SRAM accelerators sit top-left (high bandwidth, at most tens of GB, sharded across many chips). HBM GPUs sit lower-right (moderate capacity, 8 TB/s across a package). Sophon sits alone in the top-right ideal zone at 330 GB and 4.2 PB/s.
SRAM trades capacity for bandwidth and HBM trades bandwidth for capacity; only Sophon reaches the top-right where both run high.

What are the alternatives to HBM?

Two architectural directions try to escape the HBM trap. The first is large on-chip SRAM, used by wafer-scale and many inference-focused chips: SRAM is extremely fast, but holds at most tens of gigabytes — often only hundreds of megabytes per chip — so a large model has to be sharded across many devices. The second is to bring DRAM-class capacity onto the die itself through monolithic 3D integration. PhantaField's Sophon grows 330 GB of capacitor-less DRAM directly above the compute, reaching 4.2 PB/s of in-tile weight bandwidth — roughly 525× an eight-stack HBM3e package — while keeping an entire 80-billion-parameter model on a single die. It is the only approach that delivers SRAM-class proximity and DRAM-class capacity at the same time.
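The figures in this comparison can be checked with simple arithmetic. The sketch below is illustrative only: the 500 MB of SRAM per chip and the 2 bytes per parameter are assumptions, chosen to match "hundreds of megabytes" and 16-bit weights, and the calculation counts weights alone, ignoring activations and KV cache. The helper names are hypothetical.

```python
def bandwidth_ratio(fast_bytes_per_s, slow_bytes_per_s):
    """How many times faster one memory system is than another."""
    return fast_bytes_per_s / slow_bytes_per_s


def chips_needed(model_bytes, capacity_per_chip_bytes):
    """Minimum number of chips required to hold the weights (ceiling division)."""
    return -(-model_bytes // capacity_per_chip_bytes)


SOPHON_BANDWIDTH = 4200 * 10**12  # 4.2 PB/s (from the text)
HBM_BANDWIDTH = 8 * 10**12        # ~8 TB/s, eight-stack HBM3e (from the text)
SOPHON_CAPACITY = 330 * 10**9     # 330 GB (from the text)

MODEL_BYTES = 80 * 10**9 * 2      # 80B parameters at 2 bytes each (assumption)
SRAM_PER_CHIP = 500 * 10**6       # hypothetical 500 MB of on-chip SRAM

print(bandwidth_ratio(SOPHON_BANDWIDTH, HBM_BANDWIDTH))  # 525.0
print(chips_needed(MODEL_BYTES, SRAM_PER_CHIP))          # 320
print(MODEL_BYTES <= SOPHON_CAPACITY)                    # True
```

With these assumed values, the 160 GB of weights would need 320 SRAM-only chips, while the same weights fit within the 330 GB quoted for a single die.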

The cheapest gigabyte of HBM is the one you never have to buy.

PhantaField

Frequently asked questions

What is HBM (High Bandwidth Memory)?
HBM is a vertical stack of DRAM chips mounted next to a processor on a silicon interposer. Its wide, short connection delivers much higher bandwidth than standard DDR memory, which is why AI accelerators use it to feed model weights to compute cores.

Why is there an HBM shortage in 2026?
AI demand for HBM has grown several times over since 2023, while the advanced-packaging capacity needed to build HBM stacks has lagged. Suppliers report packaging lines at capacity, and because HBM uses far more wafer area per gigabyte than standard DRAM, it also tightens the supply of ordinary memory.

When will the HBM shortage end?
Industry reporting in 2026 suggests the imbalance lasts for years rather than quarters, with constrained supply and elevated memory prices expected to persist toward the end of the decade.

What are the alternatives to HBM for AI chips?
The main alternatives are large on-chip SRAM (very fast but small capacity, requiring many chips per model) and on-die DRAM via monolithic 3D integration, which puts DRAM-class capacity directly above the compute. The latter, used by PhantaField's Sophon, removes HBM from the design entirely.