Compute-in-memory (CIM) is a chip architecture that performs calculations inside, or immediately adjacent to, the memory that stores the data, rather than shuttling that data to a separate processing unit and back. The motivation follows directly from the memory wall: in modern AI workloads the expensive part is usually not the arithmetic but moving the operands. If the multiplication happens where the weight already sits, the weight never has to travel, and the most costly step, the data movement, largely disappears.
In a conventional processor, weights are stored in memory and operated on in a physically separate datapath. CIM collapses that separation. Each block of memory gains the ability to multiply and accumulate the values it holds, so a matrix multiply, the core operation of a neural network, is performed in place across thousands of memory tiles at once.
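The idea of a matrix multiply performed in place across many tiles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tile_mac` and `tiled_matvec` and the tiny 2×2 tile size are made up for the example and do not describe any particular chip.

```python
def tile_mac(weight_block, x_slice):
    """Multiply-accumulate done 'inside' one tile: one partial sum per block row."""
    return [sum(w * x for w, x in zip(row, x_slice)) for row in weight_block]


def tiled_matvec(weights, x, tile=2):
    """Compute y = W @ x by splitting W into tile-by-tile blocks.

    Each block stays where it is stored; only a slice of x goes in
    and a short list of partial sums comes out.
    """
    rows, cols = len(weights), len(weights[0])
    y = [0] * rows
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # The weights held by this (hypothetical) tile.
            block = [row[c0:c0 + tile] for row in weights[r0:r0 + tile]]
            partial = tile_mac(block, x[c0:c0 + tile])
            # Partial sums from tiles in the same row band are added together.
            for i, p in enumerate(partial):
                y[r0 + i] += p
    return y


W = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
x = [1, 0, -1, 2]

# The tiled result matches an ordinary row-by-row matrix-vector product.
reference = [sum(w * v for w, v in zip(row, x)) for row in W]
assert tiled_matvec(W, x) == reference
```

In hardware the two loops over tiles would run in parallel rather than in sequence; the sketch only shows that the weights themselves never need to be copied out of their blocks.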
Analog vs digital compute-in-memory
There are two families of CIM. Analog CIM encodes values as voltages or currents and lets physics do the multiplication on the memory's bit lines. It is extremely dense, but it pays a tax: every result must pass through an analog-to-digital converter, which consumes area and power, and the analog values drift with temperature and device variation, hurting accuracy. Digital CIM keeps everything in exact binary. It performs the multiply-accumulate with logic — sense amplifiers and adder trees — so the arithmetic is deterministic and drift-free, at the cost of slightly larger tiles.
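The trade-off between the two families can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a circuit simulation: `analog_dot` models device variation as a per-cell multiplicative Gaussian error and the converter as a uniform quantizer, and the parameters `sigma`, `adc_bits` and `full_scale` are hypothetical knobs chosen for the example.

```python
import random


def digital_dot(weights, acts):
    """Exact integer multiply-accumulate, as digital logic would compute it."""
    return sum(w * a for w, a in zip(weights, acts))


def analog_dot(weights, acts, sigma, adc_bits, full_scale, rng):
    """Toy analog dot product: noisy summed 'current', then a uniform ADC."""
    # Each cell's contribution is perturbed by a random relative error.
    current = sum(w * a * (1.0 + rng.gauss(0.0, sigma))
                  for w, a in zip(weights, acts))
    # The ADC clamps to its input range and rounds to one of 2**adc_bits codes.
    levels = 2 ** adc_bits - 1
    clamped = min(max(current, 0.0), full_scale)
    code = round(clamped / full_scale * levels)
    return code * full_scale / levels


rng = random.Random(0)
weights = [rng.randint(0, 1) for _ in range(256)]
acts = [rng.randint(0, 1) for _ in range(256)]

exact = digital_dot(weights, acts)
noisy = analog_dot(weights, acts, sigma=0.05, adc_bits=6, full_scale=255.0, rng=rng)
print("digital:", exact, "analog (toy model):", noisy)
```

Running `digital_dot` twice on the same inputs always gives the same integer, while the toy analog result changes with the noise draw and with the converter resolution.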
How digital compute-in-memory works in Sophon
Sophon uses pure-digital CIM. Each memory tile pairs a 256×256 DRAM subarray with a binary sense amplifier and an adder tree, driven by a bit-serial activation broadcast at 500 MHz. With 131,072 such tiles on a die, the architecture reaches 4,200 TFLOPS in FP8 and 2,100 TFLOPS in BF16, with no analog conversion anywhere in the path, so the same inputs always produce the same bit-exact result. Because the DRAM is grown in a monolithic 3D stack directly above the logic, the weights never leave the tile: storage and compute share the same structure.
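A bit-serial multiply-accumulate of this kind can be sketched as follows. This is a minimal model under assumptions, not Sophon's actual datapath: it uses unsigned 8-bit integers rather than FP8, treats the sense-amplifier readout as a plain 1-bit AND, and the way weight bit-planes are laid out is invented for the illustration. The names `adder_tree` and `bit_serial_dot` are hypothetical.

```python
import random


def adder_tree(values):
    """Pairwise-sum a list whose length is a power of two.

    Returns (total, number of tree levels).
    """
    levels = 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


def bit_serial_dot(weights, acts, w_bits=8, a_bits=8):
    """Unsigned dot product built only from 1-bit ANDs, adder trees and shifts."""
    total = 0
    for a_bit in range(a_bits):
        # One bit of every activation is broadcast per step.
        a_plane = [(a >> a_bit) & 1 for a in acts]
        for w_bit in range(w_bits):
            # One stored bit-plane of the weights, read out as binary values.
            w_plane = [(w >> w_bit) & 1 for w in weights]
            products = [wb & ab for wb, ab in zip(w_plane, a_plane)]
            count, _ = adder_tree(products)
            # Weight the count by the significance of both bits.
            total += count << (a_bit + w_bit)
    return total


rng = random.Random(42)
weights = [rng.randrange(256) for _ in range(256)]
acts = [rng.randrange(256) for _ in range(256)]

# The bit-serial result equals the ordinary integer dot product.
assert bit_serial_dot(weights, acts) == sum(w * a for w, a in zip(weights, acts))

# Summing 256 one-bit values pairwise takes 8 levels, since 256 = 2**8.
assert adder_tree([1] * 256) == (256, 8)
```

Every step is integer logic, so the result matches the reference dot product exactly and repeats identically on every run, which is the property the digital approach is chosen for.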
If the math happens where the data lives, the data never has to move.
PhantaField
Frequently asked questions
- What is compute-in-memory?
- Compute-in-memory (CIM) is a chip design that performs calculations inside or directly next to the memory storing the data, rather than moving data to a separate processor. It removes most of the data-movement cost that dominates AI workloads.
- What is the difference between analog and digital compute-in-memory?
- Analog CIM computes with voltages or currents on the memory's bit lines — dense, but it needs analog-to-digital conversion and suffers from drift and variation. Digital CIM computes in exact binary using sense amplifiers and adder trees, making it deterministic and drift-free.
- Why is compute-in-memory important for AI?
- AI inference is typically limited by moving model weights, not by arithmetic. By computing where the weights are stored, CIM removes most of that data movement, lowering the energy spent per token and relieving the memory-bandwidth bottleneck.
- Does Sophon use analog or digital compute-in-memory?
- Sophon uses pure-digital compute-in-memory: a binary sense amplifier and 8-level adder tree in every 256×256 tile, with no analog conversion, giving deterministic results at 4,200 TFLOPS FP8.