Last week, I quoted Ann Steffora Mutschler’s article about the information that Micron has revealed about its 3D Hybrid Memory Cube. Now that I’ve got the paper Micron presented at last week’s Hot Chips 23 conference, I’d like to explain how the company plans to explode memory bandwidth using this 3D design approach. You see, there’s been a DRAM-bandwidth bottleneck building for a long, long time. DRAMs have gotten far bigger, with huge parallel arrays of DRAM cells on chip, but DRAMs are essentially still limited to the bandwidth supported by their package. That’s why we’ve gone from asynchronous DRAMs with RAS and CAS signals to more structured DRAMs with synchronous interfaces that support ever-increasing memory bandwidths.
With synchronous DRAM interfaces, we’ve gone from DDR to DDR2, DDR3, and now DDR4. Early SDRAM modules were limited to about 1 Gbyte/sec of peak bandwidth. DDR2-667 modules jumped to more than 5 Gbytes/sec; DDR3-1333 modules bumped the bandwidth bar to 10.66 Gbytes/sec; and DDR4-2667 modules—at some time in the future—will achieve 21.34 Gbytes/sec.
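These peak numbers follow directly from the data rate encoded in each module name. Here’s a quick sketch of the arithmetic, assuming a standard 64-bit-wide module (the function name is mine, not an industry API):

```python
# Peak bandwidth of a standard 64-bit DIMM: transfers/sec times bytes per transfer.
# Module names encode the data rate in megatransfers per second (MT/s),
# e.g. DDR3-1333 transfers data 1333 million times per second.
def peak_bandwidth_gbytes(data_rate_mt_per_s, bus_width_bits=64):
    """Peak module bandwidth in Gbytes/sec (using 1 Gbyte = 10**9 bytes)."""
    return data_rate_mt_per_s * 1e6 * (bus_width_bits // 8) / 1e9

for name, rate in [("DDR2-667", 667), ("DDR3-1333", 1333), ("DDR4-2667", 2667)]:
    print(f"{name}: {peak_bandwidth_gbytes(rate):.2f} Gbytes/sec")
# DDR2-667: 5.34, DDR3-1333: 10.66, DDR4-2667: 21.34
```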
Micron expects its HMC module to meet or exceed 128 Gbytes/sec. That’s at least 6x what’s expected of DDR4. The only way to do that is through parallelism. The first step in exposing DRAM parallelism through the HMC is to configure each DRAM die as a 16-slice device, with an independent I/O port for each slice. Here’s an image of the first-generation Micron HMC memory die showing the large number of I/O ports on each die:
Stack four of these DRAM die, connect them with through-silicon vias (TSVs), and you get access to 64 DRAM slices in the stack. Each slice in the first-generation HMC device has two banks, so you actually get access to 128 banks. Here’s an illustration of the HMC module:
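The parallel-resource counting above can be sketched in a few lines, using the per-die and per-slice figures quoted in this article (the constant names are mine, not Micron’s):

```python
# Counting the parallel DRAM resources in a first-generation HMC stack,
# per the figures quoted in this article.
DIES_PER_STACK = 4       # four DRAM die stacked and linked by TSVs
SLICES_PER_DIE = 16      # each die is partitioned into 16 independent slices
BANKS_PER_SLICE = 2      # first-generation devices have two banks per slice

slices = DIES_PER_STACK * SLICES_PER_DIE
banks = slices * BANKS_PER_SLICE
print(f"{slices} slices, {banks} banks")  # 64 slices, 128 banks
```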
The Micron HMC design places a logic chip at the bottom of this stack of four memory die. The logic chip manages the slices and presents the rest of the system with multiple processor links that connect the system with the module’s 64 DRAM slices through a crossbar structure. Here’s a block diagram illustrating the HMC logic chip:
According to Micron, an HMC with four links can achieve a peak bandwidth of 160 Gbytes/sec and one with eight links can achieve a peak bandwidth of 320 Gbytes/sec. Keep in mind that this is one HMC module, with a footprint that’s essentially no bigger than the logic die at the bottom of the HMC 3D stack.
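Those two data points imply that aggregate bandwidth scales linearly with link count, at about 40 Gbytes/sec per link. A quick sketch under that assumption (derived from the quoted figures, not a Micron specification):

```python
# Aggregate HMC peak bandwidth, assuming it scales linearly with the
# number of processor links. Per-link figure is derived from the quoted
# 160 Gbytes/sec for a four-link module.
PER_LINK_GBYTES = 160 / 4   # 40 Gbytes/sec per link

for links in (4, 8):
    print(f"{links} links: {links * PER_LINK_GBYTES:.0f} Gbytes/sec peak")
# 4 links: 160, 8 links: 320
```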
The Micron HMC project illustrates why memory is a killer 3D app. The large number of I/Os made possible by TSV interconnect finally brings out much more of the potential bandwidth of all those DRAM banks that have sat on memory die for decades, putting it to work for system performance. This is especially important in a world that increasingly relies on multicore processor designs. Multicore processors have insatiable appetites for memory bandwidth and, as the Micron HMC demonstrates, 3D assembly is one way to deliver it. That’s why Cadence is supporting 3D packaging in its Silicon Realization offerings.