Want to know more about the Micron Hybrid Memory Cube (HMC)? How about its terabit/sec data rate?

Last week, I quoted Ann Steffora Mutschler’s article about the information that Micron has revealed about its 3D Hybrid Memory Cube. Now that I’ve got the paper Micron presented at last week’s Hot Chips 23 conference, I’d like to explain how the company plans to explode memory bandwidth using this 3D design approach. You see, a DRAM-bandwidth bottleneck has been building for a long, long time. DRAMs have grown far larger, with huge parallel arrays of DRAM cells on chip, but they are still essentially limited to the bandwidth supported by their packages. That’s why we’ve gone from asynchronous DRAMs with RAS and CAS signals to more structured DRAMs with synchronous interfaces that support ever-increasing memory bandwidths.

With synchronous DRAM interfaces, we’ve gone from DDR to DDR2, DDR3, and now DDR4. Early SDRAM modules were limited to about 1 Gbyte/sec of peak bandwidth. DDR2-667 modules jumped to more than 5 Gbytes/sec; DDR3-1333 modules bumped the bandwidth bar to 10.66 Gbytes/sec; and DDR4-2667 modules—at some time in the future—will achieve 21.34 Gbytes/sec.
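If you want to check those numbers, the arithmetic is just the module’s transfer rate multiplied by the standard 8-byte (64-bit) DIMM data bus. Here’s a quick back-of-the-envelope sketch in Python (my own arithmetic, not figures from Micron’s paper; peak figures only, ignoring real-world protocol overhead):

    # Peak DIMM bandwidth = transfers/sec x 8 bytes per transfer (64-bit data bus).
    modules = {
        "DDR2-667": 667e6,    # transfers per second
        "DDR3-1333": 1333e6,
        "DDR4-2667": 2667e6,
    }

    for name, transfers_per_sec in modules.items():
        peak_gbytes = transfers_per_sec * 8 / 1e9
        print(f"{name}: {peak_gbytes:.2f} Gbytes/sec peak")

    # DDR2-667: 5.34 Gbytes/sec peak
    # DDR3-1333: 10.66 Gbytes/sec peak
    # DDR4-2667: 21.34 Gbytes/sec peak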

Micron expects its HMC module to meet and exceed 128 Gbytes/sec. That’s at least 6x what’s expected of DDR4. The only way to do that is through parallelism. The first step in exposing DRAM parallelism through the HMC is to configure each DRAM die as a 16-slice device with sixteen independent I/O ports. Here’s an image of the first-generation Micron HMC memory die showing the large number of I/O ports on each die:

Stack four of these DRAM die, connect them with through-silicon vias (TSVs), and you get access to 64 DRAM slices in the stack. Each slice in the first-generation HMC device has two banks, so you actually get access to 128 banks. Here’s an illustration of the HMC module:
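For readers who like to see the math spelled out, here’s the slice and bank arithmetic from the two paragraphs above as a tiny Python sketch (the per-die and per-slice counts are the first-generation figures quoted above):

    # First-generation HMC organization, per the figures quoted in this post.
    dram_die_per_stack = 4
    slices_per_die = 16      # sixteen independent I/O ports per DRAM die
    banks_per_slice = 2      # two banks per slice in the first-generation device

    slices_per_stack = dram_die_per_stack * slices_per_die   # 64 independently addressable slices
    banks_per_stack = slices_per_stack * banks_per_slice     # 128 banks
    print(slices_per_stack, banks_per_stack)                  # 64 128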

The Micron HMC design places a logic chip at the bottom of this stack of four memory die. The logic chip manages the slices and presents the rest of the system with multiple processor links that connect the system with the module’s 64 DRAM slices through a crossbar structure. Here’s a block diagram illustrating the HMC logic chip:

According to Micron, an HMC with four links can achieve a peak bandwidth of 160 Gbytes/sec, and one with eight links can achieve a peak bandwidth of 320 Gbytes/sec. Keep in mind that this is one HMC module, with a footprint that’s essentially no bigger than the logic die at the bottom of the HMC 3D stack.
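Working backwards from those peak numbers gives you the per-link rate and the aggregate terabit/sec figure in this post’s title (again, my own arithmetic rather than numbers taken from the paper):

    # Derive per-link and aggregate bit rates from Micron's quoted peak bandwidths.
    peak_4_links = 160e9    # bytes/sec with four processor links
    peak_8_links = 320e9    # bytes/sec with eight processor links

    print(peak_4_links / 4 / 1e9, "Gbytes/sec per link")       # 40.0
    print(peak_4_links * 8 / 1e12, "Tbits/sec with 4 links")   # 1.28
    print(peak_8_links * 8 / 1e12, "Tbits/sec with 8 links")   # 2.56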

The Micron HMC project illustrates why memory is a killer 3D app. With the large number of I/Os made possible by TSV interconnect, much more of the potential bandwidth available from all of the DRAM banks that have been sitting on memory die for decades can finally be brought out and used to boost system performance. This is especially important in a world that increasingly relies on multicore processor designs. Multicore processor chips have insatiable appetites for memory bandwidth and, as the Micron HMC demonstrates, 3D assembly is one way to achieve the required memory bandwidth. That’s why Cadence is supporting 3D packaging in its Silicon Realization offerings.


2 Responses to Want to know more about the Micron Hybrid Memory Cube (HMC)? How about its terabit/sec data rate?

  1. Glenn G says:

    Steve — thanks for this article!
    Is it possible that this approach will eventually reduce the need for huge processor L2 and L3 caches?
    By the way, did Micron’s paper say anything about the “processor to HMC link” design? Are these similar to PCI Express Gen3 (8 Gbits/second), or do they run faster?

    • sleibson2 says:

      Glenn,

      My impression of the HMC’s link design was that it was proprietary. I suspect it will be leaner than PCIe because it need only be a memory interface. Because of DRAM’s long latency, I do not think that the HMC technology will supplant SRAM cache. Even with an 8x throughput improvement, latency is still a killer for caching.
