For the last 3D Thursday blog post of 2011 in the EDA360 Insider, I thought I’d take a flight of fancy and combine as many of this year’s 3D IC concepts as possible to see what we might get. I started by thinking about the year’s major announcements, and here’s my short list:
- Xilinx Virtex-7 2000T FPGA using interposers with 2.5D IC assembly techniques
- Micron/IBM Hybrid Memory Cube (HMC) using 3D IC assembly techniques
- Finalization of the JEDEC Wide I/O memory specification
- The CEA-Leti/ST-Ericsson WIOMING project
I then started to think about designing my own version of the Micron/IBM HMC using standard Wide I/O memory chips instead of the specialized memory chips Micron has developed for the HMC. Those Micron memories incorporate 16 memory arrays on each die, each array with its own I/O channel. As a result, the HMC can deliver a peak throughput of approximately 160 Gbytes/sec. It’s designed for high-performance computing applications, which is why IBM is interested in the technology.
In contrast, Wide I/O SDRAM is designed for low-power mobile applications and delivers somewhat less performance. One Wide I/O SDRAM die delivers about 17 Gbytes/sec of peak bandwidth through four memory arrays and four 128-bit interface ports. That’s not shabby compared to DDR memory DIMMs, but it’s not HMC-class performance either.
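The 17 Gbytes/sec figure follows from the interface width and clock rate. The sketch below assumes a 266 MHz single-data-rate interface clock, which I believe is the faster speed grade in the JEDEC Wide I/O specification; treat that clock value as an assumption and check it against the spec before relying on it. The function name is hypothetical.

```python
# Peak bandwidth of one Wide I/O SDRAM die (a sketch, not a quote from the spec).
# Assumption: 4 channels, 128 data bits each, single data rate at 266 MHz.

CHANNELS = 4            # independent memory arrays / interface ports per die
BITS_PER_CHANNEL = 128  # data width of each port
CLOCK_HZ = 266e6        # assumed SDR clock: one transfer per clock cycle


def wide_io_peak_gbytes_per_sec(channels=CHANNELS,
                                bits=BITS_PER_CHANNEL,
                                clock_hz=CLOCK_HZ):
    """Return peak bandwidth of one die in Gbytes/sec (10**9 bytes/sec)."""
    bits_per_second = channels * bits * clock_hz
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    print(f"{wide_io_peak_gbytes_per_sec():.2f} Gbytes/sec")  # 17.02 Gbytes/sec
```

At an assumed 200 MHz clock, the same function gives 12.8 Gbytes/sec, so the clock grade matters as much as the port width.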
Because I expect Wide I/O memory parts to enter high-volume production over the next couple of years, driven by demand from mobile designs, I decided that a thought experiment using these parts was a fitting way to round out the year. So let me take you on a quick flight of fancy through a top-level conceptual design to see where this technology takes us.
First, I plan to use stock Wide I/O memories, each of which delivers 17 Gbytes/sec of peak bandwidth. The same figure applies to a Wide I/O memory stack: I can stack as many as four Wide I/O die using 3D assembly techniques to get more memory capacity, but stacking adds no memory bandwidth. Four such memories or memory stacks will deliver about 68 Gbytes/sec of memory bandwidth. That’s somewhat less than the HMC, but it’s close enough for a thought experiment and plenty of bandwidth to make the experiment interesting.
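A small model makes the capacity-versus-bandwidth distinction in the paragraph above concrete. This is a sketch under stated assumptions: the class and function names are hypothetical, the 1-Gbit per-die capacity in the example is an illustrative placeholder rather than a product figure, and per-stack bandwidth is fixed at the post’s 17 Gbytes/sec because, as stated above, stacking adds no bandwidth.

```python
from dataclasses import dataclass

# Per the post: one die or a stack of dies delivers the same ~17 Gbytes/sec.
STACK_BANDWIDTH_GBYTES_PER_SEC = 17.0


@dataclass
class WideIOStack:
    dies: int                  # 1 to 4 Wide I/O dies in the stack
    die_capacity_gbits: float  # hypothetical per-die capacity

    def __post_init__(self):
        if not 1 <= self.dies <= 4:
            raise ValueError("a Wide I/O stack holds 1 to 4 dies")

    def capacity_gbits(self):
        # Stacking multiplies capacity...
        return self.dies * self.die_capacity_gbits

    def bandwidth_gbytes_per_sec(self):
        # ...but, per the post, adds no bandwidth.
        return STACK_BANDWIDTH_GBYTES_PER_SEC


def subsystem_totals(stacks):
    """Return (total capacity in Gbits, total peak bandwidth in Gbytes/sec)."""
    capacity = sum(s.capacity_gbits() for s in stacks)
    bandwidth = sum(s.bandwidth_gbytes_per_sec() for s in stacks)
    return capacity, bandwidth


if __name__ == "__main__":
    single = [WideIOStack(dies=1, die_capacity_gbits=1.0) for _ in range(4)]
    tall = [WideIOStack(dies=4, die_capacity_gbits=1.0) for _ in range(4)]
    print(subsystem_totals(single))  # (4.0, 68.0)
    print(subsystem_totals(tall))    # (16.0, 68.0)
```

Going from single dies to four-high stacks quadruples capacity while the aggregate stays at 68 Gbytes/sec.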
To control each Wide I/O stack, I need four Wide I/O memory controllers, one for each of the four memory ports on the Wide I/O die. Conveniently, CEA-Leti and ST-Ericsson just discussed such a controller design at the recent RTI conference on 3D design. Here’s a layout of one such controller developed by the partners, who employed the Cadence Wide I/O memory controller IP block.
If we put four of these controllers together and arrange them properly, we get a Wide I/O stack controller that meets the JEDEC-specified layout. Then we arrange four such stack controllers on a chip to define a logic die that can control four independent Wide I/O memory stacks using 16 Wide I/O memory controllers, for a resulting peak bandwidth of 68 Gbytes/sec.
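The controller count and aggregate bandwidth can be tallied the same way. In this hypothetical sketch, each memory controller is represented simply as a (stack, port) pair, and each 128-bit port is assumed to carry an equal quarter of a stack’s 17 Gbytes/sec; the names are illustrative, not part of any real design flow.

```python
# Tally of the hypothetical logic die: one controller per 128-bit Wide I/O port.
PORTS_PER_STACK = 4          # four 128-bit ports per Wide I/O die/stack
STACK_GBYTES_PER_SEC = 17.0  # peak bandwidth per stack, from the post
PORT_GBYTES_PER_SEC = STACK_GBYTES_PER_SEC / PORTS_PER_STACK  # assumed equal split


def build_logic_die(num_stacks=4):
    """Return one (stack, port) entry per Wide I/O memory controller."""
    return [(stack, port)
            for stack in range(num_stacks)
            for port in range(PORTS_PER_STACK)]


if __name__ == "__main__":
    controllers = build_logic_die()
    print(len(controllers))                        # 16
    print(len(controllers) * PORT_GBYTES_PER_SEC)  # 68.0
```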
To get that much bandwidth off chip, I’ve elected to use four PCIe Gen 3 x32 I/O ports, which give me a peak theoretical bandwidth of 128 Gbytes/sec (32 Gbytes/sec per port). I could use four PCIe Gen 3 x16 ports for an aggregate bandwidth of 64 Gbytes/sec, but that seems a bit undersized to me because it falls short of the 68 Gbytes/sec the memory stacks can deliver. I don’t want the memory subsystem’s I/O bandwidth to be the limiting factor here, so I’ve used x32 ports.
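Here is a rough check of the off-chip side, assuming PCIe Gen 3’s 8 GT/s per lane and 128b/130b line encoding, counting one direction only, and ignoring packet and protocol overhead (so real throughput will be lower still). The helper names are hypothetical. The raw figure reproduces the 32 Gbytes/sec per x32 port used above; the encoded figure shows why four x16 ports would make the I/O, rather than the memory, the bottleneck.

```python
# Rough PCIe Gen 3 bandwidth check (a sketch: per direction, no protocol overhead).
GEN3_TRANSFERS_PER_SEC = 8e9  # 8 GT/s per lane
GEN3_ENCODING = 128 / 130     # 128b/130b line-encoding efficiency
MEMORY_GBYTES_PER_SEC = 68.0  # four Wide I/O stacks at ~17 Gbytes/sec each


def pcie_gen3_gbytes_per_sec(lanes, include_encoding=True):
    """Peak one-direction bandwidth of a PCIe Gen 3 link, in Gbytes/sec."""
    bits_per_sec = lanes * GEN3_TRANSFERS_PER_SEC
    if include_encoding:
        bits_per_sec *= GEN3_ENCODING
    return bits_per_sec / 8 / 1e9


def bottleneck(ports, lanes):
    """Return ("memory" or "I/O", limiting bandwidth) for a given port setup."""
    io = ports * pcie_gen3_gbytes_per_sec(lanes)
    if io < MEMORY_GBYTES_PER_SEC:
        return "I/O", io
    return "memory", MEMORY_GBYTES_PER_SEC


if __name__ == "__main__":
    print(pcie_gen3_gbytes_per_sec(32, include_encoding=False))  # 32.0
    for lanes in (16, 32):
        limit, gbps = bottleneck(4, lanes)
        print(f"4 x{lanes}: {limit}-limited at {gbps:.1f} Gbytes/sec")
```

With encoding overhead, four x16 ports top out near 63 Gbytes/sec, just under the memory’s 68, while four x32 ports leave the memory stacks as the limit.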
Now, I know we’re going to need significant data-moving and data-handling capability on this chip, so I’ve added a 4-processor array of ARM Cortex-A7 cores to the design with a shared on-chip L2 SRAM cache. I think the SRAM cache will need to be fairly large to act as a buffer between the Wide I/O SDRAM stacks and the PCIe ports.
The resulting chip layout (please remember, it’s a thought experiment) looks like this:
I have no idea whether the aspect ratios of the on-chip blocks are correct, but this layout will do for a quick exercise. It might also be smart to add several DMA controllers, but that’s more than I plan to do in this initial thought experiment.
Note that I’ve placed the Wide I/O memory controllers and the PCIe ports on the chip periphery because I’m not planning to place the Wide I/O memories on top of this controller chip using 3D assembly techniques. Instead, I plan to use the silicon interposer technology and 2.5D assembly techniques pioneered by Xilinx with the Virtex-7 2000T FPGA. To minimize the assembly’s footprint, I also plan to use both sides of the silicon interposer. The logic chip will not need TSVs, but the silicon interposer will, and the Wide I/O memory die already have them.
Here’s what the assembly might look like:
It looks like there are three Wide I/O SDRAM stacks, but there are four. The second stack beneath the interposer is located behind the one that’s visible.
The full assembly, built from standard memory die and a custom controller, delivers a high-performance, high-capacity SDRAM subsystem using many of the technologies introduced just this year. A year ago, this would have looked less like a thought experiment and more like a far-off dream, but after 12 months of 3D developments, it now seems quite possible.
How would you improve this design?
Happy New Year!
–Steve Leibson