Earlier this month at EDPS, Marc Greenberg, Director of Product Marketing for the Cadence Design IP Group, said that Wide I/O SDRAM memory was going to drive the earliest adoption of 3D IC assembly techniques. Not simply because Wide I/O is cool, but because Wide I/O SDRAM solves real and growing problems with the movement of instructions and data between CPU and memory.
Here’s what we are doing for CPU-Memory interconnection today:
On the left you see a parallel connection between memory and CPU. We’ve been using this type of connection since DRAM was first introduced in 1971, so we really understand it. In fact, we understand it so well that we’ve pushed the connection speed to stratospheric heights: DDR4 SDRAM will push CPU-to-SDRAM transfer rates on the pc board to 2133 Mtransfers/sec and possibly beyond. Designers of Z80 microprocessor systems working with 4-, 6-, and 8-MHz CPUs in the 1970s would be gobsmacked by those 21st-century numbers. Back then, who knew circuit boards could ever reliably carry GHz waveforms? But we’ve now had four decades to learn how to achieve these data rates, and transfer rates exceeding 1000 Mtransfers/sec are now routine.
However, even at these data rates, the cost is high. We need a lot of pins to shuttle the information back and forth between the CPU and memory using a parallel connection. For example, the LPDDR2 interface requires approximately 60 signal pins for a 32-bit interface. Three channels of 64-bit DDR3 memory require approximately 300 signal pins.
Because of the number of pins and routes needed on pc boards for parallel memory connections, some vendors are at least considering serial connections. One notable memory device going in this direction is the Hybrid Memory Cube. (See “Want to know more about the Micron Hybrid Memory Cube (HMC)? How about its terabit/sec data rate?”) Serial connections require fewer I/O pins and therefore fewer pcb traces. These high-speed connections are based on Serializer/Deserializer (SerDes) communications channels that now run in excess of 25 Gbits/sec. (For example, see “3D Thursday: How Xilinx developed a 2.5D strategy for making the world’s largest FPGA and what the company might do next with the technology”)
High-speed serial channels solve two big problems: they reduce the required number of pcb traces and package pins, and they can span longer distances over a pcb than parallel memory connections can. The big problems with high-speed serial channels are the amount of power required to run SerDes transceivers at rates exceeding 20 Gbits/sec and the amount of extra latency added by serializing and deserializing dense memory traffic. (For more discussion of these issues, see “Will your multicore SoC hit the memory wall? Will the memory wall hit your SoC? Does it matter?”)
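The pin-count side of that tradeoff is easy to see with the figures quoted above. A quick sketch, assuming differential signaling with separate transmit and receive pairs (typical for SerDes links, though the exact pin count per lane varies by device, so treat these numbers as illustrative):

```python
# Rough pin-count comparison: parallel DDR3 vs. high-speed serial links.
# Figures from the article: ~300 signal pins for three channels of 64-bit
# DDR3, and SerDes lanes running at 25 Gbits/sec. The 4-pins-per-lane
# assumption (one differential pair each way) is typical but illustrative.

serdes_lane_gbps = 25          # per-lane rate quoted in the article
target_gbps = 100              # aggregate bandwidth we want to carry

lanes = -(-target_gbps // serdes_lane_gbps)   # ceiling division
pins_per_lane = 2 * 2                         # diff pair, TX and RX
serial_pins = lanes * pins_per_lane

parallel_pins = 300            # ~3 channels of 64-bit DDR3 (from the article)

print(f"Serial: {lanes} lanes, ~{serial_pins} signal pins")
print(f"Parallel DDR3: ~{parallel_pins} signal pins")
```

Roughly 16 signal pins instead of 300 for the same aggregate bandwidth, which is exactly why serial connections are attractive on a pcb despite their power and latency costs.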
So what’s the alternative? Take a page from the Southern California Low Riders: go wide and slow. Push the number of parallel I/O pins up and push the transfer clock rate down. That’s the philosophy of the Wide I/O approach. The current Wide I/O JEDEC standard defines a 512-bit data interface between CPU and memory. It requires some 1200 pins for signals, power, and ground, but it moves 100 Gbits/sec across the interface using a 200 MHz clock, delivering twice the bandwidth of LPDDR2 SDRAM at half the I/O power. (See “A possible roadmap for Wide I/O that leads to 2Tbps of SDRAM memory bandwidth—per device”)
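The bandwidth arithmetic behind those numbers is simple. A quick sketch, assuming one transfer per clock edge (single data rate, which is how the quoted 100 Gbits/sec figure works out):

```python
# "Wide and slow": the Wide I/O bandwidth arithmetic quoted in the article.
bus_width_bits = 512     # JEDEC Wide I/O data interface width
clock_mhz = 200          # one transfer per 200 MHz clock (single data rate)

# 512 bits x 200 MHz = 102.4 Gbits/sec, the ~100 Gbits/sec in the article
bandwidth_gbps = bus_width_bits * clock_mhz / 1000

# Contrast the per-pin signaling rate with DDR4's 2133 Mtransfers/sec:
wide_io_per_pin_mtps = clock_mhz   # each data pin toggles at only 200 MT/s
ddr4_per_pin_mtps = 2133

print(f"Wide I/O aggregate bandwidth: {bandwidth_gbps:.1f} Gbits/sec")
print(f"Per-pin rate: {wide_io_per_pin_mtps} vs {ddr4_per_pin_mtps} MT/s")
```

Each individual pin runs more than ten times slower than a DDR4 pin, which is what relaxes the signal-integrity problem; the bandwidth comes entirely from the width.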
Wide and slow.
However, Wide I/O SDRAM cannot work on a pcb. Too many pins. Too many pcb traces. To make Wide I/O work, you need to get rid of pcb traces, package pins, and wire bonds. You need 2.5D or 3D IC assembly. Then, the Wide I/O route makes sense. How much sense? Many good things happen to your signal physics when you adopt the Wide I/O SDRAM specification as you can see from this slide taken from Greenberg’s EDPS presentation:
While you increase the number of CPU-memory connections by 10x, the capacitance per connection drops by 6x, connection length improves by 200x, and power improves by 6x. These are compelling numbers for system designers, compelling enough to justify investigating the real costs of 3D IC assembly techniques.