Yesterday, at the RTI 3D Conference, Pascal Vivet from CEA-Leti and Vincent Guérin from ST-Ericsson unveiled a 3D IC project that represents a real tour de force of cutting-edge system technology. The quest starts with a key question: “What’s the right architecture for dealing with large numbers of processor cores and other processing resources?” The next question is: “What’s the right way of connecting main memory to large numbers of processors in a multicore processing engine?” The 3D IC project outlined yesterday by Vivet and Guérin provides one set of answers to those two questions and to many more.
The project included the development of a logic SoC, dubbed WIOMING (Wide IO Memory Interface Next Generation), that was designed from the ground up for use in a 3D IC stack. The project’s objectives include building three kinds of 3D IC stacks: one WIOMING SoC stacked with an SDRAM, two stacked WIOMING SoCs, and two stacked WIOMING SoCs with attached DRAM. This set of objectives led to a very specific design approach for the WIOMING SoC.
From a certain perspective, this 3D project started with the new JEDEC Wide I/O SDRAM memory spec. Coincidentally, Ken Shoemaker, vice-chair of the JEDEC JC-42.6 Low-Power Memories subcommittee, confirmed during a GSA 3D IC Working Group meeting held on Monday of this week that the JEDEC Wide I/O specification had just been finalized, approved by the directors, and would soon be published. The objective of the Wide I/O interface spec was to define an SDRAM interface that delivers twice the bandwidth of the LPDDR2 specification at the same operating power and that enables thinner, smaller system form factors through the use of TSVs (through-silicon vias).
The finalized Wide I/O spec operates at 266 Mtransfers/sec using an SDR (single data rate) protocol, delivering 17 Gbytes/sec of peak data throughput over four parallel, independent, 128-bit memory channels (4.26 Gbytes/sec per channel). That makes Wide I/O memory devices excellent system components for multicore systems, which have an insatiable appetite for data and bandwidth. (Note: The current WIOMING SoC design operates the Wide I/O SDRAM at 200 MHz rather than 266 MHz. Also, see Richard Goering’s take on Wide I/O in his Industry Insights blog.)
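The throughput figures above follow directly from the channel width and the transfer rate. A quick check in Python, using only the numbers quoted in this post and treating a Gbyte as 10^9 bytes, reproduces the 266 Mtransfers/sec figures and shows what the 200 MHz operating point used on WIOMING implies:

```python
# Peak Wide I/O SDRAM bandwidth from the figures quoted in the post:
# four independent 128-bit channels, one transfer per clock (SDR).
CHANNELS = 4
CHANNEL_WIDTH_BITS = 128


def channel_bandwidth_gbytes(mtransfers_per_sec: float) -> float:
    """Peak bandwidth of one channel in Gbytes/sec (1 Gbyte = 1e9 bytes)."""
    bytes_per_transfer = CHANNEL_WIDTH_BITS // 8  # 16 bytes per transfer
    return mtransfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def total_bandwidth_gbytes(mtransfers_per_sec: float) -> float:
    """Aggregate peak bandwidth across all four channels."""
    return CHANNELS * channel_bandwidth_gbytes(mtransfers_per_sec)


# 266 = spec maximum; 200 = rate the current WIOMING design uses.
for rate in (266, 200):
    print(f"{rate} MT/s: {channel_bandwidth_gbytes(rate):.2f} GB/s per channel, "
          f"{total_bandwidth_gbytes(rate):.2f} GB/s total")
# 266 MT/s: 4.26 GB/s per channel, 17.02 GB/s total
# 200 MT/s: 3.20 GB/s per channel, 12.80 GB/s total
```

These are peak interface rates; sustained throughput will be lower once refresh, command, and bank-conflict overhead are counted.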
Because of its suitability for multicore system designs, the Wide I/O SDRAM interface became a key part of the WIOMING design project, and the Wide I/O SDRAM controller blocks were the first parts of the SoC to be developed. One Wide I/O characteristic that shaped WIOMING was the TSV pitch set by the Wide I/O spec, which calls for an array of interface pads on a 40×50 µm spacing. That spacing defined the TSV pitch for the memory-interface portion of the WIOMING SoC and, consequently, for the entire chip. The WIOMING SoC design uses 10 µm TSVs with a 10 µm keep-out area around each TSV.
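To get a feel for what the pad pitch and TSV dimensions mean for silicon area, here is a rough estimate of how much of each 40×50 µm pad cell a single TSV blocks. The post does not say how the 10 µm keep-out is measured, so this sketch assumes it extends 10 µm beyond the TSV edge on every side (a 30 µm exclusion zone) and compares a square zone with a circular one. Both shapes are assumptions, not the WIOMING layout rules.

```python
import math

# Figures quoted in the post.
PITCH_X_UM, PITCH_Y_UM = 40.0, 50.0  # Wide I/O pad-array spacing
TSV_DIAMETER_UM = 10.0
KEEP_OUT_UM = 10.0  # ASSUMPTION: applied outward from the TSV edge on all sides


def exclusion_span_um() -> float:
    """Width of the region around one TSV where no transistors may be placed."""
    return TSV_DIAMETER_UM + 2 * KEEP_OUT_UM


def blocked_fraction(square: bool = True) -> float:
    """Share of one pad cell lost to a single TSV plus its keep-out zone."""
    span = exclusion_span_um()
    cell_area = PITCH_X_UM * PITCH_Y_UM  # 2000 µm² per pad site
    blocked = span ** 2 if square else math.pi * (span / 2) ** 2
    return blocked / cell_area


# Neighboring exclusion zones do not overlap, since 30 µm < 40 µm.
assert exclusion_span_um() <= min(PITCH_X_UM, PITCH_Y_UM)

print(f"square keep-out:   {blocked_fraction(True):.1%}")   # 45.0%
print(f"circular keep-out: {blocked_fraction(False):.1%}")  # 35.3%
```

Under either assumption, roughly a third to a half of the area inside the TSV array is unavailable for logic, which is why the TSV count matters so much in the tradeoffs discussed below.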
The Wide I/O SDRAM channel controllers for the WIOMING SoC are based on Cadence Wide I/O memory-controller IP cores. The design team—consisting of engineers from CEA-Leti, ST-Ericsson, and Cadence—used the development of these Wide I/O SDRAM control blocks as “pipecleaners” for the project’s entire 3D IC EDA tool chain, which is based on Cadence EDA tools including the Encounter Digital Implementation (EDI) System, Virtuoso Analog Design Environment, and QRC parasitic extraction tool. The 3D design flow for the project looks like this:
Development of the WIOMING SoC’s memory-control blocks involved coupling the Cadence Wide I/O SDRAM controller with the appropriate PHY drivers/receivers and some ESD protection, routing the IP block’s pins to their assigned TSVs within the controller block, and then hardening the entire block. Hardening the Wide I/O memory-controller block created a suitably sized “project within a project” that let the team learn the requisite new 3D IC design skills more quickly. Here’s a plan view layout of the resulting memory-controller block with TSV array:
Four such blocks are used in the WIOMING design to handle the Wide I/O SDRAM’s four independent 128-bit memory channels. Each Wide I/O SDRAM controller block is separately addressable within the WIOMING SoC’s on-chip network.
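The post says only that each controller is separately addressable on the NoC; it does not describe the address map. As a purely hypothetical illustration, the sketch below gives each of the four channel controllers its own contiguous address window and picks the target NoC endpoint from the upper address bits. The endpoint names and the 256 MiB window size are invented for the example.

```python
# HYPOTHETICAL address map: the post says only that each controller is
# separately addressable on the NoC, not how addresses are assigned.
CHANNEL_BYTES = 1 << 28  # hypothetical 256 MiB window per channel
CONTROLLER_NODES = ("WIO_MC0", "WIO_MC1", "WIO_MC2", "WIO_MC3")  # hypothetical names


def target_controller(addr: int):
    """Return (NoC endpoint, offset within that channel) for a physical address."""
    if not 0 <= addr < len(CONTROLLER_NODES) * CHANNEL_BYTES:
        raise ValueError(f"address {addr:#x} is outside the Wide I/O window")
    channel, offset = divmod(addr, CHANNEL_BYTES)  # upper bits select the channel
    return CONTROLLER_NODES[channel], offset


print(target_controller(0x1000_0010))  # ('WIO_MC1', 16)
```

A contiguous map like this lets software steer different data sets to different channels. The common alternative, interleaving channels on low-order address bits, spreads sequential traffic across all four channels instead; which scheme, if either, WIOMING uses is not stated.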
The system architecture of the WIOMING SoC design is very ambitious and includes the following on-chip blocks:
- An ARM11 host CPU core
- A multicore CPU backbone for implementing the 3GPP LTE protocol
- Five MEPHISTO VLIW DSP engines
- Two ASIP (application-specific instruction processor) engines
- Four Wide I/O memory controllers
- DFT test logic
- Four sets of heaters and thermal sensors for 3D power/thermal experiments
Target frequencies for the processors range from 350 to 400 MHz.
The WIOMING SoC links all of these blocks on chip through a packet-based, asynchronous NoC (network on chip) arranged as a 2D mesh. The selected NoC uses a Quasi Delay Insensitive (QDI) protocol developed about 15 years ago at Caltech. Choosing an asynchronous NoC (ANoC) for the WIOMING communications infrastructure is significant because asynchronous signaling tolerates the on-chip variability increasingly seen in advanced process nodes, a problem that only gets worse when the NoC stretches across multiple die. A system built this way, with locally clocked blocks communicating over an asynchronous interconnect, is called “GALS” for “globally asynchronous, locally synchronous.” (For more information on NoCs, see my EDN article “NOC, NOC, NOCing on heaven’s door: Beyond MPSOCs”.)
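Why does a QDI link tolerate variability? In a typical QDI design, data travels on delay-insensitive codes such as dual-rail encoding, and each transfer completes through a four-phase request/acknowledge handshake, so the receiver knows data is valid by inspecting the wires rather than by waiting for a clock edge. The sketch below is a generic textbook model of that idea, not the ANoC’s actual encoding or circuitry, which the presentation as reported here did not detail.

```python
# Generic dual-rail, four-phase model of QDI signaling (not the ANoC's design).
# Each bit travels on two wires (true_rail, false_rail):
#   (0, 0) = spacer (no data), (1, 0) = logic 1, (0, 1) = logic 0.
#   (1, 1) is never driven.
SPACER = (0, 0)


def encode(bits):
    """Map each logical bit onto its dual-rail codeword."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word):
    """Completion detection: every bit has exactly one rail high."""
    return all(t + f == 1 for t, f in word)


def is_spacer(word):
    """True once every bit has returned to (0, 0)."""
    return all(pair == SPACER for pair in word)


def decode(word):
    """Recover logical bits from a complete codeword (read the true rail)."""
    return [t for t, _ in word]


def four_phase_transfer(bits):
    """Send one word; return the received bits and a trace of (phase, ack)."""
    trace = []
    # 1. Sender drives a valid codeword onto the wires.
    wires = encode(bits)
    # 2. Receiver waits for completion (no clock, no timing margin), latches, raises ack.
    if not is_complete(wires):
        raise RuntimeError("receiver would still be waiting")
    received = decode(wires)
    trace.append(("data valid -> ack high", 1))
    # 3. Sender sees ack and returns every wire pair to the spacer.
    wires = [SPACER] * len(bits)
    # 4. Receiver detects the spacer and lowers ack; the link is ready again.
    if not is_spacer(wires):
        raise RuntimeError("receiver would still be waiting")
    trace.append(("spacer -> ack low", 0))
    return received, trace


received, trace = four_phase_transfer([1, 0, 1, 1])
print(received)  # [1, 0, 1, 1]
for phase, ack in trace:
    print(f"{phase}: ack={ack}")
```

Because completion is detected from the codeword itself, a slow wire merely delays the acknowledge: a half-arrived word such as `[(1, 0), (0, 0)]` fails `is_complete`, so the receiver keeps waiting instead of latching bad data.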
However, the original ANoC was designed for a 2D mesh, and a 3D IC stack also requires communication in the third dimension. There are two ways to approach this challenge. The first is to redesign each NoC router as a 3D router (North, South, East, West, Up, and Down). The WIOMING designers did not take that approach. Instead, they developed special TSV link IP blocks that attach to existing 2D routers at strategic locations in the NoC mesh. The resulting WIOMING system architecture therefore looks like this:
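The TSV-link approach changes how a packet reaches another die: it must first travel in-plane to a router that has a TSV link, cross vertically, and then continue in-plane on the target die. The sketch below models that path on a 4×4 mesh (the post mentions 16 routers), using dimension-ordered XY routing within each plane. The TSV-link positions and the rule for choosing among them are hypothetical, and the sketch assumes every die in the stack has its links at the same positions; the post does not say where the links sit or how the ANoC selects one.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # (x, y, layer)

# HYPOTHETICAL router positions of the four TSV link blocks.
TSV_LINKS = [(0, 0), (3, 0), (0, 3), (3, 3)]


def xy_route(src, dst):
    """Dimension-ordered routing inside one plane: X first, then Y."""
    (x, y), (dx, dy) = src, dst
    path = []
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def route_3d(src: Node, dst: Node) -> List[Node]:
    """Hop-by-hop path; vertical moves happen only at TSV-link routers."""
    sx, sy, sl = src
    dx, dy, dl = dst
    if sl == dl:
        return [src] + [(x, y, sl) for x, y in xy_route((sx, sy), (dx, dy))]
    # HYPOTHETICAL selection rule: the link minimizing total in-plane hops.
    link = min(TSV_LINKS, key=lambda p: abs(p[0] - sx) + abs(p[1] - sy)
               + abs(p[0] - dx) + abs(p[1] - dy))
    path = [src] + [(x, y, sl) for x, y in xy_route((sx, sy), link)]
    step = 1 if dl > sl else -1
    for layer in range(sl + step, dl + step, step):  # cross one die at a time
        path.append((link[0], link[1], layer))
    path += [(x, y, dl) for x, y in xy_route(link, (dx, dy))]
    return path


print(route_3d((0, 1, 0), (0, 3, 1)))
# [(0, 1, 0), (0, 2, 0), (0, 3, 0), (0, 3, 1)]
```

Only four routers can carry traffic between dies in this model, so vertical traffic concentrates on those routers; that is the connectivity cost of reusing the 2D routers.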
The 3D TSV NoC links (Up and Down) are shown in green. The four Wide I/O SDRAM controllers appear in red in the center of the system diagram. Other system resources are located on the periphery of the network. The NoC as implemented can transmit 550 MFlits/sec (millions of flits, or flow-control units, per second) within the plane of the SoC and 200 MFlits/sec through each of the four TSV links.
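Reading the 550 MFlits/sec figure as a per-link in-plane rate, the quoted numbers give the following vertical capacity between two stacked dies. Converting flits to bytes requires a flit width, which the post does not give, so the 32-bit value below is purely a placeholder.

```python
# Rates quoted in the post (550 read here as a per-link in-plane rate).
IN_PLANE_MFLITS = 550
TSV_LINK_MFLITS = 200
TSV_LINKS = 4

FLIT_BITS = 32  # HYPOTHETICAL flit width; not stated in the post


def gbytes_per_sec(mflits: float, flit_bits: int = FLIT_BITS) -> float:
    """Convert a flit rate in MFlits/sec to Gbytes/sec for a given flit width."""
    return mflits * 1e6 * flit_bits / 8 / 1e9


vertical_mflits = TSV_LINKS * TSV_LINK_MFLITS  # total between two adjacent dies
ratio = TSV_LINK_MFLITS / IN_PLANE_MFLITS      # one TSV link vs. one in-plane link

print(f"aggregate vertical: {vertical_mflits} MFlits/s "
      f"(~{gbytes_per_sec(vertical_mflits):.1f} GB/s at {FLIT_BITS}-bit flits)")
print(f"TSV link / in-plane link: {ratio:.0%}")
# aggregate vertical: 800 MFlits/s (~3.2 GB/s at 32-bit flits)
# TSV link / in-plane link: 36%
```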
A system with three stacked WIOMING SoCs would therefore look like this:
Note in this diagram that there are four NoC connections between adjacent 2D NoC planes. If each of the 16 routers in the WIOMING 2D mesh had been implemented as a 3D router, there would be 16 connections between planes. The presenters said that this was a conscious tradeoff among network connectivity, bandwidth, and transistor count: each TSV consumes silicon that might otherwise hold thousands of transistors.
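The tradeoff the presenters describe scales linearly: every vertical link added buys another 200 MFlits/sec but costs another set of TSVs and their keep-out area. The sketch compares four links with sixteen. The number of TSVs per link is a hypothetical placeholder, and the 30×30 µm footprint per TSV assumes the 10 µm keep-out surrounds the 10 µm via on every side.

```python
TSV_LINK_MFLITS = 200        # per vertical link, as quoted in the post
TSVS_PER_LINK = 40           # HYPOTHETICAL: TSVs needed per vertical NoC link
TSV_FOOTPRINT_UM2 = 30 * 30  # ASSUMED: 10 µm via + 10 µm keep-out on every side


def tradeoff(links: int) -> dict:
    """Vertical capacity and TSV silicon cost for a given number of links."""
    tsvs = links * TSVS_PER_LINK
    return {
        "links": links,
        "vertical_mflits": links * TSV_LINK_MFLITS,
        "tsvs": tsvs,
        "area_mm2": tsvs * TSV_FOOTPRINT_UM2 / 1e6,  # 1 mm² = 1e6 µm²
    }


for n in (4, 16):  # WIOMING's choice vs. one link per router
    t = tradeoff(n)
    print(f"{t['links']:2d} links: {t['vertical_mflits']} MFlits/s, "
          f"{t['tsvs']} TSVs, ~{t['area_mm2']:.3f} mm²")
#  4 links: 800 MFlits/s, 160 TSVs, ~0.144 mm²
# 16 links: 3200 MFlits/s, 640 TSVs, ~0.576 mm²
```

Quadrupling the links quadruples both vertical capacity and TSV area in this model. It also ignores the extra ports and arbitration logic that sixteen full 3D routers would need, which would only add to the cost of that option.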
The WIOMING SoC was taped out at the end of October. Circuit testing and qualification will take place during Q1 of 2012 and boards with the 3D assemblies will start ramping up the following quarter.
Richard Goering has also written about this announcement in his Industry Insights blog, as has Jim Handy.