The DDR4 SDRAM spec and SoC design. What do we know now?

DDR4 SDRAM is coming. JEDEC may not have released the final spec yet, but Samsung made the first DDR4 memory chip announcement in January of this year—a 2133 Mtransfers/sec device built with a 30nm process technology—and Hynix followed suit in April by announcing a 2400 Mtransfers/sec device, also built with a 30nm process technology. Cadence announced a complete DDR4 IP package for SoC designers the same month. (See: “Memory to processors: ‘Without me, you’re nothing.’ DDR4 is on the way.”) Nanya “sort of announced” a DDR4 memory device when one appeared in its most recent quarterly report. So there’s already visible momentum behind the DDR4 specification, even though JEDEC has yet to roll it out.

At today’s EETimes Virtual SoC event, Marc Greenberg from Cadence pulled back the veil on DDR4 a bit more. Here’s what he had to say.

First, even though we don’t have a final specification, some details are public. DDR4 SDRAMs will have double the maximum capacity of DDR3 SDRAMs. They’ll also have twice the maximum clock frequency. Like DDR3 SDRAMs, DDR4 SDRAMs will have an 8n prefetch (important for cache-line-filling operations), but a DDR4 memory controller must alternate or rotate between SDRAM bank groups for maximum SDRAM performance. That’s a new restriction.
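
The bank-group requirement is easier to see with a concrete (and entirely hypothetical) address mapping. The sketch below, in Python, shows one way a memory controller could interleave addresses so that consecutive cache-line accesses rotate through bank groups; the field widths are assumptions for illustration, not values from the spec.

```python
# Minimal sketch of bank-group-aware address interleaving. Field widths are
# hypothetical choices, not taken from the DDR4 spec: putting the bank-group
# bits just above the burst offset makes consecutive cache-line addresses
# land in different bank groups, so back-to-back accesses can rotate groups.

CACHE_LINE_BYTES = 64   # one 8n burst on a 64-bit channel
BANK_GROUP_BITS = 2     # e.g., four bank groups (assumption)

def bank_group(addr: int) -> int:
    """Bank group selected by a physical address under this mapping."""
    line_index = addr // CACHE_LINE_BYTES
    return line_index & ((1 << BANK_GROUP_BITS) - 1)

# Four consecutive cache lines map to four different bank groups:
for addr in range(0, 4 * CACHE_LINE_BYTES, CACHE_LINE_BYTES):
    print(hex(addr), "-> bank group", bank_group(addr))
```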

The DDR4 I/O voltage has been reduced to 1.2V—DDR3 SDRAMs use 1.5V—so you can expect DDR4 SDRAMs to consume less power and energy than DDR3 SDRAMs simply from the lower operating voltage and the more advanced process technology. However, Greenberg warned that some systems might not realize such savings due to architectural issues. In addition, DDR4 SDRAMs will not use stub-series terminated logic (SSTL) drivers. Instead, they’ll use pseudo-open drain (POD) drivers with Vdd terminations. DDR4 memories also add new features to improve signal integrity: data bus inversion (DBI, more on that below), on-chip parity detection for the command/address bus, and CRC error detection for the data.
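
For a rough sense of the headline effect: I/O switching power scales roughly with the square of the supply voltage, so the drop from 1.5V to 1.2V is by itself worth on the order of a third of the I/O switching power, before termination, process, or architectural effects are counted. A back-of-the-envelope calculation, illustrative only:

```python
# Back-of-the-envelope CV^2 scaling for the I/O supply drop. Illustrative
# only; real savings depend on termination, loading, and traffic patterns.
v_ddr3, v_ddr4 = 1.5, 1.2
ratio = (v_ddr4 / v_ddr3) ** 2
print(f"relative switching power: {ratio:.2f} (about {(1 - ratio):.0%} lower)")
# prints: relative switching power: 0.64 (about 36% lower)
```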

Because of the higher maximum clock rate, DDR4 memories may permit a pin-count reduction for some SoC designs. How? At double the clock rate, an SoC design can get the same data bandwidth from 16 data bits clocked at 1600MHz (3.2 Gtransfers/sec) as a 32-bit DDR3 design gets from an 800MHz clock rate (1.6 Gtransfers/sec). However, there’s a design caveat or two. First, SPB (silicon, package, board) design for DDR4-3200 SDRAM is going to be considerably harder than for DDR3-1600 SDRAM. In addition, most memory experts predict that designs with multiple DDR4 DIMMs on each memory channel will stop working reliably (or at all) at data transfer rates considerably below the 3.2 Gtransfers/sec maximum. Similarly, DIMMs with multiple memory ranks on the board may also fail before the data transfer rate reaches 3.2 Gtransfers/sec.
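
The pin-count argument is straightforward arithmetic. Here’s a quick sketch using the widths and rates from the example above (the 32-bit DDR3 width is implied by the comparison, not quoted from any spec):

```python
# Peak bandwidth = interface width (in bytes) x transfer rate (in GT/s).
def peak_bandwidth_gbps(width_bits: int, gtransfers_per_sec: float) -> float:
    return (width_bits / 8) * gtransfers_per_sec

ddr3 = peak_bandwidth_gbps(32, 1.6)  # 32-bit DDR3-1600 (800MHz clock)
ddr4 = peak_bandwidth_gbps(16, 3.2)  # 16-bit DDR4-3200 (1600MHz clock)
print(ddr3, ddr4)  # both 6.4 GB/s -- the same bandwidth with half the data pins
```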

There are a couple of possible solutions to these DDR4 signal-integrity challenges. The first and simplest solution is to allow only one DIMM slot per DDR4 memory channel and to allow only single-rank DDR4 DIMMs. The problem with this solution is that it increases the number of SoC memory channels needed for a given memory capacity and thus drives up the SoC’s pin count, cost, and board-level real estate requirements.
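
To put a rough number on that trade-off, here’s an illustrative calculation; the DIMM capacity and per-channel pin count below are working assumptions for the sake of the example, not spec figures.

```python
# Illustrative only: the capacities and the ~110 signal pins assumed for a
# 64-bit DDR4 channel are rough working numbers, not spec values.
target_capacity_gb = 32
dimm_capacity_gb = 8          # hypothetical single-rank DDR4 DIMM
pins_per_channel = 110        # rough order of magnitude for a x64 channel

channels_one_dimm = target_capacity_gb // dimm_capacity_gb        # 4 channels
channels_two_dimm = target_capacity_gb // (2 * dimm_capacity_gb)  # 2 channels

print("1 DIMM/channel: ", channels_one_dimm * pins_per_channel, "SoC memory pins")
print("2 DIMMs/channel:", channels_two_dimm * pins_per_channel, "SoC memory pins")
```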

No one likes any of those consequences. Not at all. So an alternative solution is the use of load-reduced DIMMs (LRDIMMs) as shown in the following figure.

Now you may be familiar with RDIMMs (registered DIMMs) used in servers. RDIMMs have an extra register chip soldered to the board that stores and buffers the address/control information from the memory controller and distributes that information to the memory chips on the DIMM. LRDIMMs also buffer the data lines to present a single load to the memory controller even when multiple memory ranks are soldered to the DIMM. RDIMMs and LRDIMMs increase memory latency, so the DDR4 controller must be able to understand and accommodate this kind of buffering.
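
As a crude illustration of why the controller must “understand” the buffering, consider a latency model in which the register (and, on LRDIMMs, the data buffers) adds cycles to every access; the added-cycle values below are placeholders for illustration, not numbers from any spec or datasheet.

```python
# Crude read-latency accounting for buffered DIMMs. The added-cycle values
# are placeholders for illustration, not spec or product numbers.
EXTRA_CYCLES = {
    "UDIMM": 0,    # unbuffered: no added latency
    "RDIMM": 1,    # register buffers the command/address path
    "LRDIMM": 2,   # data buffers add further delay on the data path
}

def effective_read_latency(cas_latency: int, dimm_type: str) -> int:
    """Read latency the controller must budget for, in memory clocks."""
    return cas_latency + EXTRA_CYCLES[dimm_type]

for dimm in ("UDIMM", "RDIMM", "LRDIMM"):
    print(dimm, effective_read_latency(cas_latency=11, dimm_type=dimm))
```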

Finally, in the what-we-know category, DDR4 SDRAMs will stay with the 8n prefetch used for DDR3 memories, but they will add an extra level of multiplexing, so the memory controller must manage traffic to and from the SDRAM even more carefully than before to extract maximum performance from the device.

Here is where we leave the known DDR4 world and enter into the realm of conjecture.

Although there are no public details on how DDR4 SDRAM’s extra multiplexing level works, GDDR5 memory already employs a bank-grouping scheme with an extra level of multiplexing. GDDR5 memory adds new command timings that differ depending on whether successive commands address the same or different bank groups. These extra timings mean that a DDR4-optimized memory controller must be a bit more complex than the controller used for DDR3 memories. The controller needs better command scheduling, and it must deal even more efficiently with high-priority memory commands. The Cadence DDR4 memory controller that was just introduced last month has several new features to accommodate the new complexities of the upcoming DDR4 memory protocol, said Greenberg.
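
Borrowing the GDDR5-style behavior just described (the DDR4 numbers are not public), the scheduling consequence looks something like the sketch below: back-to-back column commands to the same bank group need a longer gap than commands to different groups. The timing values are placeholders.

```python
# Sketch of bank-group-aware command spacing, modeled on the GDDR5-style
# behavior described above. The tCCD values are placeholders, not spec data.
T_CCD_SHORT = 4  # clocks between column commands to *different* bank groups
T_CCD_LONG = 6   # clocks between column commands to the *same* bank group

def min_command_gap(prev_bank_group: int, next_bank_group: int) -> int:
    """Minimum clocks the scheduler must wait before the next column command."""
    return T_CCD_LONG if prev_bank_group == next_bank_group else T_CCD_SHORT

print(min_command_gap(0, 0))  # same group      -> 6 clocks
print(min_command_gap(0, 1))  # different group -> 4 clocks
```

This is also why a controller that rotates bank groups, as noted earlier, can sustain higher throughput than one that streams commands into a single group.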

Here’s a table of enhancements made to the controller’s command queue to accommodate DDR4 requirements and maximize memory-subsystem performance:

One key feature here is a new command-prioritizing scheme that prioritizes DDR4 commands when they enter the command queue (as the DDR3 version of this controller does) and then reprioritizes the commands when they’re about to exit the queue to be issued to the DDR4 memory. That second step is new. This new feature allows high-priority commands to go straight to the head of the command queue when they’re received, but the controller can delay a command’s exit from the queue (and the issuing of that command to the memory) until the target DDR4 memory page and bank are ready to accept it. This capability reduces the impact that high-priority commands have on the rest of the memory traffic and helps to maximize memory bandwidth and throughput.
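
A toy model makes the two-stage idea concrete. The sketch below is my own illustration of the general technique, not Cadence’s implementation: commands take their place in the queue by priority on entry, but at exit the scheduler only issues a command whose target bank is actually ready, holding the others for a later cycle.

```python
# Toy model of entry-time prioritization plus exit-time readiness checks.
# This is an illustrative sketch, not the Cadence controller's algorithm.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Cmd:
    priority: int                     # lower number = higher priority
    seq: int                          # arrival order breaks ties
    bank: int = field(compare=False)  # target bank (not used for ordering)
    addr: int = field(compare=False)

class CommandQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0

    def enqueue(self, priority: int, bank: int, addr: int) -> None:
        # Entry-time prioritization: high-priority commands sort to the front.
        heapq.heappush(self._heap, Cmd(priority, self._seq, bank, addr))
        self._seq += 1

    def issue(self, bank_ready):
        # Exit-time check: issue the highest-priority command whose target
        # bank is ready; hold everything else for a later cycle.
        held, chosen = [], None
        while self._heap:
            cmd = heapq.heappop(self._heap)
            if bank_ready(cmd.bank):
                chosen = cmd
                break
            held.append(cmd)
        for cmd in held:                    # put the held commands back
            heapq.heappush(self._heap, cmd)
        return chosen

q = CommandQueue()
q.enqueue(priority=0, bank=2, addr=0x100)   # high priority, but bank 2 is busy
q.enqueue(priority=5, bank=1, addr=0x200)   # low priority, bank 1 is ready
print(q.issue(bank_ready=lambda bank: bank == 1))  # the ready bank-1 command issues first
```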

Another new controller feature is support for DBI. The following figure illustrates the problem:

The left side of this figure shows four consecutive data transfers. In the first transfer, all of the data bits are “1.” In the second transfer, they’re all “0.” As a result, every data bit changes state from the first transfer to the next. This is a bad thing, especially at multi-GHz transfer rates. The capacitive charging and discharging of all the data lines at high speed creates a problem called simultaneous switching output (SSO) noise, which stresses the DRAM’s power-distribution system on the chip, in the package, and on the board. The next transfer shows a transition from all zeroes to all ones except for one data bit. Because of capacitive coupling, the data lines making the zero-to-one transition become aggressors that try to induce that lone holdout bit to also make the transition even though it does not want to do so. The fourth transfer exhibits a similar problem: all of the bits make a transition, but one bit steadfastly wants to make its transition in the opposite direction. Again, it’s up against a number of aggressors.

The way to solve this problem, as implemented in GDDR5, is to add a DBI bit. The right side of the figure shown above illustrates the same state transitions, but with the addition of a DBI bit. When asserted, the DBI bit indicates that the data bus should be inverted. The inversion state can change from transfer to transfer, and it is chosen to minimize the number of data-bit state changes and thus minimize the I/O switching current and the number of aggressor bits from one transfer to the next. Again, this is how it’s done for GDDR5 memory. The DBI method used for DDR4 SDRAM is not yet public.
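
To make the mechanism concrete, here’s a minimal sketch of the transition-minimizing decision described above, in the GDDR5 style; as noted, the DDR4 flavor isn’t public, so treat the details as illustrative.

```python
# Minimal sketch of transition-minimizing data bus inversion (DBI), in the
# GDDR5 style described above; the DDR4 details are not yet public.
def dbi_encode(prev_bus: int, data: int, width: int = 8):
    """Send data or its complement, whichever flips fewer wires relative to
    what the bus carried on the previous transfer. Returns (bus_value, dbi)."""
    mask = (1 << width) - 1
    inverted = ~data & mask
    flips_plain = bin((prev_bus ^ data) & mask).count("1")
    flips_inverted = bin((prev_bus ^ inverted) & mask).count("1")
    if flips_inverted < flips_plain:
        return inverted, 1   # DBI asserted: the receiver re-inverts the data
    return data, 0

bus = 0xFF                          # first transfer: all ones
for word in (0x00, 0xFE, 0x01):     # data patterns like those described above
    bus, dbi = dbi_encode(bus, word)
    print(f"wire pattern {bus:08b}  dbi={dbi}")
```

In this worst-case sequence, the encoded bus toggles at most one data wire per transfer instead of nearly all of them.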

With these and other changes to the memory interface specification, SoC designers will need a new tool set to add DDR4 memory interfaces to their designs. That’s why Cadence has introduced a DDR4 IP package and design kit now—because SoC designers preparing early designs that incorporate DDR4 memory need to start now. The Cadence DDR4 offerings include a DDR4-enhanced memory controller (based on the existing, configurable SDRAM controller Cadence obtained when it purchased Denali Software last year), hard and soft DDR4 PHYs, design kits for board- and package-level DDR4 design, and verification IP and memory models for DDR4 memory.

Over the next several years, we will see DDR4 SDRAM gradually enter and then take over the SDRAM memory market. It’s happened with the DDR, DDR2, and DDR3 SDRAM generations, and there’s little reason to believe it won’t happen with DDR4 as well. Greenberg said to expect to see designs in 2013 from early adopters who need the maximum possible memory-subsystem performance, early-majority adoption for high-performance (but not the most bleeding-edge) designs in 2014, majority adoption in desktop and laptop PCs in 2015, and then pretty much total market penetration, all the way down to low-cost devices, by 2016.

Time to get going, isn’t it?

Update: “JEDEC releases more details about DDR4 SDRAM spec. Want to know what they are?”
