I rarely get to tell an in-depth System Realization story like this one. The development of the 44Tbyte, enterprise-class Skyhawk SSD starts with a clear picture of the objective—build an enterprise-class, solid-state storage server using commercial MLC (multi-level cell) NAND Flash memory—and unfolds, well, systematically from there. I heard this story yesterday during a private meeting with Skyera’s CEO, Rado (Radoslav) Danilak, and VP of Product Management Alessandro Fin at the Flash Memory Summit in Silicon Valley.
The choice of commercial-grade MLC NAND Flash is eminently sensible in this application once you look at this photo that I shot of the Skyhawk SSD during the meeting:
Fitting 44Tbytes of NAND Flash into that 1U, half-depth, rack-mount box calls for high-density circuitry. The desire to sell the box at a low price for enterprise-class storage (see below) also drove the decision to adopt MLC NAND Flash memory for the SSD’s design.
However, there’s a very good reason why few enterprise-class SSD vendors use MLC NAND Flash memory in their products: raw MLC NAND Flash memory isn’t as durable as SLC (single-level cell) Flash and data integrity is paramount in enterprise-class storage applications. MLC NAND Flash memory is typically rated at 3000 P/E (program/erase) cycles while enterprise-class storage requirements demand on the order of 100K P/E cycles to achieve a 5-year product life. Most MLC NAND Flash is designed for use in USB drives and SD cards where durability is not the issue—cost and density are. Yet it was the relatively low cost and high density of MLC NAND Flash that drew Skyera to this type of memory for an enterprise-class storage peripheral.
Skyera decided that the way to extract enterprise-class performance and durability from MLC NAND Flash memory was to develop its own Flash-management algorithms. That decision led to the development of a proprietary NAND Flash controller for the Skyhawk SSD. The NAND Flash controller addresses write amplification, one of the biggest problems with SSDs based on NAND Flash memory. The additional P/E cycles caused by write amplification exacerbate the issue of limited MLC NAND Flash durability.
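To see why write amplification matters so much for MLC endurance, here is a back-of-the-envelope sketch. The 44Tbyte capacity and the 3000 P/E rating come from the discussion above; the daily host write volume and the write-amplification factors are purely illustrative assumptions, not Skyera figures, and the model assumes perfect wear leveling.

```python
def lifetime_years(capacity_tb, pe_cycles, host_writes_tb_per_day, write_amplification):
    """Estimate years until the rated P/E cycles are used up.

    Assumes perfect wear leveling, so every cell ages at the same rate.
    """
    # Total data the NAND can absorb before reaching its rated P/E cycles.
    total_nand_writes_tb = capacity_tb * pe_cycles
    # Physical writes hitting the NAND each day, after amplification.
    nand_writes_tb_per_day = host_writes_tb_per_day * write_amplification
    return total_nand_writes_tb / nand_writes_tb_per_day / 365.0


CAPACITY_TB = 44               # from the article
MLC_PE_CYCLES = 3000           # typical MLC rating quoted in the article
HOST_WRITES_TB_PER_DAY = 440   # hypothetical: ten full-capacity writes per day

for wa in (1.0, 3.0, 10.0):    # hypothetical write-amplification factors
    years = lifetime_years(CAPACITY_TB, MLC_PE_CYCLES, HOST_WRITES_TB_PER_DAY, wa)
    print(f"write amplification {wa:4.1f} -> {years:.2f} years")
```

In this simple model, lifetime is inversely proportional to write amplification, which is why driving that factor down is as valuable as buying more durable Flash.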
Another way to extend the durability of the MLC NAND Flash memory is to build in additional memory redundancy through RAID storage technology. However, Skyera looked at existing RAID algorithms from “leading suppliers” and discovered that these algorithms are all tailored for rotating storage media. They have baked-in assumptions based on the interface, speed, and latency characteristics of hard disks, so they cannot exploit the fine-grained nature of NAND Flash memory.
So Skyera developed its own RAID algorithms, which it calls RAID-SE. (The “SE” is for Skyera.) The RAID-SE algorithms also went into the NAND Flash controller. These algorithms achieve RAID 6 reliability while reducing write amplification to extremely low levels.
One more technology went into the Skyera NAND Flash controller: compression. Data compression boosts SSD capacity and increases performance by reducing the amount of data traffic going into and coming out of the NAND Flash memory. Skyera developed its own compression algorithm tailored for the specific needs of enterprise-class SSD storage and implemented the algorithm in controller hardware because a software implementation wasn’t fast enough.
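The traffic-reduction effect of compression is easy to demonstrate in software. The sketch below uses Python's standard zlib module purely as a stand-in; Skyera's algorithm is proprietary and runs in controller hardware, and the sample payload is a made-up, highly repetitive record stream, so the ratio it produces says nothing about real enterprise data.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size for one payload."""
    compressed = zlib.compress(payload, level)
    return len(payload) / len(compressed)


# Hypothetical payload: 1000 identical log-like records.
payload = b"timestamp=0000 status=OK volume=skyhawk\n" * 1000

ratio = compression_ratio(payload)
bytes_to_flash = len(zlib.compress(payload))

print(f"host wrote {len(payload)} bytes, Flash received {bytes_to_flash} bytes")
print(f"compression ratio: {ratio:.1f}x")
```

Every byte removed before it reaches the Flash array is a byte that consumes neither capacity, bandwidth, nor P/E cycles, which is why compression helps capacity, performance, and durability at once.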
(Note: For an in-depth discussion of data compression and I/O bandwidth, see “Will your multicore SoC hit the memory wall? Will the memory wall hit your SoC? Does it matter?” in the Denali Memory Report.)
At this point, Skyera had a fast (high IOPS), high-capacity SSD based on commercial MLC NAND Flash memory. Using this SSD in a system exposed the next bottleneck. The SSD could support a lot of storage traffic from multiple servers, so the Ethernet ports linking the SSD to the enterprise system became the limiting factor. Skyera therefore developed its own Ethernet switch and integrated it into the Skyhawk SSD design. The Skyhawk SSD’s proprietary Ethernet switch has forty 1Gbps and three 10Gbps Ethernet ports and it’s incorporated into the same 1U, half-depth box as the SSD appearing in the photo above. More important, the Ethernet switch is integrated into the SSD’s system design, which further boosts overall performance through some additional system optimization.
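For a sense of scale, the port counts quoted above can be summed into a raw aggregate line rate. This is only arithmetic on the stated port counts; it ignores protocol overhead and says nothing about the switch's actual sustained throughput, which the article does not give.

```python
def aggregate_gbps(ports):
    """Sum raw line rate over (port_count, gbps_per_port) pairs."""
    return sum(count * gbps for count, gbps in ports)


# Port mix from the article: forty 1Gbps ports and three 10Gbps ports.
SKYHAWK_PORTS = [(40, 1), (3, 10)]

total_gbps = aggregate_gbps(SKYHAWK_PORTS)
total_gbytes_per_s = total_gbps / 8   # 8 bits per byte, no protocol overhead

print(f"raw aggregate line rate: {total_gbps} Gbps")       # 70 Gbps
print(f"upper bound on payload:  {total_gbytes_per_s} Gbytes/s")
```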
Skyera is selling the Skyhawk SSD for “under $3 per gigabyte – before compression and deduplication.” According to Danilak and Fin, if you compare that price with the cost of a RAID-based, enterprise-class, hard-disk storage system with a comparable network switch, you will find the price to be more than competitive. Skyera achieved this price/performance breakthrough using a systematic approach to System Realization: it started by adopting MLC NAND Flash (the underlying premise that makes the Skyhawk SSD cost-competitive with rotating storage), then developed the NAND Flash controller and the RAID-SE algorithm to overcome the limitations of MLC NAND Flash memory, and finally added sufficient Ethernet throughput to unleash the Skyhawk SSD’s system potential.
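The quoted price is a ceiling on raw capacity, so the effective cost per stored gigabyte falls as compression and deduplication take effect. The sketch below works through that arithmetic. It treats $3 per gigabyte as an upper bound, assumes decimal units (1 Tbyte = 1000 Gbytes), and uses data-reduction ratios that are hypothetical placeholders, not numbers from Skyera.

```python
def effective_price_per_gb(raw_price_per_gb, data_reduction_ratio):
    """Price per gigabyte of user data after compression and deduplication."""
    return raw_price_per_gb / data_reduction_ratio


RAW_PRICE_CEILING = 3.00   # "under $3 per gigabyte", treated as an upper bound
CAPACITY_GB = 44 * 1000    # 44Tbytes, assuming decimal units

# Upper bound on the price of the raw capacity.
system_price_ceiling = RAW_PRICE_CEILING * CAPACITY_GB
print(f"raw-capacity price ceiling: under ${system_price_ceiling:,.0f}")

for ratio in (1.0, 2.0, 4.0):   # hypothetical data-reduction ratios
    price = effective_price_per_gb(RAW_PRICE_CEILING, ratio)
    print(f"reduction {ratio:.0f}x -> under ${price:.2f} per stored gigabyte")
```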
This story plots the methodical analysis of a well-defined system problem and a step-by-step development of hardware and software to overcome the challenges presented by the problem. It’s a terrific lesson in effective System Realization and I’m indebted to Rado Danilak and Alessandro Fin for sharing the story with the EDA360 Insider.