I am an unabashed advocate of heterogeneous SoC design. Have been for decades. It’s a system-level design approach that trades the elegance and academic symmetry of homogeneous processing for a more efficient, bare-metal, hard-core approach to system design, one that pre-apportions each task to the tailored hardware best suited to execute it. Moore’s Law has given us a bounty of on-chip computing resources, and heterogeneous computing provides a deterministic path to producing hardware designs that maximize bang per buck per Watt. (For a closely related story, see Richard Goering’s Industry Insights blog post: “DAC 2012 IBM Keynote: Multi-Core Performance Growth Slowing, New Approaches Needed”)
With that said, heterogeneous SoC architectural design has been an ad hoc sort of endeavor, and that leads to a lot of wasted design effort and a sad lack of design automation. Last week at the 10th annual ESL panel at DAC, we heard that the number one architectural design tool at this level is…the white board.
Not for long.
Today, AMD, ARM, Imagination Technologies, MediaTek, and Texas Instruments announced the formation of the Heterogeneous System Architecture (HSA) Foundation with the stated intent of promoting “an open, standards-based approach to heterogeneous computing that will provide a common hardware specification and broad support ecosystem to make it easier for software developers to deliver innovative applications that can take greater advantage of today’s modern processors.”
The vision is to create standards and specifications and to drive the creation of models that allow software developers to take better advantage of the processing resources available in heterogeneous SoC architectures. This effort will naturally influence the way SoC designers create heterogeneous hardware architectures.
I found this quote from the HSA Foundation press release to be telling:
“The HSA Foundation welcomes forward-thinking semiconductor companies, platform and OS vendors, device manufacturers, independent software vendors, academia and open source developers. Members of the HSA Foundation plan to deliver robust development solutions for heterogeneous compute to drive innovative content and applications with developer tools, software developer kits (SDKs), libraries, documentation, training, support and more.”
For more information about the HSA Foundation, click here.
Multi-core performance growth is slowing because it depends largely on the turnaround time for getting data from memory into the cores and back out again, and if you do it with SMP architectures you get screwed by having to maintain cache coherency on top of that.
The (only?) solution is to distribute the processing into the memory so that no operation has long fetch and return paths and, if possible, no cache coherency is required.
Fortunately this is now doable because we can die-stack the CPUs with S/DRAM using “Wide-IO” for communication.
The remaining problem is then that you have to program it a bit differently for optimum performance, but you’re in luck: I’ve worked that one out for you –
PS: Not sure why ARM never picked up on the CSP (communicating sequential processes) approach. It allows you to scale without having to use 64-bit processing, and they seem to be somewhat behind the curve there. Which is ironic, since the home of both CSP and ARM would appear to be Cambridge.
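To make the CSP idea in this comment a little more concrete, here is a minimal sketch of the programming model only, not of the commenter's own scheme (which is not described here). Each worker owns a private slice of the data, standing in for a core stacked on its own memory, and the only things that cross between workers are small messages on channels. The names `worker` and `distributed_sum` are hypothetical, and Python threads with `queue.Queue` are used purely as stand-ins for CSP processes and channels; this shows the structure, not a performance gain.

```python
import queue
import threading


def worker(local_data, requests, replies):
    """A CSP-style process: it owns local_data and shares it with nobody."""
    while True:
        msg = requests.get()          # block until a message arrives
        if msg is None:               # sentinel message: shut down
            break
        lo, hi = msg
        # Only the small result travels back, never the data itself.
        replies.put(sum(local_data[lo:hi]))


def distributed_sum(data, n_workers=4):
    """Sum data by sending requests to workers that each hold one partition."""
    chunk = (len(data) + n_workers - 1) // n_workers  # ceiling division
    replies = queue.Queue()
    channels, threads = [], []
    for i in range(n_workers):
        # Slicing copies, so each worker gets its own private partition.
        part = data[i * chunk:(i + 1) * chunk]
        ch = queue.Queue()
        t = threading.Thread(target=worker, args=(part, ch, replies))
        t.start()
        channels.append(ch)
        threads.append(t)

    for ch in channels:
        ch.put((0, chunk))            # ask each worker to reduce its partition

    total = sum(replies.get() for _ in channels)

    for ch in channels:
        ch.put(None)                  # tell every worker to stop
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(distributed_sum(list(range(100))))
```

Because no worker ever touches another worker's partition, there is no shared state to keep coherent; the cost moves into the messages, which is the trade the comment is arguing for.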