Back in October during ARM TechCon, Freescale announced plans to produce a family of chips based on a platform built around a pair of asymmetric processor cores: the ARM Cortex-A5 and Cortex-M4. Freescale’s dual-core approach is similar in concept to the design of the NXP LPC4300 microcontroller series, which I discussed earlier this week. (See “Asymmetric, dual-core NXP LPC4300 microcontrollers split tasks between ARM Cortex-M4 and -M0 cores, cost $3.75 and up.”) Tom Halfhill has just published an article in Microprocessor Report (subscription required) with much more information about this yet-to-be-named family of embedded Freescale processors, and it’s clear that Freescale has something more than “dual-core” asymmetric processing in mind.
Here’s a conceptual block diagram of the planned but yet-to-be-named asymmetric multicore family, adapted from Halfhill’s Microprocessor Report article:
As you can see, Freescale’s vision for this new SoC family is not of a dual-core machine but something much larger. The application processor side (on the left) of this architecture contains not one but four ARM Cortex-A5 processor cores. That doesn’t mean that every member of this family of embedded applications processors would include all four. In fact, at least three variants make sense: one, two, or four Cortex-A5 cores. (Freescale has not yet announced products based on this architecture but plans to do so early in 2012.) The Cortex-A5 processor cores in this architecture are there to run an OS, such as Linux or Android, while the lone ARM Cortex-M4 core handles the real-time system I/O.
The same benefits of asymmetric processing apply to this Freescale concept as to the NXP LPC4300 family. Asymmetric architectures provide a natural partitioning of the hardware that guides you to a natural task partitioning. This divide-and-conquer approach is an age-old engineering technique for designing complex systems that traces its roots back to the way the Romans conquered geographic regions. You can do almost the same thing with symmetric processor complexes, but asymmetric designs match each task to a processor core with the appropriate capabilities. Note that the ARM Cortex-M4 processor core in the diagram above is cacheless (as are all ARM Cortex-M cores announced to date). Absence of a cache gives the ARM Cortex-M series of processor cores performance repeatability, because instruction timing never depends on whether an access hits or misses in a cache, and repeatability is what you want in real-time I/O and control systems. The cacheless approach works fine for embedded tasks that fit in a reasonable amount of local memory, but it breaks down as soon as you need to use off-chip SDRAM because of SDRAM’s much longer access latency. Hence the closely coupled SRAM associated with the ARM Cortex-M4 processor core as shown in the above diagram.
You can get a little more information about this Freescale processor family in this video from Engineering TV.
Lest you get the idea that asymmetric processing is some new, daring, untried concept, allow me to set my personal Wayback machine for 1977. To remind you, that’s a mere six years after Intel introduced the 4-bit 4004, the world’s first commercial microprocessor. In 1977, HP introduced a desktop computer called the HP 9845, which was based on a pair of asymmetric, 16-bit, proprietary hybrid microprocessors designed with 5-μm (5000-nm) design rules. One processor, called the LPU (Language Processing Unit), handled the application processing. It ran a BASIC interpreter and user code written in BASIC. The other processor, the PPU (Peripheral Processing Unit), handled all of the I/O transactions.
Here’s a block diagram of the HP 9845 System Architecture:
The HP 9845’s two hybrid processors were nearly identical and were based on the HP 2116 minicomputer architecture, developed in the mid-1960s. The LPU hybrid incorporated three NMOS chips: the BPC (binary processor chip), the EMC (extended math chip, with BCD arithmetic acceleration to aid floating-point calculations), and the IOC (I/O chip). The PPU hybrid omitted the EMC as a cost-reducing measure because it had no floating-point tasks to perform. Thus the two processors in the HP 9845 desktop computer were asymmetric in much the same way, and for the same reasons, as the cores in NXP’s LPC4300 family and in Freescale’s planned, yet-to-be-named embedded processor family.
Does this divide-and-conquer strategy make software development a piece of cake? Not on your life. Although asymmetric multicore processing eliminates some processor-related resource conflicts, it doesn’t eliminate all resource conflicts. Software development teams can still get into trouble, as the following anecdote proves:
During the HP 9845 development, the HP software team was taking an evening class in operating-system design from a professor named Denny Georg from Colorado State University. I also attended that class. One evening, we were discussing the great bugaboo of multitasking software development: deadlock. At one point in the lecture just after introducing the topic of deadlock and semaphores, the following exclamation came from the back of the room:
That exclamation came from one of the HP 9845 OS developers. They were writing OS code and learning to write it at the same time. Each generation of software developers gets to learn the same lessons over and over again as their processor hardware generations mature to the appropriate level of complexity. It happened first with mainframes. Then with minicomputers. Then microprocessors. And now with processor cores in SoCs.
It’s not like deadlock was a newly discovered concept in 1977. Edsger Dijkstra was discussing software semaphores to protect shared resources at least 12 years earlier in 1965. However, deadlock becomes significantly more likely with shared resources in a multiprocessor system. In the HP 9845 block diagram above, shared memory enables many potential deadlock hazards and thus many opportunities for the software team to screw up.
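To make the hazard concrete, here is a minimal sketch in Python of one classic remedy: always acquire shared locks in a fixed global order. Everything in it is an assumption made for illustration. The Resource class, the rank field, and the buffer names are hypothetical, and the two threads labeled lpu and ppu only loosely stand in for two processors sharing memory. This is not how the HP 9845 firmware worked. Without the ordering step, one thread could hold buffer_a while waiting for buffer_b, while the other holds buffer_b and waits for buffer_a, and neither would ever proceed.

```python
import threading


class Resource:
    """Hypothetical shared resource, e.g. a buffer in shared memory."""

    def __init__(self, rank, name):
        self.rank = rank              # fixed global rank used to order lock acquisition
        self.name = name
        self.lock = threading.Lock()
        self.value = 0


def acquire_in_order(*resources):
    """Acquire every lock in ascending rank, whatever order the caller asked for."""
    ordered = sorted(resources, key=lambda r: r.rank)
    for r in ordered:
        r.lock.acquire()
    return ordered


def release_all(ordered):
    """Release the locks in the reverse of the order they were acquired."""
    for r in reversed(ordered):
        r.lock.release()


def transfer(src, dst, count):
    """Move `count` units from src to dst, holding both locks for each move."""
    for _ in range(count):
        held = acquire_in_order(src, dst)
        try:
            src.value -= 1
            dst.value += 1
        finally:
            release_all(held)


def run_demo(iterations=10000):
    buffer_a = Resource(1, "buffer_a")
    buffer_b = Resource(2, "buffer_b")
    # The two threads name the buffers in opposite orders, which is the
    # pattern that deadlocks when each side simply locks "its" buffer first.
    lpu = threading.Thread(target=transfer, args=(buffer_a, buffer_b, iterations))
    ppu = threading.Thread(target=transfer, args=(buffer_b, buffer_a, iterations))
    lpu.start()
    ppu.start()
    lpu.join()
    ppu.join()
    return buffer_a.value, buffer_b.value


if __name__ == "__main__":
    print(run_demo())
```

Because both threads end up taking the rank-1 lock before the rank-2 lock, the circular wait that defines deadlock cannot form, and the equal and opposite transfers leave both buffers at their starting value. Lock ordering is only one of several remedies, but it is cheap to enforce by convention, which matters when the team is learning the lesson on the job.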
So the benefits of asymmetric multiprocessing have been known for decades, and the idea is now starting to catch on in a significant way in SoC design. We’ll all benefit, as long as we learn from the past and don’t deadlock.