Clock Concurrent Optimization: The Primer to the Primer—OR—Want to overcome some major functional hurdles to Silicon Realization and save a lot of power on your SoC at the same time?

Cadence has just announced the acquisition of Azuro, a Silicon Realization EDA company specializing in physical optimization, especially as it applies to clock trees. Azuro offers a 20-page White Paper on its Web site that discusses its technology, called Clock Concurrent Optimization. You can read the White Paper in all of its glory or you can read the following quick summary. Then read the White Paper for more in-depth discussion.

We make a lot of assumptions when designing chips. These assumptions usually take the form of abstractions. The more we abstract, the simpler the design becomes and the faster we can create a design. Two of the many important abstractions in the realm of Silicon Realization are the “perfect ground” and the “ideal clock.” We’re not going to discuss the perfect ground here. We’ve got more than enough to handle in a discussion of ideal clocks.

The figure below is from the Azuro White Paper. It shows a design flow from RTL to a final chip layout.

In this flow, you’ll note that synthesis, floorplanning, initial placement, and physical optimization occur in the realm of the “ideal clock.” Ideal clock edges arrive at every flip-flop on the chip at precisely the same moment. There is no clock skew anywhere on the chip in this idealized world. Clock tree synthesis then attempts to make this assumption true by building a clock tree that delivers “ideal clocks” as closely as possible through schemes designed to minimize clock skew across the entire chip. As clock frequencies climb, this feat becomes harder and harder. Azuro’s premise is that our ability to make this approach work well broke at the 65nm node and that the problem has been getting worse with each new process node.
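To see why skew matters, here is a minimal Python sketch of the textbook setup-slack check for a single flop-to-flop path. It is not taken from the Azuro White Paper. The function name `setup_slack`, the picosecond values, and the choice to ignore clock-to-Q delay are all assumptions made for illustration.

```python
def setup_slack(period_ps, launch_clk_ps, capture_clk_ps, logic_delay_ps, setup_ps=0):
    """Setup slack for one launch -> capture path, in picoseconds.

    Simplified textbook form (an assumption, not Azuro's model):
    the data launched at launch_clk_ps must arrive, after logic_delay_ps,
    before the next capture edge at period_ps + capture_clk_ps minus setup_ps.
    Clock-to-Q delay is ignored to keep the arithmetic visible.
    """
    return period_ps + capture_clk_ps - launch_clk_ps - logic_delay_ps - setup_ps


# Ideal clock: both flops see the edge at the same instant.
ideal = setup_slack(period_ps=1000, launch_clk_ps=0, capture_clk_ps=0,
                    logic_delay_ps=900)

# Real clock tree: the capture flop sees its edge 150 ps before the launch flop.
skewed = setup_slack(period_ps=1000, launch_clk_ps=350, capture_clk_ps=200,
                     logic_delay_ps=900)

print("ideal slack (ps):", ideal)
print("skewed slack (ps):", skewed)
```

With these made-up numbers, the path has 100 ps to spare under ideal clocks and fails by 50 ps once the capture edge arrives 150 ps ahead of the launch edge. The logic has not changed at all; only the clock arrival times have.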

There are three flies in the clock-tree-synthesis ointment according to the Azuro White Paper. First, there’s on-chip variation (OCV). Our ability to precisely replicate features on chips from one part of the chip to the next degrades as we push litho geometries to smaller and smaller fractions of the wavelength of light used to expose each mask. In fact, the Azuro White Paper estimates that logic-path delays at the 45nm node vary as much as 20% across a chip just due to manufacturing OCV. This problem is especially acute for clock paths, which meander all over a chip from one end to the other.
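As a rough illustration of how OCV turns a nominally balanced clock tree into a skewed one, the Python sketch below derates two clock branches in opposite directions. Everything in it is an assumption for illustration: the function names, the delays, the symmetric plus-or-minus 10% derate (one possible reading of a 20% spread), and the simplification that the shared segment of the clock path varies identically for both flops.

```python
def derated(delay_ps, percent):
    """Scale a delay by a signed whole-number percentage (integer arithmetic)."""
    return delay_ps * (100 + percent) // 100


def ocv_skew(common_ps, launch_branch_ps, capture_branch_ps, derate_percent):
    """Worst-case (launch - capture) clock arrival difference, in picoseconds.

    Assumption: the launch branch runs slow by derate_percent, the capture
    branch runs fast by the same amount, and the common segment is not derated.
    """
    launch = common_ps + derated(launch_branch_ps, +derate_percent)
    capture = common_ps + derated(capture_branch_ps, -derate_percent)
    return launch - capture


# Two branches that are perfectly balanced on paper.
nominal = ocv_skew(common_ps=200, launch_branch_ps=400,
                   capture_branch_ps=400, derate_percent=0)
worst = ocv_skew(common_ps=200, launch_branch_ps=400,
                 capture_branch_ps=400, derate_percent=10)

print("nominal skew (ps):", nominal)
print("skew with +/-10% OCV (ps):", worst)
```

In this toy case the tree has zero skew on paper and 80 ps of effective skew once variation is applied. The longer the unshared portion of the two clock paths, the larger that number gets.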

The next fly in the ointment is clock gating, which is supposed to be a balm for too much power consumption. Gating clocks tends to increase skew by introducing more variation in clock paths. As the White Paper notes, the cure for this problem is to push the clock gates as close to the relevant flip-flops as possible. Unfortunately, that’s “the worst possible strategy for saving power.”
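A toy model helps show why pushing clock gates down toward the flip-flops hurts power. The Python sketch below assumes a balanced binary clock tree (a hypothetical structure, not one described in the White Paper) with 2**i buffers at level i, and counts how many buffers keep toggling when a gate at a given level is switched off.

```python
def total_buffers(depth):
    """Buffers in a balanced binary tree with `depth` buffer levels."""
    return 2 ** depth - 1


def buffers_still_toggling(depth, gate_level):
    """Buffers upstream of the gate, which keep switching when it is off.

    gate_level 0 places the gate above the root buffer of this subtree;
    gate_level == depth places the gates directly at the flip-flops.
    """
    if not 0 <= gate_level <= depth:
        raise ValueError("gate_level must be between 0 and depth")
    return 2 ** gate_level - 1


depth = 6  # 2**6 = 64 flip-flops in this hypothetical subtree
for level in (0, 3, 6):
    active = buffers_still_toggling(depth, level)
    print(f"gate at level {level}: {active} of {total_buffers(depth)} buffers still toggle")
```

In this model, a gate at the top of the subtree silences every buffer below it, while gates placed at the flip-flops leave all 63 buffers switching even though none of the 64 flops needs a clock. Real clock trees are not this regular, so treat the numbers as a direction, not a measurement.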

The final fly is rapidly increasing design complexity, which hits everything we do in nanometer Silicon Realization. In the case of clocking, the adoption of IP blocks creates additional complexity in the optimization of clock trees because one branch of the tree disappears into the IP block, which may or may not be subject to further clock optimizations. According to the Azuro White Paper, you get “a dense spaghetti network of clock muxes, clock xors, and clock generators, entwined with clock gating elements from the highest levels in the clock tree where they shut down entire sub-chips, to the lowest levels of the clock tree, where they may shut down only a handful of flip-flops.” What you get looks something like this:

With all three of these flies buzzing around in the clock-optimization ointment, you start to see a lot of clock-timing variation across the chip. It gets worse as the process-node geometries shrink. Azuro has analyzed more than 60 commercial silicon designs and come up with this graph of the increasing amount of variation versus clock period for four different process nodes:

There are some unpleasant ways to deal with this problem. One way is to slow down the clock. Then the variation won’t be so large compared to the clock period. As unpleasant as that alternative seems, we do this all the time when dealing with corners. We always derate the clock to get any sort of chip yield. We just want to avoid derating the clock more than needed. OCV makes derating a very squirrely exercise.

The other unpleasant alternative according to the Azuro White Paper is to “spend months of manual effort carefully crafting a set of overlapping balancing constraints, typically referred to as ‘skew groups.’” Ugh. The operative word in that sentence is “months” as in “you don’t have them.”

The pleasant alternative, the one all Silicon Realization teams desire when confronting a problem like this, is to get a pushbutton tool that makes the problem go away while not disrupting the flow that the team already knows and loves. And so that’s what Azuro’s Clock Concurrent Optimization tool does. It replaces skew-driven clock-tree synthesis with a physically driven clock-tree optimizer that makes decisions that balance clock and logic delays.
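The general idea of trading clock delay against logic delay can be shown with a toy example. The Python sketch below is not Azuro's algorithm, which this post does not describe. It is a brute-force illustration on a hypothetical three-flop pipeline (A to B to C), with made-up delays, a simplified setup check, and hold checks ignored.

```python
def worst_slack(period_ps, clk_ps, paths):
    """Smallest setup slack over all paths, in picoseconds.

    clk_ps maps flop name -> clock arrival time.
    paths is a list of (launch_flop, capture_flop, logic_delay_ps).
    Simplified setup check only; hold timing is ignored in this sketch.
    """
    return min(period_ps + clk_ps[dst] - clk_ps[src] - delay
               for src, dst, delay in paths)


period = 1000
paths = [("A", "B", 1200),   # long logic stage
         ("B", "C", 600)]    # short logic stage

# Skew-driven approach: every flop gets the same clock arrival time.
zero_skew = {"A": 0, "B": 0, "C": 0}

# Toy alternative: try delaying the clock at B and keep the best result.
candidates = range(0, 501, 50)
best_b = max(candidates,
             key=lambda b: worst_slack(period, {"A": 0, "B": b, "C": 0}, paths))

print("worst slack, zero skew (ps):", worst_slack(period, zero_skew, paths))
print("best clock delay at B (ps):", best_b)
print("worst slack, adjusted (ps):",
      worst_slack(period, {"A": 0, "B": best_b, "C": 0}, paths))
```

With a perfectly balanced clock, the long stage misses timing by 200 ps while the short stage has 400 ps to spare. Delaying the clock at B by 300 ps lends time from the short stage to the long one, and both stages end up with 100 ps of slack. A production tool has to make this kind of tradeoff across the whole chip while also respecting hold timing and physical constraints.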

How? Go read the White Paper.

For more information on the technique, see the three documents located on this Web page: http://www.azuro.com/downloads/.

About sleibson2

EDA360 Evangelist and Marketing Director at Cadence Design Systems (blog at http://eda360insider.wordpress.com/)
This entry was posted in EDA360, Silicon Realization.
