Monday, December 29, 2008
If you follow this blog, then you've read my ramblings on EDA SaaS. An interesting new advance in this area is http://www.c-to-verilog.com. This website lets you compile C code into synthesizable Verilog modules.
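To make that concrete, here is the sort of fixed-bound loop kernel such C-to-Verilog flows typically handle well. This is my own toy example, not one taken from the site, and I haven't checked exactly which C subset the tool accepts:

/* Toy kernel of the kind a C-to-Verilog tool could turn into a
 * hardware module: a fixed-length multiply-accumulate. */
int dot8(const int a[8], const int b[8])
{
    int acc = 0;
    for (int i = 0; i < 8; i++)   /* fixed trip count -> unrollable, pipelinable */
        acc += a[i] * b[i];
    return acc;
}

The fixed trip count and absence of pointers or dynamic memory are what make a loop like this straightforward to map onto a pipelined datapath.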
Here is a screencast of the service:
I've also discovered a few other websites related to EDA, SaaS and "Cloud Computing." Harry the asic guy's blog covers the burgeoning EDA SaaS market, and Xuropa is creating an online community for EDA users and developers. Here's a two-part EETimes piece on EDA SaaS.
I already use a remote desktop connection to run most of my EDA jobs remotely. I've argued that there is a "generation gap" between SaaS and traditional software license markets. The people who were coding in university basements at 13 in the '60s and '70s invented the software licensing industry. Now the people who were herding botnets at 13 are graduating with computer science degrees. The nu-hacker distributes code updates to all his users immediately, without forcing anyone to sit through an install.
Friday, December 05, 2008
More versus Faster
/. points to an IEEE Spectrum article titled "Multicore Is Bad News," which discusses the bottleneck of getting data from memory into the many cores of a multicore processor.
This is primarily an issue with the programming model we are trying to force onto the hardware, not a problem with the real capacity of the hardware itself. The article specifically notes that many HPC workloads map cleanly onto grids, which addresses the locality problems of multicore arrays. I'm a broken record: locality optimization for mapping dataflow graphs onto multicore arrays requires a transition in our primitive computer model, from instruction-stream executors to spreadsheet iterators.
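To illustrate what I mean by a spreadsheet iterator, here is my own sketch (not something from the article) of a grid computation where every cell is recomputed each tick from its immediate neighbors, the way a spreadsheet recomputes cells:

/* Sketch of a "spreadsheet iterator": each output cell is a pure
 * function of nearby input cells, so a core that owns a cell only
 * ever touches local data -- no trips through shared memory. */
#define N 64
void step(const float in[N][N], float out[N][N])
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            out[i][j] = 0.25f * (in[i-1][j] + in[i+1][j] +
                                 in[i][j-1] + in[i][j+1]);
}

Tiled across a multicore array, each core iterates its own patch of cells and exchanges only the boundary values with its neighbors, which is exactly the locality the memory bottleneck argument says we need.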
The article sums up the problem: "Although the number of cores per processor is increasing, the number of connections from the chip to the rest of the computer is not." This is not strictly true, since pin counts per chip are still increasing, but it is also not the main issue.
Even without increasing the rate of dataflow through the processor, we can drastically improve power-performance by slowing down our processing pipelines and distributing the load across multiple slower cores. One thousand 1 MHz cores can perform many of the same tricks as a single 1 GHz core for far fewer joules. This physical truth runs starkly contrary to the popular notion that we should increase utilization to improve performance. Since Moore's law says that transistors for ever more cores will continue to cost less and less (and I predict this will continue long after dimensional scaling ends, thanks to optimizations in fabrication methods), we can achieve a decline in operating expense to sustain our growing hunger for computing resources. This requires us to fundamentally change our approach: we cannot continue to expect faster execution, only more execution. Exponentially more.
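Here is a back-of-the-envelope version of that claim. The capacitance and voltage numbers are assumed round figures, not measurements of any real chip; the point is that dynamic power goes as C*V^2*f and slower cores can run at lower voltage:

#include <stdio.h>

/* Back-of-the-envelope dynamic power comparison with assumed numbers. */
int main(void)
{
    double C = 1e-9;                              /* switched capacitance per core, farads (assumed) */
    double p_fast = C * 1.2 * 1.2 * 1e9;          /* one core: 1 GHz at 1.2 V  -> ~1.44 W */
    double p_slow = 1000.0 * C * 0.6 * 0.6 * 1e6; /* 1000 cores: 1 MHz at 0.6 V -> ~0.36 W */
    printf("1 x 1 GHz core:     %.2f W\n", p_fast);
    printf("1000 x 1 MHz cores: %.2f W\n", p_slow);
    return 0;
}

Same aggregate 10^9 cycles per second, roughly a quarter of the dynamic power, and the entire win comes from the V^2 term. Leakage and the extra silicon area are ignored here, so treat it as an illustration of the scaling, not a design study.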
Once the industry discovers that it has to transform from isolated general-purpose problem solvers into networked special-purpose assembly lines for particular problems, I expect we will see the rise of FPGAs and reconfigurable dataflow pipelines. Any application that can use many thousands of CPUs effectively will probably use many thousands of FPGAs an order of magnitude more efficiently. It's because of the relationship between the granularity and number of cores and the degree to which we can optimize for locality. All the extra dispatch hardware and cache-management logic is unnecessary in deterministic dataflow pipelines; we simply need more dense functional area and more pipes for data.
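Here is a crude software model of what I mean by a deterministic dataflow pipeline, again just my own sketch: every stage consumes one value from the stage before it and produces one value for the stage after it on every tick, so there is nothing to dispatch and nothing to cache, only functional units and the registers between them.

#include <stdio.h>

/* Three-stage pipeline modeled in software: produce, square, drain.
 * r1 and r2 stand in for the pipeline registers between stages. */
int main(void)
{
    int r1 = 0, r2 = 0;
    for (int t = 0; t < 16; t++) {
        int out = r2;            /* stage 3: drain the result */
        r2 = r1 * r1;            /* stage 2: square */
        r1 = t + 1;              /* stage 1: produce the next input */
        if (t >= 2)              /* first valid result after 2 ticks of latency */
            printf("%d\n", out);
    }
    return 0;
}

Laid down on an FPGA, the loop disappears: each stage is its own piece of fabric and a new result falls out every clock.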