Friday, March 28, 2008

Xilinx Favors Clusters over FPGAs for Application Acceleration

My last post got 40x my usual traffic. Apparently Python Spreadsheets is a hot topic. It also looks like reconfigurable dataflow networks aren't popular under the moniker of a four-letter acronym. When you call it a hardware spreadsheet, a 2-D tiled array of functional blocks is suddenly a whole lot more accessible.

A few posts back I argued (and linked to papers that agree) that the proprietary nature of the FPGA industry impedes reconfigurable computing research with locked-down binary formats and expensive, closed-source (un-study-able) EDA tools. So now let me create a little more controversy. I think it's about time FPGA manufacturers embraced their reconfigurable computing futures. This means that when a major FPGA manufacturer has an EDA problem requiring accelerated computation, they should go with the FPGA approach for accelerating that problem instead of (or at least in addition to) the cluster or GPU approach. I'm talking about the SmartXplorer utility added to Xilinx ISE 10.1.

Here's my gripe: why is Xilinx promoting clusters instead of FPGA acceleration for its ISE? Of course, if you're an EDA company trying to deliver a product, you want to use tested technology for parallel computing, like a cluster. But it's not like there aren't examples of accelerating FPGA placement on FPGAs. And everybody using ISE already has an FPGA -- it's sitting there attached over USB JTAG, waiting for a new bitstream to be placed and routed by my cluster -- but we don't have a cluster.

If nothing more, this sends a signal about Xilinx's stance on whether one should invest in FPGA accelerated computing or a cluster solution (if your problem is sorta similar to the types of problems an FPGA EDA tool might have). Now Altera has to release a GPU accelerated workflow for bragging rights.

Something really needs to be fixed here.

(edit on March 31, props to Acceleware for the GPU accelerated workflows)

3 comments:

Anonymous said...

Hi Amir,

This question about using FPGAs to speed place and route is one I remember being asked of Dave Bennett at this panel at SC07:

http://portal.acm.org/citation.cfm?id=1188530&jmp=cit&coll=ACM&dl=GUIDE

As I understand it, Dave developed the Xilinx place and route tools the first time round in the 1980s, and then more recently looked at how the tools could be reworked to allow for a faster design flow for the development of HPC algos on FPGAs with the CHiMPs compiler.

He dismissed the possibility of using FPGAs for place and route acceleration, certainly for the time being. He spoke of the 100+ different algorithms that run behind the scenes during place and route, and the impracticality of achieving FPGA acceleration for enough of them to make it worthwhile. At least, that's my recollection.

Getting parallelism out of these place and route algos can't be easy. It's one thing to do multiple runs in parallel, but I noticed that for a single run, Altera's software won't give you much more than a 20% speedup in the best case, no matter how many cores you throw at the problem:

http://www.altera.com/support/kdb/solutions/rd04022007_474.html
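That ~20% ceiling is exactly what Amdahl's law predicts when only a small fraction of a single run parallelizes. A minimal sketch (the parallel fraction `p` below is an assumed number back-solved to give a ~1.2x ceiling, matching the figure above; it is not a measured profile of anyone's tool):

```python
# Amdahl's-law sketch: why one place-and-route run tops out around a
# 20% speedup regardless of core count, while N independent seed runs
# (the SmartXplorer-style seed sweep) scale their throughput linearly.

def amdahl_speedup(p, n):
    """Speedup of a job where fraction p parallelizes across n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# Assumed parallel fraction chosen so the asymptotic speedup is 1.2x.
p = 1.0 - 1.0 / 1.2

for cores in (1, 2, 4, 8, 64):
    print(cores, round(amdahl_speedup(p, cores), 3))
```

Even at 64 cores the single run barely clears 1.19x, which is why farming out whole independent runs (different seeds, different strategies) to a cluster is the pragmatic choice.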

Cheers,

Robin

Anonymous said...

hmm... After reading my post, I have to add: Me fail English? That's unpossible!

Amir said...

Hey Robin,

I was an intern at Xilinx for Dave getting STSWM from Fortran to CHiMPsable C. I remember having a similar discussion with him when we were talking about what to massage through the compiler.

However, those of us running around the Wall Street conferences saying "you can augment your cluster with an FPGA accelerator for 10-100x performance" really want to believe it.

It's certainly true that we can accelerate Simulated Annealing and other Monte-Carlo methods on an FPGA: the Wall Street guys use these sorts of methods for pricing algorithms. It's also true that other system constraints may preclude FPGA acceleration for some systems using these algorithms.
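For what it's worth, the annealing kernel itself is simple enough to sketch. Here's a toy version on a 1-D placement problem (the netlist, cost function, and cooling schedule are illustrative assumptions, nothing like a production placer):

```python
# Toy simulated annealing: minimize total wirelength of a 1-D placement
# by swapping cells, accepting uphill moves with Boltzmann probability.
import math
import random

random.seed(0)

nets = [(0, 3), (1, 4), (2, 5), (0, 5), (1, 3)]  # pairs of connected cells
placement = list(range(6))                        # cell index -> slot

def wirelength(p):
    return sum(abs(p[a] - p[b]) for a, b in nets)

cost = wirelength(placement)
temp = 10.0
while temp > 0.01:
    i, j = random.sample(range(len(placement)), 2)
    placement[i], placement[j] = placement[j], placement[i]
    new_cost = wirelength(placement)
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        cost = new_cost                 # accept the swap
    else:
        placement[i], placement[j] = placement[j], placement[i]  # undo
    temp *= 0.995                       # geometric cooling

print(cost, placement)
```

The move-evaluate-accept loop is the part that maps naturally onto hardware: many candidate swaps can be evaluated concurrently, which is the usual argument for FPGA-accelerated annealing.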

I'm suggesting that the research investment required to get ISE running on an FPGA is the price of being the market leader in reconfigurable computing. The two modes to get there are 1) Open Source Everything and fund startups, or 2) try to fund such a research project internally.

I think the first FPGA manufacturer to significantly open their architecture and workflow will spawn a new $1B-10B market as the reconfigurable computing standard. It's like the open 6200 series 10 years ago that spawned a lot of the original reconfigurable computing projects -- only now the technology is pervasive and advanced enough that we might actually start to eat off Intel's plate.

Anyway, Dave agrees that Xilinx needs to invest 100x its research budget in these problems -- otherwise he wouldn't have left his previous spot to be on CHiMPs.

Cheers!

Amir