Friday, September 29, 2006

Coupling FPGA and Multicore

An article in The Register.
Another article in The Register.

"The end result will likely be a multi-core world where the common general purpose cores of today sit alongside FPGAs, networking products and other co-processors tuned to handle specific tasks at remarkable speeds."

Tuesday, September 26, 2006

80 cores, but what can we do with it?

Slashdot's been talking about multi-core again. Someone brought up FPGAs in there too, which is always nice. A lot of people wonder what OS and software support for multi-core will look like. I wonder this regularly, and I think the answer is that we should start to treat multi-core processors not too differently from FPGAs.

In terms of amortized efficiency, I think the true benefit of multi-core will be "assembly-line" throughput. It is fine to have a very long pipeline spanning 80 processing kernels if such a methodology substantially increases system throughput, even at a cost in single-request latency. Even many realtime systems will benefit despite this throughput/latency tradeoff.
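To make the tradeoff concrete, here's a toy Scheme sketch (the numbers are purely illustrative, not from any real system): an n-stage pipeline where every stage takes t seconds completes one request every t seconds once it is full, even though each individual request takes n*t end to end.

(define (pipeline-throughput stage-time)         ; requests completed per second
  (/ 1 stage-time))

(define (pipeline-latency n-stages stage-time)   ; seconds per request
  (* n-stages stage-time))

;; 80 kernels at 1 microsecond per stage:
;; (pipeline-throughput 1e-6)   => one million requests per second
;; (pipeline-latency 80 1e-6)   => 80 microseconds for any one request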

Working away on mapping graphs to a lattice....

Thursday, September 21, 2006

FPGA Computing Blog

Someone else agrees.

Went to the $100K team-building dinner yesterday. I should minimize the number of acronyms in my business plan. FPGA sounds scary. Accelerate does not. DVS sounds scary. Adaptive power management does not... yada.

Wednesday, September 20, 2006

Scheme to Hardware

I just found a paper discussing Scheme compilation to hardware. I have a fairly decent Scheme -> LUT compiler at this point, but there are tons of things I haven't gotten to yet. I hope to have the system ready to demo by the end of September.

Tuesday, September 19, 2006

A language for computing on a lattice of cells

Last semester I wrote some of the basic components of a lattice processing simulator. These past couple of weeks have been full of digressions, but I'm back to coding 4-8 hours a day. I've been developing a Scheme web server and XMLHttpRequest handler which should enable web-based development for a fabric of cells (email me if you want the code in its current form). I had some problems with the POST method, but I've since figured them out.
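For the record, the POST wrinkle in miniature (a hedged sketch, not my actual server code, and assuming an R7RS-style (read-string k port) and a headers alist that my real handler builds differently): a GET request ends at the blank line after the headers, but a POST carries Content-Length more bytes of body beyond it, so the handler must read exactly that many characters instead of waiting for a delimiter.

(define (read-post-body port headers)
  ;; headers: alist mapping lowercased field names to string values
  (let ((len (string->number (cdr (assoc "content-length" headers)))))
    (read-string len port)))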

My current plan is to build up an interface that allows me to modify code and provides a clean graphical view of the system. Once the development environment is working and I have an FPGA board exposed to the internet, I will host the system on the board and start coding the hardware scheduler.

I've also been cooking up the system to compile Scheme to LUTs. Scheme's "bit-string" abstraction is very useful for encoding LUTs. A brief example of the LUT definitions:

;; An n-input LUT: the configuration bits are the truth table, indexed
;; by the input bits; an integer configuration is widened to the 2^n
;; bits an n-input table needs.
(define (lut configuration in)
  (bit-string-ref
   (if (bit-string? configuration)
       configuration
       (unsigned-integer->bit-string (expt 2 (bit-string-length in)) configuration))
   (bit-string->unsigned-integer in)))

;; A 2-input LUT: concatenate the inputs to form the table index.
(define (lut2 configuration in1 in2)
  (lut configuration (bit-string-append in1 in2)))

;; macro-define (a local helper) stamps out a named 2-input gate for
;; each (name truth-table) pair.
(macro-define '(name configuration)
  '(define (name in1 in2) (lut2 configuration in1 in2))
  '((zero2 0) (nor2 1) (pbq 2) (pb 3) (pqb 4) (qb 5) (xor2 6) (nand2 7)
    (and2 8) (nxor2 9) (q 10) (pborq 11) (p 12) (porqb 13) (or2 14) (one2 15)))
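A quick sanity check of the gates stamped out above (assuming, per my reading of the MIT Scheme manual, that bit-strings index bit 0 at the low-order end): and2 is truth table #b1000, or2 is #b1110, xor2 is #b0110.

(define one  (unsigned-integer->bit-string 1 1))
(define zero (unsigned-integer->bit-string 1 0))

(and2 one one)    ; => #t  (both inputs set selects bit 3 of #b1000)
(and2 zero zero)  ; => #f  (bit 0 of #b1000 is clear)
(or2  zero zero)  ; => #f  (bit 0 of #b1110 is clear)
(xor2 one one)    ; => #f  (bit 3 of #b0110 is clear)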

I have written about eight pages of notes demonstrating how to compile functions to a hierarchy of LUTs using this abstraction, with let assigning temporary names to internal pins. When these functional descriptions get merged with the lattice processing model, I will be able to simulate state evolution and delay. To map to real hardware, I will also need an algorithm that infers an n-input LUT for an arbitrary n-input combinational function.
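Here is how I picture a small example from those notes, hedged as my current reading of the abstraction rather than finished compiler output: a full adder as a two-level hierarchy of 2-input LUTs, with let* naming the internal pins. The bit helper is something I've introduced here to re-wrap lut2's boolean result as a 1-bit string so one LUT level can feed the next.

(define (bit b)   ; boolean -> 1-bit bit-string
  (unsigned-integer->bit-string 1 (if b 1 0)))

(define (full-adder a b cin)
  (let* ((p    (bit (xor2 a b)))           ; propagate pin
         (g    (bit (and2 a b)))           ; generate pin
         (sum  (xor2 p cin))
         (cout (or2 g (bit (and2 p cin)))))
    (list sum cout)))

And a sketch of the inference step mentioned above: evaluate an n-input combinational predicate on all 2^n input patterns and pack the results into a truth-table integer. I'm using the convention that bit i of the table holds the function's value on the pattern whose binary encoding is i, with argument 1 taking bit 0; the real compiler will have to match lut's indexing convention.

(define (infer-lut f n)
  (define (pattern i)   ; i's binary digits as n booleans, least significant first
    (let loop ((k 0) (bits '()))
      (if (= k n)
          (reverse bits)
          (loop (+ k 1) (cons (odd? (quotient i (expt 2 k))) bits)))))
  (let loop ((i 0) (config 0))
    (if (= i (expt 2 n))
        config
        (loop (+ i 1)
              (if (apply f (pattern i))
                  (+ config (expt 2 i))
                  config)))))

;; e.g. (infer-lut (lambda (p q) (and p q)) 2)  => 8, matching and2 above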

Sunday, September 10, 2006

Marketing Reconfigurable Computing

I've been thinking of ways to argue for FPGAs in the utility computing market. "Cost effective computing" is easier to sell now that multi-core offerings are actually "slower" (in terms of clock speed) than previous chips. Intel and AMD already have us all warmed up to the idea that having more cores is better than faster cores. This makes it easier to market FPGAs as compute elements.

I think FPGAs scale better than CPUs in terms of compute density and power efficiency. I also think that they force a more scalable programming model by virtue of their architecture. Multi-core chips force the question: how do we manage concurrent processes across a varying-size field of processing elements? The answer is closely akin to an FPGA programming methodology. Not surprisingly, there isn't yet a good programming methodology for either multi-cores or FPGAs.

Most of the "von Neumann" bottlenecks are associated with the cache structures in a microprocessor. Since an FPGA's primitive elements are memories, I like to think of them as "smart caches" that can maintain data locality better than RAM and can have operations moved to the data instead of the other way around.

I am meeting with Krste tomorrow to get access to the BEE2 board. I will also put up a CVS on fpgaos.com soon.

Thursday, September 07, 2006

utility computing

The utility computing market is set to grow extremely rapidly in the next few years. A guy from Sun who spoke to my 6.891 lecture last year said he was spending his time trying to figure out utility computing. Can a group of MIT students start a company and beat the big players in this market?

In the utility computing market, the major objective of computing providers and consumers will be to boost GOPS/$ for specific applications. How will FPGAs find their way into such a market? Will FPGAs give that group of MIT students a competitive advantage?

Sunday, September 03, 2006

Enterprise Reconfigurable Computing

By allowing perfect substitution of computing products, virtualization enables the commoditization of the computing industry. High-bandwidth connectivity enables the delocalization of computing hardware by allowing large amounts of data to be quickly stored and retrieved remotely.

Reconfigurable computers will win in settings where the cost of transitioning an application to a reconfigurable computer is low and the GOPS/$ the reconfigurable solution provides for the application is higher than that of other solutions. Virtualization may make a reconfigurable solution easily compatible, at a cost in GOPS/$. For some applications it may be worthwhile to tune the engine to maximize computational efficiency.
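A toy way to state that win condition (every number here is hypothetical): fold the one-time transition cost into the denominator, so platforms can be compared on effective GOPS/$ over a deployment's life.

(define (effective-gops-per-dollar gops hardware-cost transition-cost)
  ;; operations per second delivered, per total dollar spent on the
  ;; deployment, counting the one-time porting effort
  (/ gops (+ hardware-cost transition-cost)))

;; a reconfigurable solution wins when
;;   (effective-gops-per-dollar fpga-gops fpga-cost port-cost)
;; exceeds the same figure for the commodity alternative, whose
;; transition cost is near zero.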

Friday, September 01, 2006

opportunities for a multithreaded software company

Two articles caught my attention today:

The CTO of Intel gave a talk at Stanford last week looking to stimulate multithreaded software development. It is not surprising that Intel needs software to target its multi-core offerings. The problem for software developers is to design scalable systems so that as Intel cranks out more and more cores per chip, we don't have to undergo a software revision each time.

The other article is about algorithms for determining sub-oceanic topography from seismic data. Alan Willsky, of "Signals and Systems" fame, has been working for Shell Oil on ways of crunching lots of data to speed up the mapping of underwater salt deposits. Sounds like an opportunity for acceleration.