Thursday, May 04, 2006

Methods for Reconfigurable Computing

The following is a revised version of an article I wrote last month. I've been narrowing down my problem description in order to put together a proposal for my Master's Thesis.


Methods for Reconfigurable Computing

There is some work to be done on the expressive powers of FPGA hardware description languages. The problem is that many EDA developers are only improving methods of hardware description. People view circuit emulation as the sole application of FPGAs instead of one of many possible applications. FPGAs are largely used to verify designs and they may possibly even end up in some embedded products. However this mode of operation has created inertia that must be overcome before we start seeing reconfigurable logic arrays in all types of devices. We need to start viewing the FPGA the way we view a sequential CPU: as a means to the execution of an algorithm. C is the "native" language of the sequential Von Neumann processor and it came after several iterations of language development. There isn't such a thing for FPGAs yet; Verilog and VHDL are more like the assembly languages of FPGAs, and they don't even support reconfiguration!

Any system designed to improve FPGA usage should target a wider class of problems. The problem of targetting an application to a grid of reconfigurable functional units is applicable outside the realm of FPGAs. An methodology for programming a network of distributed systems is extremely relevant for cluster supercomputing applications as well as Internet applications. Additionally, multi-core processors are swarming the marketplace and soon they may even incorporate reconfigurable logic for application acceleration and dynamic data routing. FPGAs already contain CPU cores and one can imagine that those CPU cores could contain reconfigurable logic in a circular fashion. Future hardware may be in the form of quantum dot arrays, photonic processors and magnetic spin logic. Practical systems using these technologies may only exist in the form of programmable cellular arrays (fabricated in a wet self-assembly process, no doubt).

Currently, not enough software exists to support the kind of structure that future chips and systems will have. What I view as the major issue facing the FPGA EDA space is one of identity. We should aim for more than creating tools for emulation and verification. We need to create tools that solve the more general problem of targetting arrays of reprogrammable devices. Ideally, the tools should run on any kind of hardware platform, be it an FPGA, a single CPU core, a 16 core processor or a distributed network of CPUs. Tools and languages do exist for developing parallel computing applications, but they are largely disjoint from the FPGA space. I plan to focus my master's thesis work on developing tools to implement such a system on an FPGA.

When I realized that this was primarily a software problem, I wondered why it hadn't been solved by software people. Then I realized that inertia is probably the reason. The majority of the computing world is heavily invested in Von Neumann architecture and is generally unaware of the problems in the FPGA realm. I grew up knowing how to code in C and Assembly, but I never learned Verilog until I took 6.111 as a sophomore at MIT. Even still, many computer scientists at MIT haven't even heard of FPGAs. CS students avoid the "Digital Death Lab" because it has a reputation of being extremely time consuming (it should be if you become impassioned by your project). The experience most EE students have with FPGA technology may look something like this: sitting in the 6.111 lab frustrating over EDA tools that "simulate correctly but then why isn't my circuit working." Then you have to wait 5 minutes for the Xilinx tools to program the chip after you think you've fixed what was broken.

As a lab assistant for 6.111 I regularly hear students complain about the tools; I could write an entire book about the types of errors that aren't really bugs in the design, but rather in the tools. So the current tools need improvement, but even still, the necessary improvement is a step away from design verification tools. Students still approach the FPGA in line with the current dominant paradigm: as a means to emulate an application specific digital system, not as a means to execute an algorithm. This is mostly because we do not have tools with the latter purpose in mind. Students certainly understand that for many applications we can gain massive speedups through parallelism and granularity and that there is a whole class of applications where a single FPGA can outperform a cluster of computers. The functionality to cost ratio is still prohibitive for many applications, but like most things in hardware, that is a matter of scale.

The FPGA market space is heavily invested in hardware description languages. This is because there are methods in place to translate HDL designs to an ASIC implementation and FPGAs are mostly used for design verification. Verilog is based around module "building blocks" with behavioral descriptions and connections with other the modules. This "structural design" method is good for FPGAs and lends itself to a functional programming languages. Since I particularly like Scheme, I'm going to use it for the majority of my coding.

I also think it is simpler to write interpreters and translators in Lisps than in any other languages. If I can command the FPGA from Scheme, making it happen from C, C++, Java, or any other language is a SMOP (a great way to learn a language is to create an interpretter for it!) This is the case for me and probably for all MIT students who have all written a Scheme evaluator in Scheme for the introductory computer science course. If I want to utilize MIT computer science talent on an FPGA programming language project, Lisp is definitely the way to go.

A large part of the problem with HDL tools is that semantic eloquence is vacant from hardware description. Machine design through explicit module definition is not the best way to solve massive and complex problems; Verilog becomes extremely bulky and debugging systems can take an extremely long time. In my 6.111 project I was drawn to the idea of using multiple parallel datapaths and creating an assembler to assign functions to each of the datapaths. I later discovered that this was not an uncommon practice. We would want to go a level higher than this to solve bigger problems in less time: we want our system to spawn specific datapaths for a particular algorithm in order to maximize throughput. The system should generate optimal hardware structures from a high-level algorithm description.

Recursive functional programming allows for us to think of massively sized machines unfolding to solve a problem; often times we would unfold machines much larger than an FPGA could emulate. If we are at 99% resource utilization we may still map our problem to the FPGA, but once we go to 101% our entire design fails. There is currently no good way to implement a system that pushes and pops state and configuration data. We want to be able to dynamically spawn modules and alter the connection between modules to enable the architecture implement systems that are much more complicated than could fit statically on the FPGA.

I recently read a paper about function multiplexing on an FPGA. The idea is that each configurable unit may have multiple configurations. We could utilize this to dynamically transition a module between functionalities: while the FPGA is in configuration 0, we reprogram configuration 1. As soon as configuration 0 has completed its execution, the system switches to configuration 1. We then reprogram configuration 0 and so forth. If we have more than 2 configuration spaces we can even imagine data directed, conditional reconfiguration in which we choose between multiple next configurations based on the state of the system at some branch point.

To do this we would an operating system to manage the resources of an FPGA based computer. Such an operating system should work symmetrically with distributed computing systems and multi-core CPUs. The scope of this problem is much larger than I can accomplish on my own. It will be incredibly important to look at what's already been done and to devise a methodology and a taxonomy that can easily be adopted by the community of FPGA developers.

1 comment:

Anonymous said...

Hmmm... I have been mulling over similar ideas for quite a while... more in an HDL framework. There are surely business opportunities in the frameworks (and in the inertia) that you describe as well, although some pretty smart people have been looking at reconfigurable FPGA computing for years. You are right that networking in this huge field is essential, and market research is a key initial step in any venture.

FPGAs are widely used for applications as well as ASIC protos. Decent EDA tools coupled with proven development methodologies should give satisfactory results.