Wednesday, September 05, 2007

A Replacement for the von Neumann Model

"What is a von Neumann computer? When von Neumann and others conceived it over thirty years ago, it was an elegant, practical, and unifying idea that simplified a number of engineering and programming problems that existed then. Although the conditions that produced its architecture have changed radically, we nevertheless still identify the notion of "computer" with this thirty year old concept."
John Backus, “Can Programming Be Liberated from the von Neumann Style?” Turing Award Lecture, 1977

Yesterday I submitted my Master's thesis to MIT's department of Electrical Engineering and Computer Science. In my thesis I present my answer to the problem: how do we program parallel architectures?

My answer is to use a spreadsheet as a replacement for the von Neumann model since modern hardware more accurately resembles an array of processing units than a single instruction stream executer. Since spreadsheets are already a well-established dataflow development paradigm, tools already exist to make spreadsheet specification extremely rapid and straightforward. Unfortunately existing spreadsheet applications aren't designed with the intent of being complete programming languages and they often treat circular reference like an error--though you can enable manual iteration in Excel (try making a counter, IIR filter and a RISC CPU in Excel as an exercise--F9 to iterate). To make the spreadsheets a better parallel programming tool I added multiple interpreter modes (asynchronous, blocking and non-blocking assignments like Verilog), behavioral abstraction, guarded atomic action macros and compilation to FPGA, GPU and Multicore.

My thesis title is "Compiling and Optimizing Spreadsheets for FPGA and Multicore Execution"

Here is the abstract:

A major barrier to developing systems on multicore and FPGA chips is an easy-to-use development environment. This thesis presents the RhoZeta spreadsheet compiler and Catalyst optimization system for programming multiprocessors and FPGAs. Any spreadsheet frontend may be extended to work with RhoZeta’s multiple interpreters and behavioral abstraction mechanisms. RhoZeta synchronizes a variety of cell interpreters acting on a global memory space. RhoZeta can also compile a group of cells to multithreaded C or Verilog. The result is an easy-to-use interface for programming multicore microprocessors and FPGAs. A spreadsheet environment presents parallelism and locality issues of modern hardware directly to the user and allows for a simple global memory synchronization model. Catalyst is a spreadsheet graph rewriting system based on performing behaviorally invariant guarded atomic actions while a system is being interpreted by RhoZeta. A number of optimization macros were developed to perform speculation, resource sharing and propagation of static assignments through a circuit. Parallelization of a 64-bit serial leading-zero-counter is demonstrated with Catalyst. Fault tolerance macros were also developed in Catalyst to protect against dynamic faults and to offset costs associated with testing semiconductors for static defects. A model for partitioning, placing and profiling spreadsheet execution in a heterogeneous hardware environment is also discussed. The RhoZeta system has been used to design several multithreaded and FPGA applications including a RISC emulator and a MIDI controlled modular synthesizer.


Raoul Duke said...

Hi Amir,

This is an interesting idea, using a spreadsheet as the input for a multicore/parallel compiler. I can see the advantages:

If we need an entirely new programming environment for parallel/FPGA, then it's good to use leverage something that's already familiar to people.

I can also see how the spreadsheet environment could allow for the creation of pipelined structures with memory buffering, though you didn't talk about pipelining in your post.

Some questions though:

1. How can you expect to have a single programming environment for multicore and FPGAs? Isn't this shooting a bit high?

2. Although a spreadsheet method of input would be expressive, it would be very difficult to then go back to and modify. Even simple spreadsheet applications become difficult for the creator to maintain. How difficult would a large design be to maintain?

Amir said...

The other niceness of a spreadsheet is that you can modify the circuit while it's running so there's literally no compile time in interpreted mode.

Pipelining is easily done with a "non-blocking assignment" interpreter as in Verilog. I wrote about memory-buffering forward pipelines with latency slack in my thesis, since this is how you "must" do audio with JACK. One of the interesting things I played with in my thesis was pipelining RISC emulators in a spreadsheet. You can actually copy the RISC core in Excel and form a pipeline between them. It feels a little magical when you get it to work.

1) A single programming environment for multicore and FPGA results from the straightforward global memory model and the metaprogramming technique used. Verilog and C both support reading-order blocking assignments and share roughly the same syntax. Emitting Verilog is a lot easier than multithreaded C though. To get high performance C on the Cell for example, you need to use the EIB to directly transfer data from core to core. Partitioning a spreadsheet to minimize communication overhead for multicore execution is a hard problem.

2) Unlike ordinary spreadsheets, I support an abstraction mechanism which lets you define reusable state-machines in a sheet. With appropriate design partitioning the whole thing feels like an interactive Verilog. You can also put comments, colors and borders in any cell for aesthetics.

Cute with the Raoul Duke pseudonym :)

Hasit said...

Hi Amir
Your idea seems interesting to me. However I dont understand it completely. Can you please explain it to me in simple words?
would appreciate.


Jakob Engblom said...

Amir, very interesting idea.

I can see it work for data processing applications, even if there are bound to be tricky semantical issues in converting data types and precision to fixed-point DSPs, limited-precision FPUs, etc.

It would seem pretty poor for control-style programs, though. But since I am a strong proponent of domain-specific programming tools, I think this is one interesting choice for people that usually do spreadsheets as their way to set up computations.

What is definitely needed is a compiler from Excel to a fast implementation on GPGPUs or general-purpose CPUs in clusters, so that programs prototyped in Excel can be run on regular computers with high efficiency. Just targeting FPGAs will make the threshold of entry so large that there will be no large body of users available to grow the idea.

Best luck!


Mike Warot said...

Hi Amir,
I think this is just the thing I need to get the BitGrid idea into something that a normal person can comprehend. I'll try to get your program running on my Windows XP laptop and see how it works.