Tuesday, August 29, 2006

FPGA Computing and Virtualization

Today was my last day at Xilinx. I fly away from the west coast tomorrow morning and head back to MIT to begin my year as a grad student. Hopefully I will emerge unscathed.

I've been thinking a lot about the implications of software virtualization for FPGA computing. I'm intrigued by the idea of implementing a hypervisor in a tightly coupled FPGA/CPU system. I imagine a system with an FPGA connected to multiple memory units, managing the data flow for the various active virtual machines. The software layer of the hypervisor instructs the FPGA to move data to the processor or to function accelerators within the FPGA. The FPGA would probably host the hypervisor agent in order to concurrently manage instructions from the various virtual machines. Instructions non-native to the CPU could either be executed within the FPGA or "decoded" in the FPGA and passed along to the CPU.
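
To make that routing idea a little more concrete, here's a toy sketch in Python (the opcode names and the accelerator table are completely made up, not any real API) of how the hypervisor agent might decide where each instruction goes:

```python
# Toy model of the hypervisor agent's dispatch decision.
# ACCELERATORS maps non-native opcodes to function accelerators hosted in the
# FPGA fabric; everything else goes to the CPU, possibly after a decode step.
# All names here are hypothetical.

ACCELERATORS = {
    "fft1024": "fabric_region_0",
    "aes_round": "fabric_region_1",
}

CPU_NATIVE = {"add", "load", "store", "branch"}

def dispatch(vm_id, opcode):
    """Return a (target, note) pair describing where the instruction runs."""
    if opcode in ACCELERATORS:
        # Non-native instruction with a matching accelerator: run it in fabric.
        return (ACCELERATORS[opcode], f"VM {vm_id}: executed in FPGA")
    if opcode in CPU_NATIVE:
        # Native instruction: stream it straight to the processor.
        return ("cpu", f"VM {vm_id}: passed to CPU")
    # Non-native with no accelerator: decode in the FPGA, then hand the
    # resulting native sequence to the CPU.
    return ("cpu", f"VM {vm_id}: decoded in FPGA, then passed to CPU")

if __name__ == "__main__":
    for vm, op in [(0, "add"), (1, "fft1024"), (2, "sqrt_approx")]:
        print(dispatch(vm, op))
```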

If a virtualization layer consisting of a resource-sharing scheduler and a dynamic load balancer sits just above the OS layer, then applications could take advantage of fine-grained optimizations while running on a virtualized compatibility layer. The key to making a massively scalable operating system is providing mechanisms for high-level agents to inherit low-level optimizations. A method for agents to manage "costs" and share resources provides an elegant optimization strategy that spans all granularity levels.
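
As a toy example of what "managing costs" could look like (the agents, estimates, and weights below are all invented for illustration), each resource agent quotes a cost for a task and the scheduler simply picks the cheapest quote; higher-level agents inherit whatever optimizations went into those quotes:

```python
# Toy cost-based resource sharing: each agent quotes a cost for a task and the
# scheduler assigns the task to the cheapest quote. The quotes could themselves
# be built from lower-level agents' costs, which is how high-level agents would
# "inherit" low-level optimizations. Entirely hypothetical numbers.

def quote(agent, task):
    # Cost = estimated energy (J) + weight * estimated time (s).
    energy, time = agent["estimate"](task)
    return energy + agent["time_weight"] * time

def schedule(task, agents):
    return min(agents, key=lambda a: quote(a, task))

agents = [
    {"name": "cpu_core_0", "time_weight": 2.0,
     "estimate": lambda t: (0.8 * t["ops"], 1.0 * t["ops"])},
    {"name": "fabric_region_1", "time_weight": 2.0,
     "estimate": lambda t: (0.3 * t["ops"], 0.2 * t["ops"])},
]

task = {"name": "filter_kernel", "ops": 5.0}
print(schedule(task, agents)["name"])  # picks the cheaper resource
```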

Tuesday, August 22, 2006

some ramblings about stuff

Starting to think out loud:

Taking advantage of parallelism is a necessary step for the advance of computing. Chip multiprocessors (CMPs) and FPGAs enable concurrent computation paradigms. CMPs allow for thread-level parallelism (TLP), where each core executes its own instruction stream, its instruction cache holding the currently active set of functions. Reconfigurable datapath arrays allow networks of fixed functions to be interconnected through routers to take advantage of instruction-level parallelism (ILP). FPGAs offer even finer granularity of control, allowing a network of reconfigurable functions that often enables bit-level parallelism (BLP) in addition to ILP and TLP.
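
A tiny illustration of bit-level parallelism (written in Python purely as pseudocode for what the hardware would do combinationally): computing the parity of a 64-bit word one bit at a time takes 64 dependent steps, while folding the word with word-wide XORs does the same job in six operations, because each XOR touches every bit position at once.

```python
# Two ways to compute the parity (XOR of all bits) of a 64-bit word.

def parity_serial(x):
    """One bit at a time: 64 dependent steps, no bit-level parallelism."""
    p = 0
    for i in range(64):
        p ^= (x >> i) & 1
    return p

def parity_folded(x):
    """Fold with word-wide XORs: 6 steps, each acting on many bits at once."""
    for shift in (32, 16, 8, 4, 2, 1):
        x ^= x >> shift
    return x & 1

w = 0xDEADBEEFCAFEBABE
assert parity_serial(w) == parity_folded(w)
print(parity_folded(w))
```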

The granularity of reconfigurability also has implications for data locality. If we have fine-grained control over where we place memory and registers in our array, then we can localize variables near the operations that use them. Since the limited bandwidth between processor and memory is the essence of the "von Neumann bottleneck," on-chip data locality offers a way around it.

The cost of reconfigurability is the area required per operation, which implies a lower clock frequency and higher power consumption per function compared to a fixed implementation. Still, it is often impractical to have a fixed ASIC implementation for every computing function, so we are left to choose between a reconfigurable chip and a general-purpose CPU. A reconfigurable device can often leverage parallelism to achieve a decrease in total execution time and total power consumption over a general-purpose microprocessor.

It may not always be the case that a reconfigurable device wins over a general-purpose CPU; if a CPU is more suitable for an application, it would be wise to use it. A mixed-granularity structure incorporating reconfigurable logic within an array of microprocessors can maximize performance by alleviating speed issues for explicitly single-threaded applications that cannot leverage BLP, ILP, or TLP.

My goal is to create a self-optimizing operating system for such reconfigurable heterogeneous arrays. Compatibility with current computing paradigms is a primary objective, to minimize barriers to adoption; to maintain compatibility, the primary application will be a virtual machine server that manages reconfigurable hardware. The operating system seeks to minimize some cost function while executing some set of processes. This cost function should be based on an economic model of computation concerned with metrics such as power consumption and execution time. If chip real estate can be used to save energy and time, then it has an implied value.
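
Here's a rough sketch of what such a cost function might look like, and how real estate picks up an implied value from the energy and time it saves (the weights and numbers are invented for illustration):

```python
# Hypothetical cost model: cost = w_E * energy + w_T * time.
# The "implied value" of fabric area is the cost reduction it buys,
# per unit of area spent. All numbers are made up for illustration.

W_ENERGY = 1.0   # cost units per joule
W_TIME   = 5.0   # cost units per second

def cost(energy_j, time_s):
    return W_ENERGY * energy_j + W_TIME * time_s

# Running a kernel on the CPU alone vs. with an FPGA accelerator:
cpu_only    = cost(energy_j=2.0, time_s=1.0)   # 7.0 cost units
accelerated = cost(energy_j=0.5, time_s=0.2)   # 1.5 cost units
area_used   = 1200                             # LUTs spent on the accelerator

implied_value_per_lut = (cpu_only - accelerated) / area_used
print(cpu_only, accelerated, implied_value_per_lut)
```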

The system is managed by a network of agents. Agents are capable of receiving and responding to signals and objects. Signals and objects are related much as energy and mass are related: sometimes it is useful to look at things as waves, and other times as particles. The agents communicate with one another subject to the constraints of some physical environment, just as the propagation of electromagnetic waves is constrained by the physical hardware.

To understand the interactions of the network of agents with their environment, it is important to have a model of the environment. An environment consists of a set of state variables, a set of accessors, and a set of state evolution functions. State variables are the information we might wish to know about a system: for example, the temperature at a location, the configuration of a LUT, the contents of a register, or the capacitance of a MOSFET. These examples demonstrate that state variables exist at different scopes and over different domains.

State variables that are static in scope are the physical constraints that may not be altered by the agents of our operating system: for example, the electromagnetic constant, the length of copper between two transistors in a circuit, or the dopant density of the silicon. State variables that are constant in scope provide a configuration layer, and this configuration layer may be accessed by reconfiguration agents.

It is generally desirable for things to behave predictably so that we can constrain an environment model and adapt to the environment. However, this does not mean we may assume absolute determinism; we should provide for semi-static and semi-constant variables that permit some randomness. This will provide support for defective or faulty components.
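
Here's one way the environment model could be sketched in code (a toy Python version; the scope names follow the discussion above, and the noise model is just my guess at what a "semi-" scope might mean):

```python
import random
from dataclasses import dataclass

@dataclass
class StateVariable:
    name: str
    value: object
    scope: str          # "static", "constant", "semi-static", or "semi-constant"
    noise: float = 0.0  # spread for the "semi-" scopes (defective/faulty parts)

    def read(self):
        """Accessor: semi-* variables return a slightly randomized value."""
        if self.scope.startswith("semi"):
            return self.value + random.gauss(0.0, self.noise)
        return self.value

    def configure(self, new_value):
        """Only the configuration layer may be altered by reconfiguration agents."""
        if self.scope in ("constant", "semi-constant"):
            self.value = new_value
        else:
            raise PermissionError(f"{self.name} is {self.scope}; agents may not alter it")

# A tiny environment: a physical constraint, a configuration bit pattern,
# and a noisy sensor reading. (State evolution functions are omitted here.)
env = {
    "wire_length_um": StateVariable("wire_length_um", 140.0, "static"),
    "lut_config":     StateVariable("lut_config", 0b1010, "constant"),
    "temperature_c":  StateVariable("temperature_c", 55.0, "semi-static", noise=0.5),
}

env["lut_config"].configure(0b0110)        # fine: part of the configuration layer
print(env["temperature_c"].read())         # noisy reading
# env["wire_length_um"].configure(150.0)   # would raise PermissionError
```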

There should be processes that can simulate a particular environment to allow for behavior prediction. There could also be feedback from the physical environment to monitor and control the system.

Methods for optimizing the cost function include:

process partitioning
resource sharing and recycling
dynamic load balancing
garbage collection
power and frequency scaling
place and route
defragmentation

These processes are "computationally invariant," meaning they alter only the cost of execution while the system's functionality remains the same.
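
To make "computationally invariant" concrete, here's a toy power and frequency scaling example (voltages, frequencies, and constants invented): the workload produces the same result at either operating point; only the time and energy, i.e. the cost, change.

```python
# Toy power/frequency scaling: the result of the computation is identical at
# either operating point; only execution time and energy (the cost) change.
# The voltages, frequencies, and the capacitance constant are invented.

def run(workload_cycles, freq_hz, vdd):
    time_s = workload_cycles / freq_hz
    # Dynamic power roughly scales with f * V^2 (switching constant folded in).
    power_w = 1e-9 * freq_hz * vdd**2
    return time_s, power_w * time_s  # (time, energy)

cycles = 2_000_000_000
fast = run(cycles, freq_hz=1.0e9, vdd=1.2)   # high frequency, high voltage
slow = run(cycles, freq_hz=0.5e9, vdd=0.9)   # scaled down during light load

print("fast: %.2f s, %.2f J" % fast)   # ~2.00 s, ~2.88 J
print("slow: %.2f s, %.2f J" % slow)   # ~4.00 s, ~1.62 J
```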

Monday, August 21, 2006

AMD, Sun... Altera

A few weeks ago I wrote an entry titled "Intel, AMD, Sun... Xilinx." In light of a new press release, it would seem a more appropriate title is "AMD, Sun... Altera."

Tuesday, August 15, 2006

Achronix

Achronix plans to deliver FPGAs that run at gigahertz speeds over a wide range of conditions. They have limited information on their site, but they claim to be using some asynchronous design method, which raises the question: what exactly is being clocked at gigahertz speeds? They got back silicon prototypes in April and two days later announced a low-power initiative, so I wonder what the power consumption of that prototype looked like. They plan on supporting user-programmable speed and power, which has really interesting implications for an operating system.

In terms of economics, GOPS/Watt may be a more important metric than GOPS/(fixed cost), especially if a long device life amortizes fixed costs. GOPS/Watt is most critical in battery-powered applications.
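
A quick back-of-the-envelope calculation (all numbers invented) shows why: over a multi-year deployment the electricity bill can rival or exceed the purchase price, so the device with better GOPS/Watt can win even with a much higher fixed cost.

```python
# Hypothetical comparison of two devices over a 5-year deployment, both
# delivering the same sustained GOPS. Device A is cheaper per GOPS;
# device B does more GOPS per watt.

HOURS = 5 * 365 * 24          # ~43,800 hours of operation
KWH_PRICE = 0.10              # assumed $/kWh

def lifetime_cost(fixed_cost, watts):
    energy_kwh = watts * HOURS / 1000.0
    return fixed_cost + energy_kwh * KWH_PRICE

device_a = lifetime_cost(fixed_cost=200.0, watts=100.0)  # cheap but power-hungry
device_b = lifetime_cost(fixed_cost=500.0, watts=15.0)   # pricier but efficient

print(f"A: ${device_a:,.0f}   B: ${device_b:,.0f}")  # B wins over the device's life
```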

Anyway, according to my unified theory of reconfigurable computing, these kinds of devices live and die by software support so we'll just have to wait and see.


--Edit June 10, 2008:
Cornell's Asynchronous FPGA group has technical papers about the research that led to Achronix's products.

Monday, August 14, 2006

Rapport Inc.

I picked up last month's Tech Review and discovered this startup looking to make chips with 1,000 8-bit cores. The RAMP project and the RAW project are extremely relevant to Rapport. It looks a lot like they're taking the RAW chip commercial... I hope they don't blow through all of their funding trying to manufacture a chip without establishing a software development environment and a market first.

I approach reconfigurable computing as a software problem first and a hardware problem second. I believe that makes business sense, given that COTS reconfigurable chips already exist but no real software exists for them yet. Since no one in the FPGA world has come up with a software methodology for reconfigurable computing, it's hard to imagine how new hardware that looks not too dissimilar to an FPGA will somehow enable a new software paradigm. The Tech Review article hit on these issues, and it looks like there are some Quicksilver alumni on Rapport's staff, so hopefully they'll leverage their experience in this area.

These chips look pretty cool to me. Hopefully I can "get my shit together" quickly enough and get a team of geeks together to make OS support.

Wednesday, August 02, 2006

FPGA Community

I just started fpgacommunity.com. Since only like 3 people read this blog, I suppose the community will be small :) This along with fpgawiki.com will soon be hosted on a different server (when I get back to MIT) and then I plan to spread the meme through mailing lists and usenet groups. I think if there was a tighter community of FPGA users and developers, launching fpgaos.com as a community open source project might actually be feasible.

A few people have commented on why I would want to make the FPGA OS open source. The primary reason is that closed source is counter-academic and creates a barrier to adoption by those who would only use open source (meet the Linux community). The major barrier to FPGA computing is the lack of awareness among computer scientists, and closed source software tools exacerbate that problem. Also, open source guarantees that the best developers out there will have access to improve the code.

It may seem that I'd be missing a huge economic opportunity by going open source, but if I place more emphasis on advancing operating system technology, money will almost certainly find its way to creating new applications. Even though transparency is the goal of the OS, there will still be an awful lot of code that could be optimized for concurrent execution even after the OS is complete. That will certainly be a non-trivial task, and one that will greatly reward those with expertise.

The amount of coding required to build an operating system for a reconfigurable computer is also far too massive to consider without the consensus and assistance of the wider community. If everyone "agrees" on the OS then everyone will use it too, which means the effort won't have been wasted.