Wednesday, December 29, 2010

Kinect Hacking

I've been hacking the Kinect for the last few days. This Little Red character was developed for an upcoming iPad interactive story from my company, Tinker Heavy Industries.

We've done this with a few other characters. There are some issues with the joint orientation, which means that when my arm goes up, her arm goes down...

Tuesday, November 23, 2010

Intel Atom, now with an Altera FPGA!

/. points to this article, which says that Intel is putting an Atom and an FPGA into a single package and selling it for somewhere in the $61-100 range in lots of 1000. This is Intel competing with ARM for the embedded space. We all saw the tighter integration of x86 and FPGA coming 5 years ago when hard PowerPC cores started finding their way into Xilinx V2 Pros. Achronix is already using Intel to fab their components. In a year or two, perhaps we'll start to see x86 and FPGA on a single die by Intel, and then in a few more years we won't need the x86 anymore.

This is just starting to heat up.

I looked at the FPGA they put on board, the Arria II GX FPGA.

25,300 ALMs
63,250 equivalent logic elements
50,600 Registers
495 M9K memory Blocks
791 Kb MLAB Memory
4,455 Kb Embedded memory
312 18x18 multipliers.

I don't remember Altera's logic modules anymore (some sort of big 8 input logic thing), but this chip is like 1/20 of the size of the big FPGAs on the market today, though it has an impressive supply of multipliers. It's clearly going to be a champ for custom video codecs, since it'll be able to do tens of billions of multiplies per second when you need it to.
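As a rough sanity check on the "tens of billions" figure, here is a minimal sketch. It assumes every one of the 312 multipliers retires one multiply per clock cycle, and the clock rates are hypothetical round numbers, not datasheet values.

```python
# Back-of-the-envelope peak multiply rate for the multiplier count listed above.
MULTIPLIERS = 312  # 18x18 multipliers, from the spec list in this post


def peak_multiplies_per_second(clock_hz, multipliers=MULTIPLIERS):
    """Peak rate, assuming one multiply per multiplier per cycle (an assumption)."""
    return multipliers * clock_hz


if __name__ == "__main__":
    # Hypothetical fabric clock rates, chosen only for illustration.
    for mhz in (100, 200, 300):
        rate = peak_multiplies_per_second(mhz * 1_000_000)
        print(f"{mhz} MHz -> {rate / 1e9:.1f} billion multiplies/s")
```

Under these assumptions the peak lands somewhere between roughly 31 and 94 billion multiplies per second; a real design would see less once routing and memory bandwidth get involved.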

Hopefully this strategy works and then Intel will experiment with Xeon + Achronix Speedster on a single die :)

PS: My little girl is walking:

Little Red also walks on the iPad, but so does the Big Bad Wolf! (buy our letter tracing iPad game for kids!)

Wednesday, November 17, 2010

Ten Things to do with an EC2 GPU

Yesterday I was saying we'll see crypto cracking on EC2 with GPUs. In addition to writing it on this blog, I enthusiastically proposed it to friends who are into this sorta thing, and of course we then discovered that someone already did it. So we lose all novelty points and should just go back to thinking about how to make money using cloud based GPUs.

I've been writing about this for five years already and I think EC2 with GPU is like an invitation for me to take my GPU accelerated spreadsheet work into the limelight. A good question is, does Amazon have the capacity to turn that $2.10 per teraflop per hour into $2100 for a Petaflop for an hour? Can they provide it with enough granularity to provide a petaflop for a minute for 2100/60 dollars? I also suspect that some data intensive applications will wish they had solid state disks and faster connections between nodes (as I understand it they have 10Gbps links now). How will we deal with streaming data sets to and from the server? I wonder if you might be reserving the GPUs even when you're not using the GPU, but just moving data to and from the server; this would be a wasteful use of resources.
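The pricing question above is just linear scaling, sketched below. The function name is made up for illustration, and the sketch assumes Amazon would bill by the minute at the same rate, which is exactly the open question.

```python
# Linear scaling of the $2.10 per teraflop-hour price quoted above.
PRICE_PER_TFLOP_HOUR = 2.10  # dollars, from the post


def cost_dollars(teraflops, minutes):
    """Cost if pricing scaled perfectly with both capacity and time (assumption)."""
    return PRICE_PER_TFLOP_HOUR * teraflops * minutes / 60


if __name__ == "__main__":
    petaflop = 1000  # teraflops
    print(f"petaflop for an hour:   ${cost_dollars(petaflop, 60):,.2f}")
    print(f"petaflop for a minute:  ${cost_dollars(petaflop, 1):,.2f}")
```

So a petaflop-minute would be about $35, if the capacity and the billing granularity were actually available.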

So here's my list of ten things to do with an EC2 GPU instance.

1) Electronic Design Automation - Giving supercomputer resources to teams of 2 or 3 designers for electronic simulation and synthesis. We definitely want some FPGAs in the cloud for logic simulation and other non-GPU algorithms that FPGAs are good at like pattern recognition. FPGA tools have a way to go and the appliances will need a GPU cloud just to do P&R.

2) Render Farms - NVidia bought Mental Ray, probably knowing they would put Reality Server on EC2. Plugins for Photoshop and Maya are gonna be next; lower resolution screen captures

3) Video Games - A system that renders in the cloud and streams it to your mobile device? Obviously other people are doing this, but access to Amazon takes the risk out for developers! We can have much much much better game graphics, but controller to screen round trip latency puts an even higher burden on getting the computation done faster.

4) Financial Services - Remember when you got to take a 15 minute break after starting a big Correlations or Monte Carlo sim? I wonder if we can get low latency streams of market data and make a sandbox for high frequency traders.

5) Voice Recognition - You know when you're talking to those computers on the phone and they act like they can recognize you? Not sure how to pipe that volume of data in though.

6) Face Recognition - Soon to be a Facebook feature for sure: we can index your face and then auto-tag you.

7) Search for Extra-Terrestrial Intelligence - Actually don't.

8) Computational Molecular Dynamics - Folding on a cloud!

9) Crypto Cracking - If only you had some crypto worth cracking, you now have a supercomputer! You could also factor some big numbers; isn't RSA offering money for factoring some big numbers?

10) Verifying Goldbach's Conjecture - We seem to be having trouble proving that every even number greater than 2 is the sum of two primes, but computers keep verifying it for the ever larger numbers we throw at them.
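For item 10, a brute-force check looks something like the sketch below. This is a plain single-threaded CPU version using a sieve of Eratosthenes, only meant to show the shape of the computation; it is not a GPU implementation, and the function names are made up for illustration.

```python
# Brute-force Goldbach check: every even n in [4, limit] is a sum of two primes.


def prime_sieve(limit):
    """Return a list where is_prime[k] is True exactly when k is prime."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime


def goldbach_pair(n, is_prime):
    """Return the pair (p, n - p) with the smallest prime p, or None if none exists."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return p, n - p
    return None


def verify_goldbach(limit):
    """True if every even number from 4 to limit has a Goldbach pair."""
    is_prime = prime_sieve(limit)
    return all(goldbach_pair(n, is_prime) is not None
               for n in range(4, limit + 1, 2))


if __name__ == "__main__":
    print(goldbach_pair(100, prime_sieve(100)))
    print(verify_goldbach(10_000))
```

Each even number can be checked independently of the others, which is why this kind of search spreads so easily across many cores.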

$2.10 per teraflop per hour on a $3000 GPU which consumes about 10 cents of power per hour means there's plenty of room for profit and competition on the infrastructure side. I suspect some of the algorithms above can still be designed as services that take advantage of these economics. Unless cloud-based GPUs get more competitively priced, I suspect anyone selling GPU-accelerated services with constant enough demand would want to use their own server and possibly only tap into cloud resources when they are in need of capacity.
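To put a number on "plenty of room for profit", here is a hedged payback sketch using only the three figures above. It assumes one GPU delivers the teraflop being sold, and it ignores the host server, cooling, bandwidth and staff; the utilization parameter is a hypothetical knob added for illustration.

```python
# Payback time for a GPU rented out at the quoted price.


def payback_hours(gpu_cost=3000.0, price_per_hour=2.10,
                  power_per_hour=0.10, utilization=1.0):
    """Hours until rental margin covers the GPU purchase price.

    Power is paid every hour; revenue is earned only for the utilized fraction.
    """
    margin = price_per_hour * utilization - power_per_hour
    if margin <= 0:
        return float("inf")
    return gpu_cost / margin


if __name__ == "__main__":
    for utilization in (1.0, 0.5, 0.25):
        hours = payback_hours(utilization=utilization)
        print(f"utilization {utilization:.0%}: "
              f"{hours:,.0f} hours (~{hours / 24:.0f} days)")
```

At full utilization the card pays for itself in about 1500 hours, a bit over two months, which is why steady demand favors owning the hardware.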

Monday, November 15, 2010


Looks like you can get some NVidia Teslas in an EC2 instance now! It seems like Amazon is in the business of selling you a teraflop for $2.10 per hour. This is going to be a big competitive business. I presume I can get a petaflop for a few minutes for under a hundred bucks. Time to start making crypto-crackers!

Tuesday, October 26, 2010

First Product

In June, I started a company to make apps for kids. Now Tinker Heavy Industries has its first app in the app store:

Another educational app to teach spelling is currently awaiting Apple's approval. Our next app will be an interactive Little Red Riding Hood. We've done some work learning how to build touchable characters and environments. Here's a demo of me playing with Tinker, our multi-touchable ragdoll elephant (made in Unity):

Periodically, I find myself falling back to the usual musings of semiconductor physics and mathematics. Here's a ring oscillator. The PMOS widths probably need to be relatively bigger for it to be useful, and I don't have any representation for the wells or a simulation including parasitic BJTs (my imaginary SOI process avoids latch-up by using magic!)

Naturally I've also been playing with Mandelbrot sets, bifurcation diagrams, audio synthesis, and Riemann Zeta function partial sums. I'll have to put up some videos of super nerd stuff later.

Here's Star's face 3-D scanned and imported into Unity!

Sunday, July 25, 2010

Career Change

Since 2007, I had been developing a PDP-11/70 emulator using Virtex-5 FPGAs for Quickware. The QED970 system is composed of several boards designed to be compatible with legacy hardware specifications. Designing a replica of a 1970s machine revealed a lot to me about the way it used to be; I can hardly imagine designing entire boards with LSI and MSI components to be a Floating Point Unit. An entire generation of computer scientists learned PDP-11 Assembly: now it's like Latin. So I gave an old soul to a new machine; where now?

For the few months before June I went traveling: first in Mozambique and South Africa for an adventure, and then in Israel to be with my family and apply for a position at Brightsource doing accelerated computing work. I was very impressed with the company when I visited last fall. I believe that Brightsource's approach to engineering better solar power plants will be successful at achieving grid-parity with coal after a few iterations of the design. This spring I went back to Israel fully intent on building a supercomputer for designing solar power plants.

I came back to the USA by way of San Francisco, to visit friends and go to a wedding. Instead of working on supercomputing, I have decided to move to California and make interactive stories and educational games for the iPad. This is a wide departure from where I thought I would be. The timing is right, and I think anyone with a sense of "the next big thing" sees a future where paperbacks will be quaint like vinyl, with an underground scene who "just miss the feeling of paper in my fingers." I'm starting the company with good people and we are going to make the highest quality children's content available. In a few years the big heavy backpack full of school-books will be obsolete.

I fully intend to continue posting to this blog about accelerated computing and semiconductors. I haven't given up on developing a spreadsheet system that runs on GPUs and FPGAs. I just want to be able to fund it myself. I think that I'm also going to be more open about my accelerated computing ideas on this blog now: I'm trading the fear that someone else will steal one of my ideas for the fear that I might never fully express them.

Tuesday, March 02, 2010

GPU Supercomputing Rundown

/. linked to this opinion article on AMD, Intel and NVidia in the next decade (full disclosure, I own shares in NVidia). Some of the opinions about OpenCL and CUDA are the same as I expressed in my post about Intel purchasing RapidMind and Cilk. Since I made that post, Intel has decided to abandon plans to release Larrabee as a consumer GPU. This was not a big surprise, since Larrabee could not compete with AMD's HD 5870 or 5970 products. The consumer GPU space is currently dominated in terms of teraflops per dollar and per watt by these AMD/ATI cards. NVidia still dominates the GPU computing market because of their investment in CUDA. AMD was the first to support OpenCL, but OpenCL seems more and more like an also-ran API. AMD was also the first to support DirectX 11, but NVidia downplays this.

Adobe is going to support CUDA in their tools. I don't know how AMD is going to turn the GPU supercomputing market in their favor: hardware leads are not enough except perhaps in the highest of high-end supercomputing applications.

NVidia's move to make their new 40nm Fermi architecture with a behemoth 3B transistors has proven troublesome, with yields reported in the single digits. Reports indicate that NVidia will disable dysfunctional stream processors, similar to the way IBM boosted yields of the Cell processor for the PS3 by enabling only 7 of the 8 SPEs.

It seems like there will be an increasing trend towards defect tolerant designs and design-for-yield EDA tools; once upon a time, I had an offer to intern at IBM developing such a tool. A number of issues arise at smaller geometries, such as variance in wire dimensions and gate lengths affecting the switching speed. Also, with more components in a chip there is simply more that can be broken. Possibilities for yield enhancement include disabling cache regions, lowering individual core speeds, and disabling dysfunctional cores entirely.
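To see why shipping with a spare core helps so much, here is a minimal binomial sketch of the 7-of-8 scheme mentioned above. It assumes each core fails independently with the same probability, and the per-core yield of 0.8 is a made-up illustrative value, not a reported figure for Cell or Fermi.

```python
# Chip yield when only `required` of `cores` identical cores must work.
from math import comb


def chip_yield(cores, required, core_yield):
    """Probability that at least `required` of `cores` cores are defect-free.

    Assumes independent, identically distributed core defects (a simplification).
    """
    return sum(
        comb(cores, good) * core_yield ** good * (1 - core_yield) ** (cores - good)
        for good in range(required, cores + 1)
    )


if __name__ == "__main__":
    p = 0.8  # hypothetical per-core yield
    print(f"need 8 of 8: {chip_yield(8, 8, p):.3f}")
    print(f"need 7 of 8: {chip_yield(8, 7, p):.3f}")
```

With that hypothetical per-core yield, tolerating a single bad core takes the sellable fraction from roughly 17% to roughly 50%.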

Thursday, February 04, 2010

More on SaaS and EDA

EDA SaaS has been a recurring topic on many of the blogs I follow. Harry (the asic guy) wrote about the Blooming of an EDA SaaS Revolution in his first post at Xuropa. He says that the revolution "depends on a confluence of critical technologies." He also writes that the coming revolution will level the playing field allowing smaller EDA firms to compete. The same economics of sharing model that creates SETI-like distributed computing, fab-less semiconductor companies, and open source software has the potential to impact the EDA industry in a big way. I suspect the effect of decreasing the barrier-to-entry for design tools will ripple into the broader electronics industry.

With SaaS, the fixed cost of high performance computing infrastructure transforms into a variable cost. For a lean team of engineers, this can decrease the cost of compute resources substantially: who needs a $10,000 server when you will only use 5000 CPU-hours on it, which costs half the price from a cloud provider, with the added bonus of being able to use 100 CPUs simultaneously on demand? This kind of computing power can increase the throughput of large complex simulations and optimizations substantially. I am currently paying for software licenses and infrastructure, but I would much prefer an on-demand high performance computer to run my simulations.
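The arithmetic behind that example can be sketched as follows. The per-CPU-hour rate is not quoted anywhere; it is simply what "half the price" implies, and the wall-clock figure assumes the job parallelizes perfectly across 100 CPUs.

```python
# Fixed-cost server versus pay-per-use cloud, using the figures above.
SERVER_COST = 10_000.0               # dollars, from the example
CPU_HOURS_NEEDED = 5_000             # from the example
CLOUD_JOB_COST = SERVER_COST / 2     # "half the price"

# Implied cloud rate in dollars per CPU-hour (derived, not a quoted price).
IMPLIED_RATE = CLOUD_JOB_COST / CPU_HOURS_NEEDED


def wall_clock_hours(cpu_hours, cpus):
    """Elapsed time assuming perfect parallel scaling (an assumption)."""
    return cpu_hours / cpus


def break_even_cpu_hours(server_cost, rate):
    """Usage above which owning the server becomes the cheaper option."""
    return server_cost / rate


if __name__ == "__main__":
    print(f"implied rate: ${IMPLIED_RATE:.2f} per CPU-hour")
    print(f"on 1 CPU:     {wall_clock_hours(CPU_HOURS_NEEDED, 1):,.0f} hours")
    print(f"on 100 CPUs:  {wall_clock_hours(CPU_HOURS_NEEDED, 100):,.0f} hours")
    print(f"break-even:   "
          f"{break_even_cpu_hours(SERVER_COST, IMPLIED_RATE):,.0f} CPU-hours")
```

Under these assumptions the same job finishes in about two days instead of about seven months on a single CPU, and owning the server only wins past 10,000 CPU-hours.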

SaaS providers do not necessarily need to host their own infrastructure and can outsource some or all of their infrastructure requirements to third parties. Communication latency, job-size and job-work are the major factors determining how closely coupled a distributed computation network must be. If the data transmission for each job is order N and the work to process this data is N^2, then the communication-latency factor becomes negligible as the data transmitted increases. Large correlation operations like SETI are practical over large scale distributed networks for this reason.
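The scaling argument above can be made concrete with a small sketch. The latency, bandwidth and compute rate below are hypothetical constants picked only to show the trend: transfer time grows like N, work grows like N^2, so the share of time spent communicating shrinks as jobs get bigger.

```python
# Share of job time spent on communication when data is O(N) and work is O(N^2).


def communication_fraction(n, latency_s=0.1, bytes_per_s=1e6, ops_per_s=1e9):
    """Fraction of total job time spent moving data.

    All three rate parameters are illustrative assumptions, not measurements.
    """
    transfer = latency_s + n / bytes_per_s   # O(N) communication
    compute = n ** 2 / ops_per_s             # O(N^2) processing
    return transfer / (transfer + compute)


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        fraction = communication_fraction(n)
        print(f"N = {n:>13,}: {fraction:.4%} of time communicating")
```

Small jobs are dominated by latency, while large ones are dominated by compute, which is the regime where a loosely coupled network of machines is good enough.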

Raw high-latency compute power can be provided by a widely distributed cloud: instead of searching for ET, your PC can help me make timing. Certain companies will pay to use your machine for distributed computing projects. These companies may act as a middle-man broker leasing out computing power in schedule blocks to a SaaS provider. Certain SaaS providers, recognizing that their customers work at 2-6 AM, may purchase large blocks of cheap computing power in these hours in each time-zone for example. They will have to form agreements with other cloud infrastructure providers to handle cases of excess demand. This forms an interesting set of insurances and contracts among the infrastructure providers, SaaS providers, and end-users. For example, an infrastructure firm may have unreserved excess capacity which it leases off for short intervals at short notice for low cost. If an end-user requires a large supply of CPUs for a long interval of usage on short notice they will pay a premium for this. To simplify matters, a SaaS provider takes jobs from users who have purchased a certain number of CPU-hour "tickets" in advance. These tickets have expiration dates and exchange values depending on whether you want 3600-CPUs-for-a-second or 1-CPU-for-an-hour.
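One way to picture the ticket scheme: both shapes at the end of the paragraph above draw the same 3600 CPU-seconds, just with very different burst sizes. The sketch below is purely hypothetical; the `Ticket` class and `can_run` helper are invented for illustration and do not describe any real provider's billing model.

```python
# Hypothetical CPU-hour "ticket" with an expiration date.
from dataclasses import dataclass
from datetime import date


@dataclass
class Ticket:
    cpu_seconds: int   # prepaid compute budget
    expires: date      # last day the ticket can be redeemed


def cpu_seconds(cpus, seconds):
    """Raw compute consumed by running `cpus` processors for `seconds`."""
    return cpus * seconds


def can_run(ticket, cpus, seconds, today):
    """True if the ticket is still valid and covers the requested job."""
    return (today <= ticket.expires
            and cpu_seconds(cpus, seconds) <= ticket.cpu_seconds)


if __name__ == "__main__":
    ticket = Ticket(cpu_seconds=3600, expires=date(2010, 12, 31))
    today = date(2010, 2, 4)
    print(can_run(ticket, cpus=3600, seconds=1, today=today))   # wide burst
    print(can_run(ticket, cpus=1, seconds=3600, today=today))   # long trickle
    print(can_run(ticket, cpus=2, seconds=3600, today=today))   # over budget
```

The raw budget is identical in the first two cases; the exchange value would differ because a provider has to keep far more idle capacity on hand to serve the wide burst.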

For the software providers, supporting hosted applications is different from supporting installed programs. Issues are much easier to isolate: since none of the users ever install the software, if something is broken with the user's experience, it is definitely the fault of the software provider and not a virus, out of memory error, or other hardware incompatibility. Suppose the EDA industry wanted to support Multicore, FPGA or GPU accelerators for certain tasks. If they hosted this accelerated infrastructure to provide their software as a service, then they wouldn't have to deal with hardware compatibility issues providing this software to third parties.

Product iteration can be much faster since new features can be tested without requiring users to install anything. Additionally, all bugs can be reported back to the developers without requiring any action from the user. And of course, you shouldn't force new features on people or you'll have the same negative response that happens whenever Facebook changes their layout or adds new features. Facebook users can't complain with their checkbook (join my "Leave Facebook when One Million People Join This Group" group).

Security policies of EDA end-users are frequently cited as the major barrier to SaaS, indeed the sole criterion for rejecting a hosted provider model. If secrecy is an important factor for a design, it is unlikely that a SaaS tool will be used. This is not due to some fundamental security issue with hosting an application remotely instead of locally. It is mostly an irrational insecurity of users who think it is harder to break into their locally hosted computers than into a remote facility. Indeed, large remotely hosted virtual private networks may even have better security than anything locally hosted, because a large infrastructure provider can afford to devote more attention to the problem.

Still, because of this paranoia, FPGA design teams and tools will be the first to transition to SaaS: first because I suspect that FPGA design teams value secrecy less than lowering fixed costs and simulation latency, and second because FPGA tools are used by many more people than ASIC tools. Of course, tools like a distributed digital logic simulation engine will be useful in both ASIC and FPGA design.

The effect of decreasing the barrier to entry for electronic design tools as well as decreasing the latency of a simulation will lead to more people making more designs. The availability of more reconfigurable logic designs increases the value of the FPGA in a superlinear way. The total value of programmable logic cores derives from the various ways they can be combined: thus the value of the FPGA grows superlinearly similar to Metcalfe's Law.

So how will this happen? A fairly clear vision is shared by enough people blogging about EDA SaaS. The EDA industry certainly must be paying attention to the cloud computing meme and what SaaS means for their industry. Unfortunately, the economic barriers to entry are only a small aspect of the problem facing the programmable logic market. FPGA tools need to be more accessible to non-expert users. My vision is a tool flow combining incremental synthesis with dynamic partial reconfiguration that lets you program your array just like a spreadsheet.