Every time I see someone claiming they've come up with a method to make parallel programming easy, I can't take them seriously. People you don't take seriously may take you by surprise. I think the computing industry suffers from the "faster horse" problem: Henry Ford couldn't ask his customers what they wanted because they would have said "a faster horse." The instruction stream is a horse: the industry has already built the fastest horses physically possible, so now the industry is going on a multithreading tangent (multi-horsing?).
While everyone else is still in a horse-race, we're building hybrid engines.
The HPCwire article lists a whole bunch of languages, but whenever someone groans about parallel programming and provides a list of languages, Excel is always left off. Clearly we aren't seeing the forest if we leave out the most widely used language ever (probably by a double or even triple-digit factor). The omission is especially egregious when these lists include other visual dataflow languages like LabVIEW and Simulink (this article mentions dataflow style but none of these particulars). Spreadsheet cells are explicitly parallel and make dataflow and vector programming so simple that almost everyone who has ever used a computer has done it. There's even a well-understood model for event-triggered control-flow macros for those cases where you "need" instruction streams.
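To make the "explicitly parallel" claim concrete, here is a minimal Python sketch, not anyone's actual spreadsheet engine. It assumes a hypothetical five-cell sheet where each cell lists its precedent cells and a formula, and it groups the cells into waves. Every cell in a wave depends only on earlier waves, so a real runtime could evaluate a whole wave at once; this sketch just loops over it. The cell names, formulas, and the `evaluate_in_waves` helper are all made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical sheet: each cell maps to (precedent cells, formula).
sheet = {
    "A1": ((), lambda: 2),
    "A2": ((), lambda: 3),
    "B1": (("A1", "A2"), lambda a1, a2: a1 + a2),
    "B2": (("A1",), lambda a1: a1 * 10),
    "C1": (("B1", "B2"), lambda b1, b2: b1 * b2),
}

def evaluate_in_waves(sheet):
    """Evaluate the sheet wave by wave.

    Cells in the same wave do not depend on each other, so they are
    the units that could safely run in parallel.
    """
    sorter = TopologicalSorter({cell: deps for cell, (deps, _) in sheet.items()})
    sorter.prepare()
    values, waves = {}, []
    while sorter.is_active():
        # Every cell whose precedents have all been computed already.
        ready = sorted(sorter.get_ready())
        for cell in ready:  # sequential here; independent by construction
            deps, formula = sheet[cell]
            values[cell] = formula(*(values[d] for d in deps))
        sorter.done(*ready)
        waves.append(ready)
    return values, waves

values, waves = evaluate_in_waves(sheet)
print(waves)
print(values["C1"])
```

Nobody had to write a thread, a lock, or a barrier to get those waves: the parallelism falls out of the cell references themselves.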
So I strongly disagree with the premise that parallel programming ought to be difficult. Parallel programming is the same as spreadsheet programming: it's easy to do and everyone knows how it works already. Especially don't let someone convince you that parallel programming is hard if they work on the hardware-software interface. Many of these people still believe parallel programming involves synchronizing random-access machines running non-deterministic threads (avoid the pitfalls of horse-race-conditions by considering threads harmful).
Developing a high-performance, real-time spreadsheet framework for a hybrid hardware topology requires substantial effort. Depending on your target hardware architecture, you may need to use threads, vector operations, distributed processes, and hardware description languages to iterate that spreadsheet efficiently. To do this, you need a compiler from the spreadsheet language to each of the hardware models you want to support and you need to generate synchronization code to share precedent cell data across hardware elements. Depending on the memory and interconnect architecture this data synchronization code can get somewhat tricky, but code generation from a spreadsheet is the "tractable" part of the parallel programming problem and makes for good Master's theses if you throw in at least one optimization.
For your PhD you'll have to do something more difficult than just automatically generating parallel code from a partitioned dataflow graph.
Optimal partitioning of parallel programs is obscenely hard (MegaHard as my previous post would have it). In heterogeneous environments that use many of these primitive parallel models, you need to worry about optimally partitioning which cells run on which metal. Partitioning based on computational resources is a pain, but the real difficulty is optimizing for the communication requirements between partitions and the communication constraints between the hardware elements. We are approaching the optimal partitioning problem by assigning a color for each chunk-of-metal. We group spreadsheet cells by color and then profile the computational load of the color group and the communication between color groups using hardware constraints.
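As a rough illustration of the profiling step, and not our actual tool, the Python sketch below takes a hypothetical coloring of a five-cell sheet and reports two things: the total compute cost per color, and the number of dependency edges that cross from one color to another. The cell names, the per-cell cost numbers, and the "cpu" and "fpga" colors are all invented for the example, and the sketch deliberately ignores the hardware constraints (bandwidth, latency, capacity) that a real profiler would have to weigh these numbers against.

```python
from collections import Counter

# Hypothetical inputs: precedents per cell, an estimated compute cost per
# cell, and the color (chunk of metal) each cell has been assigned to.
precedents = {
    "A1": [],
    "A2": [],
    "B1": ["A1", "A2"],
    "B2": ["A1"],
    "C1": ["B1", "B2"],
}
cost = {"A1": 1, "A2": 1, "B1": 4, "B2": 2, "C1": 8}
color = {"A1": "cpu", "A2": "cpu", "B1": "fpga", "B2": "cpu", "C1": "fpga"}

def profile(precedents, cost, color):
    """Return (load per color, cross-color edge counts) for one coloring."""
    load = Counter()
    traffic = Counter()
    for cell, deps in precedents.items():
        load[color[cell]] += cost[cell]
        for dep in deps:
            # Only edges that leave a color group cost communication.
            if color[dep] != color[cell]:
                traffic[(color[dep], color[cell])] += 1
    return dict(load), dict(traffic)

load, traffic = profile(precedents, cost, color)
print(load)
print(traffic)
```

Recoloring a single cell changes both numbers at once, which is exactly why the partitioning problem is hard: moving B2 onto the fpga would cut two crossing edges but add one and pile more work onto the already busier color.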
The HPCwire article does mention communicating sequential processes and dataflow models:
"Until we have machines that implement these [communicating sequential processes and dataflow] models more closely, we need to take into account the cost of the virtualization as well."
We do have machines that implement these models (as Tilera and Altera will attest). They are still as difficult to program as any parallel architecture, but I assure you that once we start to think of these things as "hardware spreadsheets" we will start to see a way out of the parallel programming cave. I wonder if people who describe an FPGA as a "neural-net processor" make the pop-culture connection: