Over the next decade, computer scientists anticipate the world’s largest supercomputers will grow to millions of cores running as many as a billion parallel threads. Even personal devices will contain a hundred cores and perform thousands of concurrent tasks.
Such systems with the ability to run multiple parts of the same program at the same time – in parallel – on a massive scale will be necessary to solve complex problems like climate change and drug modeling as well as to crunch the exabytes of data our smart devices will collectively produce.
“Parallel computing is the only path forward,” said Andreas Olofsson, founder and CEO of Adapteva, in his talk about the $99 Linux supercomputer Parallella at Collaboration Summit earlier this year. But the industry, in large part, isn’t yet prepared to push it into the mainstream, he said.
Hindering progress is the absence of a high-level programming language, akin to Java or Python, for writing parallel code. Parallel programming is harder to learn than traditional serial programming and has remained a specialized skill in which few developers are trained.
“It’s much easier to think of algorithms and processes as a number of steps, or a recipe. Everything moves through an order,” Olofsson said in a phone interview with Linux.com. A parallel program doesn’t have an order. “So in your head you have to keep a map of when things are going to happen. That really takes some training and a different way of thinking.”
The problem is even tougher for researchers with only basic programming skills who attempt to write parallel applications. That’s why it’s common in science and engineering for researchers to run many copies of an existing serial application at once to explore or simulate a phenomenon or analyze a big dataset, instead of writing a native parallel application, said Michael Wilde, a computer science researcher at Argonne National Laboratory and the University of Chicago. But that approach won’t scale to the million-core systems of the future.
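The "many copies of a serial application" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SERIAL_APP is a hypothetical stand-in for a real serial program (here a one-line script that squares its argument), and run_serial_copy and sweep are names invented for this example, not part of any tool mentioned in the article.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an existing serial application: a one-line
# Python program that squares its single command-line argument.
SERIAL_APP = [sys.executable, "-c", "import sys; print(int(sys.argv[1]) ** 2)"]


def run_serial_copy(param):
    """Launch one independent copy of the serial program and parse its output."""
    done = subprocess.run(
        SERIAL_APP + [str(param)],
        capture_output=True,
        text=True,
        check=True,  # raise if the copy exits with an error
    )
    return int(done.stdout.strip())


def sweep(params, max_workers=4):
    """Run one copy per parameter, several at a time; results keep input order."""
    # Threads only launch and wait on the child processes; the copies
    # themselves run as separate operating-system processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_serial_copy, params))
```

Calling sweep([1, 2, 3, 4]) returns [1, 4, 9, 16]. The copies never exchange data with one another, which keeps the approach simple but leaves scheduling, data movement and error recovery to the researcher.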
Bottom line: In order to reach new levels of supercomputing and advance scientific discovery, parallel processing needs to be more accessible to the scientists, engineers, and data analysts who need it. And for that to happen, the use of parallel hardware needs to be far easier for – and ideally transparent to – the programmer.
The Swift Solution
That’s why Wilde and fellow researchers at Argonne and the University of Chicago have developed Swift, a new programming language designed from the ground up for building parallel applications. Similar to a shell script, Swift allows a user to stitch together programs or high-level functions written in any other language, including scripting languages such as Python, R or MATLAB, or even already-parallel programs written in C or FORTRAN using MPI or OpenMP.
Swift plays a simple but “pervasively parallel” coordination role to create the upper level logic of more complex applications, Wilde said. “It makes it very easy to parallelize what we often call the ‘outer loops’.”
Highly parallel applications can thus be composed by gluing together serial algorithms because Swift creates the parallelism automatically at runtime, without explicit direction from the programmer. It does this by first encapsulating the applications that are called within a script as “functions” with uniform interfaces, and then applying automatic data flow, he said.
“This enables Swift to provide transparent distributed parallel computing (e.g. for cloud resources) by automatically marshaling the data that needs to get passed between applications in a way that native Linux shells like ‘bash’ or scripting languages like Python or Ruby cannot do by themselves,” Wilde said.
Using a “dataflow-driven” programming model, in which its runtime environment monitors the state of all data elements in the program, Swift decides which parts of a program can run in parallel and what tasks must wait for data to be produced by other parts of the code, he said.
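The dataflow idea can be illustrated with a minimal Python sketch. It is not Swift's syntax or its actual runtime; run_dataflow, EXAMPLE_TASKS and the task names are hypothetical, chosen only to show a scheduler that starts each task as soon as the data it needs exists, so independent tasks run in parallel without the programmer specifying an order.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dataflow(tasks, max_workers=4):
    """Run tasks given as name -> (function, [input names]); return name -> value."""
    results = {}           # data elements that have been produced so far
    running = {}           # future -> name of the task it is computing
    pending = dict(tasks)  # tasks that have not been started yet
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Start every pending task whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                args = [results[dep] for dep in deps]
                running[pool.submit(func, *args)] = name
            if not running:
                # Nothing can start and nothing is in flight: an input is
                # missing or the dependencies form a cycle.
                raise ValueError("unsatisfiable inputs: " + ", ".join(sorted(pending)))
            # Block until at least one running task has produced its data.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Declaration order is irrelevant; only the data dependencies matter.
EXAMPLE_TASKS = {
    "report": (lambda total: "total=%d" % total, ["total"]),
    "total": (lambda a, b: a + b, ["left", "right"]),
    "left": (lambda: 2, []),
    "right": (lambda: 3, []),
}
```

In this sketch, run_dataflow(EXAMPLE_TASKS) starts "left" and "right" together, runs "total" once both values exist, and runs "report" last, even though "report" is declared first. A production system would also have to move data between machines and recover from failures, which this sketch does not attempt.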
“We’re trying to retain the flavor and power of traditional UNIX scripting but to give it a uniformity that allows for automated parallelization, distribution and error recovery,” Wilde said.
A General Purpose Hadoop
Swift is an “intriguing” solution, Olofsson said, because it’s not an entirely new language. It supports existing frameworks to accomplish distributed-scale parallel computing, almost like a general-purpose Hadoop, he said.
But whether it’s Swift or another language, it’s most important for programmers to start learning and using something – anything – in parallel. A standard will emerge over time as more programmers learn parallel programming and converge on a language, Olofsson said.
“The challenge today is, ‘How do you make it so that parallel programming is as productive as Java or Python is?’” Olofsson said at Collaboration Summit. “That should be the goal, and not the way it is where a few coding ninjas can make the parallel hardware really sing.
“It should get to the point where there are thousands of projects on GitHub all written in parallel code,” he said, “running on any kind of hardware.”