Author: Tina Gasperson
High-performance computing is an integral part of today’s nuclear testing environment because of the Comprehensive Nuclear-Test-Ban Treaty, which 175 countries have pledged to uphold. The treaty prohibits any kind of nuclear detonation for testing purposes, so companies like the U.K.’s Atomic Weapons Establishment (AWE) “test” nuclear weapons using high-powered computer simulations. In these simulations AWE supercomputers handle tasks across the spectrum of scientific disciplines, including chemistry, physics, and engineering. In doing so, they generate large amounts of data, which are rendered as visualizations.
Even though the U.K. government is AWE’s only customer, the company’s technology demands continue to rise, in part because the limits of visualization technology keep expanding. Ash Vadgama, team leader and principal scientist for AWE’s Clustered Supercomputing and Visualisation Group, wants to see the accuracy of his nuclear detonation visualizations continually improve. “Our key focus is on the safety, security, and maintenance of the existing system,” Vadgama says. “You’re trying to simulate as close to reality as you can.”
About three and a half years ago, Vadgama noticed a trend in high-performance computing toward the use of open source software, including Linux. As an experiment, he brought in a couple of small Linux-based systems to begin testing code on the unfamiliar operating system and to “try to understand all the challenges of getting simulation code to run on different platforms and operating systems.” Back then, Vadgama says, it was a learning curve made steeper by the relative immaturity of Linux. “Nowadays, a lot of high-performance computing vendors are moving toward Linux — they all have Linux delivery in their branded software,” he says.
Recently, AWE purchased a Linux Networx Evolocity visualization cluster system it has dubbed “Starlight.” The system runs on SUSE Enterprise 9.1 and makes use of the latest hardware, including the high-speed InfiniBand interconnect, 64-bit Intel Xeon processors, and PCI Express, the much-heralded third-generation I/O technology that is replacing PCI, PCI-X, and AGP.
Vadgama says AWE let Linux Networx “own the risky part” of system implementation: getting all the interconnect technologies to cooperate with the Linux kernel. “They went through a number of versions before settling on the Enterprise version of SUSE, with a couple of patches. It was important for them to do that because we didn’t want to have the risk.”
Even though AWE’s trial of Linux is working well, the company is courting other vendors for a major upgrade of its main production system around the end of the year. The current system, a 1920-node IBM Power3 cluster called “BlueOak,” runs at 3 teraflops, but AWE wants to bump capacity tenfold, to 30 teraflops. Vadgama is not totally convinced Linux Networx has the solution.
“It is an open competition between vendors,” Vadgama says. “The Linux Networx machine is kind of an experiment. We’re having a look at those technologies by using it as a development system and testing our models so that when we do upgrade at the end of the year, we’ll have specified it right. I picked Linux Networx because it was the best technology at the time.”
Vendors are being asked to use AWE’s in-house suite of benchmark tests and classified simulation codes in preparing their proposals. The submitted results will help determine which high-performance computing solution AWE will select. Regardless of who is picked, most systems of this kind cost at least 40 million British pounds.