Thursday, May 21, 2009

ScicomP 15 Overview

ScicomP 15 kicked off May 18 in Barcelona with an emphasis on programming the next generation of massively parallel, more specialized, low-power supercomputers. ScicomP is the users group for scientists and engineers who use and support the largest IBM supercomputers in the world. The meeting is co-located with the SP-XXL user group for those who administer and develop infrastructure support for the same IBM systems. The meetings are being hosted by the Barcelona Supercomputing Center (BSC).

The week began with a full-day, hands-on tutorial on programming the IBM Cell processor, conducted by IBM and BSC. The Cell Broadband Engine is found in the PlayStation 3 and has been incorporated into HPC systems like Roadrunner at Los Alamos National Laboratory, which was the most powerful supercomputer in the world when it was unveiled last year. While Roadrunner is a very specialized architecture and very difficult to program, other Cell architectures promise to be more mainstream. Still, the basic Cell design - a PowerPC core with attached "accelerator" cores - is an emerging paradigm in HPC. Many believe the HPC architectures of the immediate future will be some combination of a traditional CPU (with many cores) and accelerators, e.g. an attached GPU (Graphics Processing Unit). The traditional HPC programming paradigm - Fortran or C/C++ using the MPI message-passing API - has no native support for accessing these accelerators. A number of different programming models are competing to offer this functionality, but there is no clear leader among the candidates: the Cell SDK, CUDA, OpenCL, OpenMP, and CellSs.
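To give a flavor of what accelerator programming looks like, below is a minimal sketch of the host-plus-accelerator offload pattern using the OpenCL 1.0 C API, one of the candidates listed above. The kernel name "scale", the array length, and the omission of error checking are illustrative choices, not code from the tutorial.

/* Minimal OpenCL host program: copy an array to the accelerator,
 * run a tiny kernel on it, and copy the result back. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *x, float a) { "
    "    size_t i = get_global_id(0);                  "
    "    x[i] = a * x[i];                              "
    "}";

int main(void)
{
    enum { N = 1024 };
    float host_x[N];
    for (int i = 0; i < N; i++) host_x[i] = (float)i;

    /* Pick the first platform and device the runtime offers. */
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Build the kernel from source at run time. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    /* Move data to the device, launch the kernel, read the result back. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof host_x, host_x, NULL);
    float a = 2.0f;
    size_t global = N;
    clSetKernelArg(k, 0, sizeof buf, &buf);
    clSetKernelArg(k, 1, sizeof a, &a);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof host_x, host_x, 0, NULL, NULL);

    printf("x[10] = %g\n", host_x[10]);   /* expect 20 */

    clReleaseMemObject(buf);
    clReleaseKernel(k);
    clReleaseProgram(prog);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}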

The groups also heard presentations from leaders at HPC sites that will soon be unveiling huge systems. Thomas Lippert gave a keynote presentation about science and systems at the Jülich Supercomputing Centre, which is slated to host Europe's first petaflop supercomputer, an IBM Blue Gene/P. There were also talks about the coming Sequoia system at Lawrence Livermore National Laboratory (LLNL) and the IBM Blue Waters project at the National Center for Supercomputing Applications (NCSA).

ScicomP brings together computational scientists and engineers interested in achieving maximum performance and scalability on all IBM HPC platforms, including POWER, Blue Gene, Cell, Blade, and hybrid architectures. At yearly meetings, presentations are given by HPC users, HPC center staff, and IBM staff, with a focus on sharing real-world experiences porting, tuning, and running codes on IBM supercomputers. Meetings are co-located with SP-XXL, the members-only user group for sites with large IBM systems.

IBM Updates

Blue Gene Update
Todd Inglett, IBM Blue Gene Software Development

IBM is only speaking publicly about existing systems, so the Blue Gene talk was limited to BG/L and BG/P releases.

Blue Gene/L is receiving only general maintenance updates; nothing really new is planned.

A few new things are in store for BG/P. Among them is added authentication support so that "submit" in HTC mode acts the same as mpirun, i.e. transparently launching jobs on compute nodes. HTC is "High Throughput Computing" mode, in which each core can run a separate binary image. So a 4,000-core system can run 4,000 different programs or independent instances of a single program.

Cores are now pinned to a process, and CNK (the Compute Node Kernel) allows creation of more threads than cores, per a request from Los Alamos National Lab. An application may use pthreads to create additional threads; standard pthread synchronization primitives are supported, and threads are assigned to cores round-robin.
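As a rough illustration of what this enables, here is a generic pthreads sketch (not code from the talk) in which a compute-node process creates more threads than a BG/P node's four cores and synchronizes them with a standard mutex; the thread count of eight is an arbitrary illustrative value.

/* Create more threads than cores and synchronize with a pthread mutex. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 8   /* more threads than the 4 cores of a BG/P node */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    long id = (long)arg;
    pthread_mutex_lock(&lock);    /* standard pthread primitive */
    counter += id;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    printf("counter = %ld\n", counter);   /* 0+1+...+7 = 28 */
    return 0;
}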

Full binary corefile support has been added. You can set an environment variable, BG_COREDUMP_BINARY, that specifies the ranks that should produce a full corefile; other ranks will still produce a lightweight core dump.

A /jobs/jobid directory on the I/O nodes contains information about running jobs.

Improved performance and support for Berkeley UPC, Chapel, Co-Array Fortran, and Titanium (a parallel dialect of Java) are also included.

The release also brings support for cross-compiling with autoconf: the system can be configured to "magically" launch configure test programs on the target OS/architecture by running them on an HTC partition.

Updates to Python 2.6, gcc 4.3.2, and a patched Valgrind memory debugger are included.
All these improvements will be available in software release V1R4.

IBM Compilers
Roch Archambault
IBM Toronto Laboratory

The latest versions of the IBM compilers, XL Fortran 12 and XL C/C++ 8, support XL UPC on AIX and Linux, C/C++ for Transactional Memory for AIX, the CDT C/C++ development toolkit for AIX, and an IBM Debugger for AIX (with Fortran support).

IBM compilers now have improved interprocedural analysis optimization that will allow enhancements like inlining Fortran kernels, which are common in HPC, into C code. Support for OpenMP 3.0 is included.
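The headline feature of OpenMP 3.0 is tasking. The recursive Fibonacci sketch below is the standard textbook illustration of the new task construct, not an example taken from the talk.

/* OpenMP 3.0 tasks: each recursive call becomes a task the runtime can
 * schedule on any thread in the team. */
#include <stdio.h>

static long fib(int n)
{
    long a, b;
    if (n < 2) return n;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait     /* wait for the two child tasks */
    return a + b;
}

int main(void)
{
    long result;
    #pragma omp parallel
    {
        #pragma omp single   /* one thread seeds the task tree */
        result = fib(20);
    }
    printf("fib(20) = %ld\n", result);   /* 6765 */
    return 0;
}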

Blue Gene and Cell compilers were mentioned, but nothing particularly significant was discussed.

Additional support for XL UPC is ongoing. IBM is building up a suite of test codes for compiler validation and optimization development. XL UPC currently supports either on-node (pthreads) mode or distributed mode (via IBM's LAPI message-passing layer), but not mixed mode.
A UPC implementation of the Linpack HPL benchmark on POWER5 and Blue Gene performs almost as well as the MPI code, but with source code about one tenth the size.
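For readers who have not seen UPC, the sketch below shows the PGAS style that XL UPC compiles: one logically shared array distributed across threads, with each thread updating the elements that have affinity to it. This is a generic textbook-style example, not one of IBM's validation codes or the HPL source.

/* UPC (a PGAS extension of C): a shared array spread across all threads. */
#include <upc.h>
#include <stdio.h>

#define PER_THREAD 256

shared double x[PER_THREAD * THREADS];   /* distributed round-robin */

int main(void)
{
    /* Each thread touches only the elements it owns (affinity &x[i]). */
    upc_forall (int i = 0; i < PER_THREAD * THREADS; i++; &x[i])
        x[i] = (double)i;

    upc_barrier;

    if (MYTHREAD == 0)
        printf("%d UPC threads, x[10] = %g\n", THREADS, x[10]);
    return 0;
}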

IBM plans to have an alpha release of OpenCL support this year.

IBM's pNext: The Next Era of Computing Innovation
Piyush Chaudhary, IBM

Since IBM is being publicly mum about a number of imminent technologies, "PC" spoke about IBM's view of HPC in the more distant future.

PC's thesis is that HPC is at a tipping point: up to 10-20 PFlops are possible using the current paradigm, and then things break badly. IBM is focusing on a number of areas it calls Software, Ecosystems and Collaboration, System-level Performance and Reliability, HPC Going "Mainstream," and Environmentals. The question is how IBM can exploit these shifts to meet performance and productivity requirements.

There is a huge investment in software by IBM just to get where we are today. Future systems will require a mix of traditional HPC and emerging technologies and applications. At the coming core counts in the millions per system, components will fail, and systems and software will have to deal with it. HPC is now being called "technical computing" and other names, indicating that "everyone else" is moving into HPC. And finally, power consumption, data center space, cooling capabilities, and noise are all important. Systems are already using in the neighborhood of 10 MW, which becomes very expensive to operate. There will also likely be governmental actions that will influence system design.

In IBM land there are three basic systems: the highly productive POWER line of SMP/distributed-memory systems, the highly scalable multicore Blue Genes, and hybrid systems such as LANL's Roadrunner. IBM is taking the lessons learned from running these three lines and using them as it moves toward the systems of the next decade.

Leakage power on an idle chip is now greater than the power consumed while the chip is busy. Having large, general-purpose chips where parts of the chip are idle at any given time is therefore extremely wasteful in terms of energy consumption, so a goal is to deploy processors and coprocessors that are always busy.

IBM is working to handle RAS (reliability, availability, and serviceability) in hardware so that systems can recover transparently from failures and programs can continue running at full performance.

Liquid cooling is coming; it is more efficient for the data center.

High-k metal gates will reduce idle power leakage, which is "killing us"; up to 60% of power is being lost to leakage in an idle chip. Enhanced eDRAM will allow putting the L3 cache on the chip. 3D chip stacking brings chips closer together, allowing better connectivity between them for improved access and power characteristics.

HPC cluster development: everything is converging on on-chip accelerators to address different application spaces. IBM is investing heavily in new programming paradigms, PGAS for example. PGAS will work really well on new architectures because performance and usability considerations for these languages are influencing system design. Support for OpenCL is also strong.