U.S. Department of Energy

SC09 PNNL Leadership in Technical Sessions

Papers Session

Scalable Work Stealing

Dynamic Task Scheduling

Thursday, 4:30 p.m. – 5:00 p.m.
Room E145-146

Authors:

James Dinan (Ohio State University)
Sriram Krishnamoorthy (Pacific Northwest National Laboratory)
D. Brian Larkins (Ohio State University)
Jarek Nieplocha (Pacific Northwest National Laboratory)
P. Sadayappan (Ohio State University)

Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem that can be addressed with distributed dynamic load balancing systems. Work stealing is a popular approach to distributed dynamic load balancing; however, its performance on large-scale clusters is not well understood, because prior work on work stealing has largely focused on shared-memory machines. In this work, we investigate the design and scalability of work stealing on modern distributed-memory systems. We demonstrate high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the Unbalanced Tree Search benchmark, and a multiresolution analysis kernel.
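To make the basic idea concrete, here is a minimal single-process Python sketch of randomized work stealing. It assumes each worker owns a double-ended queue, works on its newest task (LIFO at the tail), and, when idle, steals the oldest task (FIFO at the head) from a randomly chosen victim. This is not the authors' distributed-memory implementation, which the abstract does not detail. The names `run_work_stealing` and `expand` are hypothetical, and the complete binary tree is only a toy stand-in for a workload such as the unbalanced tree search benchmark.

```python
import random
from collections import deque


def run_work_stealing(root_tasks, expand, num_workers=4, seed=0):
    """Single-process simulation of randomized work stealing.

    Each worker owns a deque. The owner pushes and pops at the tail (LIFO);
    an idle worker steals from the head (FIFO) of a randomly chosen victim.
    `expand(task)` is a hypothetical callback returning the child tasks that
    executing `task` creates.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(root_tasks)  # all initial work starts on worker 0
    executed = [0] * num_workers
    steals = 0

    while any(queues):  # stop once every deque is empty
        for w in range(num_workers):  # each worker takes one step per round
            own = queues[w]
            if own:
                task = own.pop()            # owner takes its newest task
                own.extend(expand(task))    # children stay on the local deque
                executed[w] += 1
            else:
                victim = rng.randrange(num_workers)
                if victim != w and queues[victim]:
                    own.append(queues[victim].popleft())  # steal oldest task
                    steals += 1
    return executed, steals


def expand(depth, max_depth=10):
    """Toy workload: a complete binary tree of the given depth."""
    return [depth + 1, depth + 1] if depth < max_depth else []


executed, steals = run_work_stealing([0], expand, num_workers=4)
print(sum(executed), executed, steals)  # sum is 2047 = 2**11 - 1 tree nodes
```

A common rationale for this split is that stealing from the head tends to take tasks near the root of the task tree. Those tasks usually represent large chunks of remaining work, so fewer steals are needed. Meanwhile, the owner's LIFO order keeps its own work local.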


Panels Session

Energy Efficient Data Centers for HPC: How Lean and Green Do We Need to Be?

Thursday, 10:30 a.m. – 12:00 p.m.
Room PB256

Moderator:

Michael K. Patterson (Intel Corporation)

Panelists:

William Tschudi (Lawrence Berkeley National Laboratory)
Phil Reese (Stanford University)
David Seger (IDC Architects)
Steve Elbert (Pacific Northwest National Laboratory)

The performance gains of HPC machines on the Top500 list are actually exceeding a Moore's Law growth rate, providing more capability for less energy. The challenge is now data center design: the power and cooling required to support these machines. Energy cost is becoming an increasingly important factor. We will look at best practices and ongoing developments in data center and server design that support improvements in total cost of ownership (TCO) and energy use. The panel will also explore future growth and very large data centers. Microsoft has a 48 MW data center; Google has one at 85 MW. Google also operates one with a PUE of 1.15. How efficient can an exascale system get? How efficient must it be? Is the answer tightly packed containers or spread-out warehouses? Will we need to sub-cool for performance or run warmer to use less energy? Join us for a lively discussion of these critical issues.
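Power usage effectiveness (PUE) is the ratio of total facility energy (or power) to the energy consumed by IT equipment. A PUE of 1.15 therefore means that cooling, power distribution, and other non-IT loads add about 15% on top of the IT load. The short sketch below works through that arithmetic for an assumed 10 MW IT load. The load and the comparison PUE of 2.0 are illustrative values, not figures from the panel, and `facility_power` is a hypothetical helper.

```python
def facility_power(it_power_mw, pue):
    """Return (total facility power, non-IT overhead), both in MW.

    PUE = total facility power / IT equipment power, so
    total = pue * it_power and overhead = total - it_power.
    """
    if it_power_mw <= 0 or pue < 1.0:
        raise ValueError("IT power must be positive and PUE at least 1.0")
    total = pue * it_power_mw
    return total, total - it_power_mw


# Illustrative 10 MW IT load at a PUE of 1.15 and, for comparison, 2.0.
for pue in (1.15, 2.0):
    total, overhead = facility_power(10.0, pue)
    print(f"PUE {pue}: total {total:.2f} MW, overhead {overhead:.2f} MW")
```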


Posters Session

Performance Analysis and Optimization of Parallel I/O in a Large Scale Groundwater Application on the Cray XT5

Tuesday, 5:15 p.m. – 7:00 p.m.
Oregon Ballroom Lobby

Authors:

Vamsi Sripathi (North Carolina State University)
Glenn E. Hammond (Pacific Northwest National Laboratory)
G. (Kumar) Mahinthakumar (North Carolina State University)
Richard T. Mills (Oak Ridge National Laboratory)
Patrick H. Worley (Oak Ridge National Laboratory)
Peter C. Lichtner (Los Alamos National Laboratory)

We describe in this poster the performance analysis and optimization of I/O within a massively parallel groundwater application, PFLOTRAN, on the Cray XT5 at ORNL. A strong-scaling study of a 270-million-cell test problem on 2,048 to 65,536 cores indicated that the high volume of independent disk access requests and file access operations would severely limit I/O scalability. To avoid this performance penalty at higher core counts, we implemented a two-phase I/O approach at the application level by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator performs the I/O operations for the entire group and then distributes the data to the rest of the group. With this approach, we achieved a 25X improvement in read I/O and a 3X improvement in write I/O, resulting in an overall application performance improvement of over 5X at 65,536 cores.
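The grouping idea can be sketched in plain Python as a simulation, without an MPI library. The sketch assumes ranks are grouped by `color = rank // group_size` with `key = rank`, which is the membership a call to `MPI_Comm_split` with those arguments would produce. It models only the read path; the write path would run in reverse (gather to the root, then a single write). The group size, the contiguous data layout, and the function names are hypothetical, since the poster does not state how PFLOTRAN chooses them.

```python
def split_into_groups(nprocs, group_size):
    """Group ranks by color = rank // group_size, ordered by rank.

    Mirrors the membership MPI_Comm_split would produce with that color and
    key = rank; the first member of each group acts as its I/O root.
    """
    groups = {}
    for rank in range(nprocs):
        groups.setdefault(rank // group_size, []).append(rank)
    return groups


def two_phase_read(data, nprocs, group_size):
    """Simulate group-root reads followed by distribution to group members.

    Returns (per-rank data, number of file read operations issued).
    """
    if len(data) % nprocs:
        raise ValueError("sketch assumes data divides evenly across ranks")
    per_rank = len(data) // nprocs
    local, file_reads = {}, 0
    for members in split_into_groups(nprocs, group_size).values():
        first, last = members[0], members[-1]
        # Phase 1: the root issues one contiguous read for its whole group.
        block = data[first * per_rank:(last + 1) * per_rank]
        file_reads += 1
        # Phase 2: the root hands each member its slice (in a real MPI code
        # this step would typically be a scatter on the sub-communicator).
        for i, rank in enumerate(members):
            local[rank] = block[i * per_rank:(i + 1) * per_rank]
    return local, file_reads


data = [float(x) for x in range(64)]  # stand-in for a dataset on disk
local, reads = two_phase_read(data, nprocs=16, group_size=4)
print(reads)  # 4 group reads instead of 16 independent per-rank reads
```

The trade-off is that fewer, larger file requests replace many small independent ones, at the cost of extra in-memory communication inside each group.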

A Parallel Power Flow Solver based on the Gauss-Seidel method on the IBM Cell/B.E.

Tuesday, 5:15 p.m. – 7:00 p.m.
Oregon Ballroom Lobby

Authors:

Jong-Ho Byun (University of North Carolina at Charlotte)
Kushal Datta (University of North Carolina at Charlotte)
Arun Ravindran (University of North Carolina at Charlotte)
Arindam Mukherjee (University of North Carolina at Charlotte)
Bharat Joshi (University of North Carolina at Charlotte)
David Chassin (Pacific Northwest National Laboratory)

In this paper, we report a parallel implementation of a power flow solver using the Gauss-Seidel (GS) method on the heterogeneous multicore IBM Cell Broadband Engine (Cell/B.E.). The GS-based power flow solver is part of the transmission module of the GridLAB-D power distribution simulation and analysis tool. Our implementation is based on the PPE-centric Parallel Stages programming model, in which a large dataset is partitioned and processed simultaneously in the SPE computing stages. The core of our implementation is a vectorized unified-bus-computation module that employs four techniques: (1) computation reordering, (2) elimination of branch misprediction, (3) integration of four different bus computations using the SPE's SIMD vector intrinsics, and (4) I/O double-buffering, which overlaps computation with DMA data transfers. As a result, we achieve a 15-times speedup over a sequential implementation of the power flow solver algorithm. In addition, we analyze scalability and the effect of SIMD vectorization and double-buffering on application performance.
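For readers unfamiliar with the method, the following is a textbook scalar Gauss-Seidel power flow in Python. It assumes every non-slack bus is a PQ bus. Each sweep updates bus i from V_i = ((P_i - jQ_i) / conj(V_i) - sum over k != i of Y_ik V_k) / Y_ii, using the most recently updated voltages of the other buses. It is not GridLAB-D's code. It does not model the Cell/B.E. PPE/SPE split, the SIMD unified-bus computation, or double-buffering. The two-bus network, line impedance, and load are made-up per-unit values, and `gauss_seidel_power_flow` is a hypothetical name.

```python
def gauss_seidel_power_flow(ybus, s_spec, v_init, slack=0, tol=1e-10, max_iter=200):
    """Solve bus voltages where every non-slack bus is a PQ bus.

    ybus: admittance matrix (list of lists of complex), s_spec[i] = P_i + jQ_i
    is the specified complex power injection (negative for loads).
    """
    n = len(v_init)
    v = list(v_init)
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            if i == slack:
                continue  # slack bus voltage is held fixed
            # Current contributed by the other buses through the network.
            coupling = sum(ybus[i][k] * v[k] for k in range(n) if k != i)
            # From S_i* = V_i* * sum_k Y_ik V_k, solved for V_i using the
            # latest values of the other buses (the Gauss-Seidel step).
            v_new = (s_spec[i].conjugate() / v[i].conjugate() - coupling) / ybus[i][i]
            max_change = max(max_change, abs(v_new - v[i]))
            v[i] = v_new
        if max_change < tol:
            return v, iteration
    raise RuntimeError("Gauss-Seidel power flow did not converge")


z_line = 0.02 + 0.08j          # hypothetical line impedance (per unit)
y = 1 / z_line
ybus = [[y, -y], [-y, y]]
s_spec = [0j, -0.5 - 0.2j]     # bus 1 draws a 0.5 + j0.2 p.u. load
v, iters = gauss_seidel_power_flow(ybus, s_spec, [1.0 + 0j, 1.0 + 0j])
print(f"|V1| = {abs(v[1]):.4f} p.u. after {iters} iterations")
```

Each bus update depends only on its own row of the admittance matrix and the current voltage vector. That per-bus structure is what an implementation can batch across buses, although the sketch above processes buses one at a time.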


Birds-of-a-Feather Session

Extending Global Arrays to Future Architectures

Thursday, 12:15 p.m. – 1:15 p.m.
Room B118

Primary Session Leader:

Bruce Palmer (Pacific Northwest National Laboratory)

Secondary Session Leaders:

Manojkumar Krishnan (Pacific Northwest National Laboratory)
Sriram Krishnamoorthy (Pacific Northwest National Laboratory)

The purpose of this BOF is to obtain input from the Global Arrays user community on proposed development of the GA toolkit. This session is intended to cap a planning process that began in early summer, and it will give GA users from the broader HPC community an opportunity to discuss their needs with the GA development team. The main focus will be on extending the GA programming model to post-petascale architectures, but other topics, including the addition of desirable features relevant to programming on existing platforms, will also be entertained. The format will be informal discussion. The session leaders will provide a brief overview of the issues associated with extending the GA programming model to the next generation of computers and of current plans to address them, and will then open the discussion to session participants.
