GPFS
Parallel File System

Almaden researchers broke the world record in sorting!

One terabyte of 100-byte records in 17 minutes, 37 seconds!

Almaden researchers recently broke the world record for sorting a terabyte of data, using an RS/6000 SP to sort one terabyte of 100-byte records in 17 minutes, 37 seconds. This is almost a factor of 3 faster than the previous record holder, a cluster of Compaq NT servers at Sandia Labs. Sorting speed has been used for a number of years as a measure of a computer system's I/O and communication performance. The Almaden sort program, SPsort, sustained nearly 1.9GB/s of I/O to and from the GPFS global file system, 5.6GB/s of interprocessor communication across the SP switch, and about 1.9GB/s to scratch files on local disks during its execution.

Background

In 1985, an article in Datamation magazine ("A Measure of Transaction Processing Power," by Anon. et al.) proposed sorting one million 100-byte records with random 10-byte keys as a useful benchmark of a computer system's I/O performance. The benchmark ground rules are that all input must start on disk, all output must end on disk, and the overhead of starting the program and creating the output files must be included in the benchmark time. Because the current record for this benchmark is around a second, new benchmarks were established to stress ever larger computing systems. "MinuteSort" measures how much can be sorted in one minute, and "PennySort" measures how much can be sorted for one cent. At the high end is Terabyte Sort. A number of Terabyte Sort records have been reported recently. Almaden's SPsort improves substantially upon the best of these.
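The record format is simple enough to illustrate with a toy example. The following Python sketch generates 100-byte records with random 10-byte keys and sorts them in memory. It is only an illustration of the data layout, not a benchmark entry: it assumes the key occupies the first 10 bytes of each record, it skips the disk input and output that the ground rules require, and the function names are made up for this example.

```python
import os

RECORD_SIZE = 100  # bytes per record, as specified by the benchmark
KEY_SIZE = 10      # assumption: the key is the first 10 bytes of each record


def make_records(count: int) -> bytes:
    """Return `count` records: a random 10-byte key followed by 90 filler bytes."""
    filler = b"x" * (RECORD_SIZE - KEY_SIZE)
    return b"".join(os.urandom(KEY_SIZE) + filler for _ in range(count))


def sort_records(data: bytes) -> bytes:
    """Sort fixed-size records by key, comparing keys as raw byte strings."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input is not a whole number of records")
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    records.sort(key=lambda rec: rec[:KEY_SIZE])
    return b"".join(records)


def is_sorted(data: bytes) -> bool:
    """Check that the keys of consecutive records are in non-decreasing order."""
    keys = [data[i:i + KEY_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

Calling `is_sorted(sort_records(make_records(1_000_000)))` exercises the record count of the original 1985 benchmark. A real entry would also have to read its input from disk, write its output to disk, and count startup time.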

Hardware and software

SPsort was run on an RS/6000 SP with 488 nodes. Each node contains four 332MHz 604 processors, 1.5GB of RAM, and a 9GB SCSI disk. The nodes communicate with one another through the high-speed SP switch, which provides a bi-directional link bandwidth of 150 megabytes/sec to each node. Global storage, 6TB of disk in the form of 336 RAID arrays, is attached to 56 of the nodes. In addition to the 56 disk servers, 400 of the SP nodes ran the sort program.
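A little arithmetic on these figures suggests why the sort relied on scratch files on the local disks. The sketch below uses only the numbers quoted above. It assumes a decimal terabyte (1000GB), an even split of the data across the 400 sort nodes, and an even spread of RAID arrays across the disk servers, none of which is stated on this page.

```python
SORT_NODES = 400        # nodes that ran the sort program
RAM_PER_NODE_GB = 1.5   # memory per node
LOCAL_DISK_GB = 9       # local SCSI disk per node
DISK_SERVERS = 56       # nodes with RAID arrays attached
RAID_ARRAYS = 336       # arrays making up the global storage
DATA_GB = 1000          # assumption: one terabyte taken as 1000GB

aggregate_ram_gb = SORT_NODES * RAM_PER_NODE_GB   # total memory on the sort nodes
data_per_node_gb = DATA_GB / SORT_NODES           # share per node, assuming an even split
arrays_per_server = RAID_ARRAYS // DISK_SERVERS   # assuming an even spread of arrays

fits_in_ram = DATA_GB <= aggregate_ram_gb
fits_on_local_disk = data_per_node_gb <= LOCAL_DISK_GB

print(f"aggregate RAM on sort nodes: {aggregate_ram_gb:.0f}GB")
print(f"data per sort node: {data_per_node_gb:.1f}GB")
print(f"RAID arrays per disk server: {arrays_per_server}")
print(f"fits in RAM: {fits_in_ram}, fits on local disk: {fits_on_local_disk}")
```

Under these assumptions the 400 sort nodes hold 600GB of memory in total, less than the terabyte being sorted, while each node's 2.5GB share fits comfortably on its 9GB local disk.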

Sort input and output data were stored in the GPFS parallel file system. All 336 RAID devices were configured as a single mountable file system. GPFS stripes files across all these devices, allowing the machine's aggregate bandwidth of over 2.5GB/s to be brought to bear on a single file when necessary. The sort benchmark program averaged 1.89GB/s through GPFS during its execution, although the peak rates were significantly higher. Corresponding rates of MPI communication through the switch and access to local disks were sustained during the sort.
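The 1.89GB/s average can be checked against the elapsed time. The sketch below assumes a decimal terabyte (10^12 bytes) and assumes that the input is read from GPFS exactly once and the output is written to GPFS exactly once. Both are assumptions made for this check, although together they reproduce the quoted figure.

```python
TERABYTE = 10**12    # assumption: decimal terabyte
RECORD_SIZE = 100    # bytes per record

elapsed_s = 17 * 60 + 37    # 17 minutes, 37 seconds
gpfs_bytes = 2 * TERABYTE   # assumption: one read of the input plus one write of the output

gpfs_rate_gb_s = gpfs_bytes / elapsed_s / 1e9          # average GPFS traffic
sort_rate_gb_s = TERABYTE / elapsed_s / 1e9            # data sorted per second
records_per_s = TERABYTE / RECORD_SIZE / elapsed_s     # records sorted per second

print(f"elapsed time: {elapsed_s} s")
print(f"average GPFS rate: {gpfs_rate_gb_s:.2f}GB/s")
print(f"sort rate: {sort_rate_gb_s:.2f}GB/s")
print(f"records per second: {records_per_s:,.0f}")
```

The run takes 1057 seconds, so moving two terabytes through GPFS works out to about 1.89GB/s on average, or roughly 9.5 million records sorted per second.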

SPsort is a custom sort program optimized for Terabyte Sort. It uses standard SP and AIX services: X/Open-compliant file system access through GPFS, MPI message passing between nodes, POSIX pthreads, and the SP Parallel Environment to initiate and control a sort job running on many nodes.

Further Information

A detailed report on SPsort and the Terabyte Sort result is available (53.9KB Acrobat PDF file).

Further information on the Terabyte Sort benchmark and other results is available at http://research.microsoft.com/barc/SortBenchmark/
