IBM Research


IBM Almaden Research Center


Terabyte Sorting Record - GPFS

In 1999, Almaden researchers broke the world record for sorting a terabyte of data on an RS/6000 SP. The researchers sorted one terabyte of 100-byte records in 17 minutes, 37 seconds, almost three times faster than the previous record holder, a cluster of Compaq NT servers at Sandia Labs. Sorting speed has been used for years as a measure of a computer system's I/O and communication performance. During its run, the Almaden sort program, SPsort, sustained nearly 1.9GB/s of I/O to and from the GPFS global file system, 5.6GB/s of interprocessor communication across the SP switch, and about 2.8GB/s to scratch files on local disks.
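The headline figures reduce to simple arithmetic. The sketch below assumes decimal units (1TB = 10^12 bytes, hence 10^10 records of 100 bytes) and assumes every byte is read from GPFS once and written back once. Under those assumptions, the round trip over 17 minutes, 37 seconds works out to the 1.89GB/s average GPFS rate reported for the run.

```python
# Back-of-envelope rates for the SPsort run.
# Assumption: decimal units, so 1 TB = 10**12 bytes.
TERABYTE = 10**12           # bytes sorted
RECORD_SIZE = 100           # bytes per record
ELAPSED_S = 17 * 60 + 37    # 17 minutes, 37 seconds = 1057 seconds

records = TERABYTE // RECORD_SIZE      # number of 100-byte records
sort_rate = TERABYTE / ELAPSED_S       # bytes of input sorted per second

# Assumption: every byte is read from GPFS once and written back once,
# so file-system traffic is twice the data volume.
gpfs_rate = 2 * TERABYTE / ELAPSED_S

print(f"records:   {records:,}")
print(f"sort rate: {sort_rate / 1e9:.2f} GB/s")
print(f"GPFS I/O:  {gpfs_rate / 1e9:.2f} GB/s")
```

Running it prints 10,000,000,000 records, a sort rate of 0.95 GB/s, and GPFS I/O of 1.89 GB/s.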

Background

In 1985, an article in Datamation magazine ("A Measure of Transaction Processing Power," by Anon. et al.) proposed sorting one million 100-byte records, each with a random 10-byte key, as a useful benchmark of a computer system's I/O performance. The benchmark's ground rules are that all input must start on disk, all output must end on disk, and the overhead of starting the program and creating the output files must be included in the benchmark time. Because the current record for this benchmark is around one second, new benchmarks were established to stress ever larger computing systems. "MinuteSort" measures how much data can be sorted in one minute, and "PennySort" measures how much data can be sorted for one cent. At the high end is Terabyte Sort. A number of Terabyte Sort records have been reported recently; Almaden's SPsort improves substantially upon the best of these.
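To make the record format concrete, here is a minimal in-memory sketch of the Datamation-style workload: fixed 100-byte records whose first 10 bytes are a random key, sorted by comparing keys as unsigned bytes. The zero-byte payload and the helper names (`make_records`, `sort_records`, `is_sorted`) are illustrative assumptions. The sketch also ignores the on-disk input and output that the real benchmark rules require.

```python
import os

RECORD_SIZE = 100  # bytes per record, as in the benchmark
KEY_SIZE = 10      # leading bytes of each record that form the sort key

def make_records(n: int) -> list[bytes]:
    """Generate n records: a random 10-byte key followed by 90 filler bytes.

    The zero-filled payload is a hypothetical choice; the benchmark
    only fixes the record and key sizes.
    """
    return [os.urandom(KEY_SIZE) + bytes(RECORD_SIZE - KEY_SIZE) for _ in range(n)]

def sort_records(records: list[bytes]) -> list[bytes]:
    """Sort records by their 10-byte key, compared as unsigned bytes."""
    return sorted(records, key=lambda r: r[:KEY_SIZE])

def is_sorted(records: list[bytes]) -> bool:
    """Check that keys are in non-decreasing order."""
    return all(a[:KEY_SIZE] <= b[:KEY_SIZE] for a, b in zip(records, records[1:]))

data = make_records(1000)
out = sort_records(data)
print(is_sorted(out), len(out))
```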

Hardware and software

SPsort was run on an RS/6000 SP with 488 nodes. Each node contains four 332MHz PowerPC 604 processors, 1.5GB of RAM, and a 9GB SCSI disk. The nodes communicate with one another through the high-speed SP switch, which provides each node with a bidirectional link bandwidth of 150MB/s. A total of 6TB of global disk storage, in the form of 336 RAID arrays, is attached to 56 of the nodes. Besides these 56 disk servers, 400 of the SP nodes ran the sort program itself.
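A quick capacity check helps explain why a sort of this size cannot simply be done in memory. The sketch uses only the figures above and treats GB and TB as decimal units for simplicity (an assumption). It shows that the 400 sort nodes hold about 600GB of RAM in total, less than the terabyte being sorted, which is consistent with SPsort's use of scratch files on the nodes' local disks.

```python
# Capacity check using the node and storage figures above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

DATA_SIZE = 1 * TB         # the terabyte being sorted
SORT_NODES = 400           # nodes that ran the sort program
RAM_PER_NODE = 1.5 * GB    # RAM in each node
LOCAL_DISK = 9 * GB        # raw capacity of each node's SCSI disk
GLOBAL_STORAGE = 6 * TB    # GPFS storage across all RAID arrays
RAID_ARRAYS = 336

aggregate_ram = SORT_NODES * RAM_PER_NODE      # RAM across the sort nodes
input_per_node = DATA_SIZE / SORT_NODES        # each node's share of the data
local_scratch = SORT_NODES * LOCAL_DISK        # raw local disk across sort nodes
per_array = GLOBAL_STORAGE / RAID_ARRAYS       # average capacity per RAID array

print(f"aggregate RAM:       {aggregate_ram / GB:.0f} GB")
print(f"fits in RAM:         {DATA_SIZE <= aggregate_ram}")
print(f"input per sort node: {input_per_node / GB:.1f} GB")
print(f"raw local scratch:   {local_scratch / TB:.1f} TB")
print(f"capacity per array:  {per_array / GB:.1f} GB")
```

Each node's 2.5GB share of the input exceeds its 1.5GB of RAM, while the local disks together offer about 3.6TB of raw space.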

Sort input and output data were stored in the GPFS parallel file system. All 336 RAID arrays were configured as a single mountable file system. GPFS stripes files across all of these devices, allowing the machine's aggregate bandwidth of over 2.5GB/s to be brought to bear on a single file when necessary. The sort program averaged 1.89GB/s through GPFS during its execution, although peak rates were significantly higher. Comparably high rates of MPI communication through the switch and of access to the local disks were sustained during the sort.
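Striping is what lets one file use many devices at once. The sketch below models it as plain round-robin placement of fixed-size blocks over the 336 arrays. The 1MiB block size, the strict round-robin order, and the helper names are assumptions for illustration, not a description of GPFS's actual allocation policy.

```python
# Round-robin striping model of a single file spread over many devices.
# Assumptions (hypothetical): fixed 1 MiB blocks placed in strict
# round-robin order; real GPFS block size and placement may differ.
BLOCK_SIZE = 1 << 20    # bytes per stripe block (assumed)
NUM_DEVICES = 336       # RAID arrays in the single GPFS file system

def block_to_device(block: int, num_devices: int = NUM_DEVICES) -> int:
    """Device that stores a given block index of the file."""
    return block % num_devices

def devices_touched(start: int, length: int) -> set[int]:
    """Devices that serve a contiguous read of `length` bytes at `start`."""
    first = start // BLOCK_SIZE
    last = (start + length - 1) // BLOCK_SIZE
    return {block_to_device(b) for b in range(first, last + 1)}

# A 4 MiB read touches only 4 arrays...
print(sorted(devices_touched(0, 4 * BLOCK_SIZE)))   # [0, 1, 2, 3]
# ...but a 1 GiB read spans 1024 blocks and so reaches all 336 arrays,
# which is how one file can draw on the whole machine's bandwidth.
print(len(devices_touched(0, 1 << 30)))              # 336
```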

SPsort is a custom sort program optimized for Terabyte Sort. It uses standard SP and AIX services: X/Open-compliant file system access through GPFS, MPI message passing between nodes, POSIX threads (pthreads), and the SP Parallel Environment to start and control a sort job running on many nodes.
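This page does not describe SPsort's internal algorithm, so the following is only a generic illustration of one common way to organize a sort across many nodes. Records are partitioned by key range so that each node receives a contiguous slice of the key space; each slice is then sorted independently and the slices are concatenated. The sketch runs in a single Python process. Plain lists stand in for the message-passing exchange between nodes, and a thread pool stands in for the nodes themselves (under CPython's global interpreter lock this shows the structure, not a speedup). `NUM_NODES` and the helper names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

KEY_SIZE = 10
RECORD_SIZE = 100
NUM_NODES = 4   # hypothetical stand-in for the 400 sort nodes

def destination(record: bytes, num_nodes: int = NUM_NODES) -> int:
    """Pick the node owning this record's key range from the first key byte.
    With uniformly random keys, the ranges receive similar loads."""
    return record[0] * num_nodes // 256

def partition(records: list[bytes], num_nodes: int = NUM_NODES) -> list[list[bytes]]:
    """Phase 1: route each record to its destination bucket. On a real
    cluster this would be an all-to-all exchange over the switch."""
    buckets: list[list[bytes]] = [[] for _ in range(num_nodes)]
    for r in records:
        buckets[destination(r, num_nodes)].append(r)
    return buckets

def local_sort(bucket: list[bytes]) -> list[bytes]:
    """Phase 2: each node sorts only the keys in its own range."""
    return sorted(bucket, key=lambda r: r[:KEY_SIZE])

def parallel_sort(records: list[bytes], num_nodes: int = NUM_NODES) -> list[bytes]:
    buckets = partition(records, num_nodes)
    # Threads play the role of independent nodes.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        sorted_buckets = list(pool.map(local_sort, buckets))
    # Every key in bucket i is smaller than every key in bucket i+1,
    # so concatenating the sorted buckets gives a globally sorted result.
    return [r for b in sorted_buckets for r in b]

data = [os.urandom(RECORD_SIZE) for _ in range(10_000)]
result = parallel_sort(data)
print(result == sorted(data, key=lambda r: r[:KEY_SIZE]))
```

Partitioning on the high-order key bits keeps the final step trivial: no merge is needed, only concatenation in bucket order.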



More Information

Sorting on a Cluster Attached to a Storage-Area Network, Jim Wyllie. (PDF)

SPsort: How to Sort a Terabyte Quickly, Jim Wyllie. A detailed report on SPsort and the Terabyte Sort result. (Acrobat PDF, 53.9KB)

Further information on the Terabyte Sort benchmark and other results is available at Jim Gray's Sort Benchmark Home Page.

