GPFS - Parallel File System
Almaden researchers broke the world record in sorting!
One terabyte of 100-byte records in 17 minutes, 37 seconds!
Almaden
researchers recently broke the world record for sorting a terabyte
of data on an RS/6000 SP. The researchers sorted one terabyte of 100-byte
records in 17 minutes, 37 seconds. This is almost a factor of 3 faster
than the previous record holder, a cluster of Compaq NT servers at
Sandia Labs. Sorting speed has been used as a measure of a computer
system's I/O and communication performance for a number of years. The
Almaden sort program, SPsort, sustained nearly 2.8GB/s of I/O to and
from the GPFS global file system, 5.6GB/s of
interprocessor communication across the SP switch, and about 1.9GB/s
to scratch files on local disks during its execution.
Background
In 1985, an
article in Datamation Magazine (A Measure of Transaction
Processing Power, by Anon. et al.) proposed a sort of one million
records of 100 bytes each with random 10-byte keys as a useful
benchmark of computer system I/O performance. The benchmark ground rules
are that all input must start on disk, all output must end on disk,
and that the overhead to start the program and create the output
files must be included in the benchmark time. Since the current
record for this benchmark is around a second, new benchmarks were
established to stress ever larger computing systems. "MinuteSort"
measures how much can be sorted in one minute, and "PennySort"
measures how much can be sorted for one cent. At the high end is
Terabyte Sort. A number of Terabyte Sort records have been reported
recently. Almaden's SPsort improves substantially upon the best
of these.
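
The ground rules above can be illustrated at toy scale. The following is a minimal single-machine sketch, not SPsort itself: it writes 100-byte records with random 10-byte keys to an input file, then times a sort that starts from that file and ends with a newly created output file. The file names, the filler payload, the default record count, and the helper names (write_input, sort_file, read_keys, run_benchmark) are assumptions made for illustration only.

```python
import os
import tempfile
import time

RECORD_SIZE = 100  # bytes per record, as in the benchmark definition
KEY_SIZE = 10      # leading bytes of each record form the random sort key


def write_input(path, count):
    """Create the on-disk input: `count` records with random 10-byte keys."""
    with open(path, "wb") as f:
        for _ in range(count):
            # The payload is arbitrary filler (an assumption of this sketch).
            f.write(os.urandom(KEY_SIZE) + b"x" * (RECORD_SIZE - KEY_SIZE))


def sort_file(in_path, out_path):
    """Read all records from disk, sort by key, and write them back to disk."""
    with open(in_path, "rb") as f:
        data = f.read()
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    records.sort(key=lambda r: r[:KEY_SIZE])
    with open(out_path, "wb") as f:
        f.write(b"".join(records))


def read_keys(path):
    """Return the 10-byte key of every record in the file, in file order."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + KEY_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def run_benchmark(count=10_000):
    """Time the sort, including creation of the output file."""
    with tempfile.TemporaryDirectory() as d:
        in_path = os.path.join(d, "input.dat")
        out_path = os.path.join(d, "output.dat")
        write_input(in_path, count)      # input must start on disk (untimed setup)
        start = time.perf_counter()
        sort_file(in_path, out_path)     # timed: read, sort, create and write output
        elapsed = time.perf_counter() - start
        keys = read_keys(out_path)
        size = os.path.getsize(out_path)
    return elapsed, keys, size


if __name__ == "__main__":
    elapsed, keys, size = run_benchmark()
    print(f"sorted {len(keys)} records ({size} bytes) in {elapsed:.3f} s")
```

This sketch holds everything in memory and does not time program start-up, which the real benchmark includes. A terabyte-scale sort has to partition the data across many nodes and disks instead.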
Hardware and software
SPsort was run
on an RS/6000 SP with 488 nodes. Each node contains four 332MHz 604
processors, 1.5GB of RAM, and a 9GB SCSI disk. The nodes communicate
with one another through the high-speed SP switch with a bi-directional
link bandwidth to each node of 150 megabytes/sec. Global storage
of 6 TB, in the form of 336 RAID arrays, is attached
to 56 of the nodes. In addition to the 56 disk servers, 400 of the SP nodes
ran the sort program itself.
Sort input and
output data were stored in the GPFS parallel file system. All 336
RAID devices were configured as a single mountable file system.
GPFS stripes files across all these devices, allowing the machine's
aggregate bandwidth of over 2.5GB/s to be brought to bear on a single
file when necessary. The sort benchmark program averaged 1.89GB/s
through GPFS during its execution, although the peak rates were
significantly higher. Corresponding rates of MPI communication through
the switch and access to local disks were sustained during the sort.
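
The 1.89GB/s average can be cross-checked against the elapsed time with a short calculation. This sketch assumes that the terabyte is read from GPFS once and written back once, and that a terabyte means 10^12 bytes. Neither assumption is stated explicitly on this page, and the function names are illustrative.

```python
ELAPSED_S = 17 * 60 + 37   # 17 minutes, 37 seconds = 1057 seconds
DATA_BYTES = 10**12        # assumption: 1 TB taken as 10^12 bytes
RECORD_SIZE = 100          # bytes per record


def record_count(data_bytes=DATA_BYTES, record_size=RECORD_SIZE):
    """Number of 100-byte records in the data set."""
    return data_bytes // record_size


def gpfs_average_rate(data_bytes=DATA_BYTES, elapsed_s=ELAPSED_S):
    """Average GPFS traffic in bytes/s, assuming one read pass and one write pass."""
    return 2 * data_bytes / elapsed_s


if __name__ == "__main__":
    print(f"records sorted:    {record_count():,}")
    print(f"average GPFS rate: {gpfs_average_rate() / 1e9:.2f} GB/s")  # 1.89 GB/s
```

Under these assumptions the result is about 1.89GB/s, which agrees with the average rate quoted above.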
SPsort is a
custom sort program optimized for Terabyte Sort. It uses standard
SP and AIX services: X/Open-compliant file system access through
GPFS, MPI message passing between nodes, POSIX pthreads, and the
SP Parallel Environment to initiate and control a sort job running
on many nodes.
Further Information
A detailed report (53.9KB
Acrobat PDF file) on SPsort and the Terabyte Sort result
is available.
Further information
on the Terabyte Sort benchmark and other results is available at http://research.microsoft.com/barc/SortBenchmark/