IBM Research

Storage Systems - Projects - GPFS

IBM Almaden Research Center


Overview

General Parallel File System (GPFS) is a scalable, parallel cluster file system product that originated as Almaden's Tiger Shark file system. It now supports IBM® Blue Gene® and IBM® eServer™ Cluster systems, including the Linux (Cluster 1350) and AIX (Cluster 1600) systems. Tiger Shark was originally developed for large-scale multimedia, but in its GPFS incarnation it has been extended to support the additional requirements of parallel computing. GPFS supports single-cluster file systems of multiple petabytes and has run at I/O rates of more than 100 gigabytes per second. It has recently evolved for use in multi-cluster grid systems, with high-bandwidth access to data from multiple storage clusters across wide geographic areas.

GPFS is the file system for the ASC Purple supercomputer. ASC (the Advanced Simulation and Computing program) is a Department of Energy initiative to use computer simulation rather than nuclear testing to ensure the safety, reliability and performance of the nuclear stockpile. This requires computational, storage and I/O capabilities far beyond what existed before. ASC Purple is the current-generation computing platform at Lawrence Livermore, featuring 12,000 processors, a data store of 2 petabytes and I/O rates of over 130 GB/sec to a single file or multiple files.
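To give a sense of scale for these figures, the short Python sketch below estimates how long one full pass over a 2-petabyte store would take at a sustained 130 GB/sec. It is a back-of-the-envelope illustration only: it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the quoted rate could be sustained across the whole store, neither of which the page states. The function name `full_scan_seconds` is hypothetical.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PB = 10**15
GB = 10**9


def full_scan_seconds(store_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to stream the whole store once at a sustained aggregate rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return store_bytes / rate_bytes_per_s


# Figures quoted for ASC Purple: 2 PB data store, 130 GB/sec aggregate I/O.
seconds = full_scan_seconds(2 * PB, 130 * GB)
hours = seconds / 3600
print(f"{seconds:.0f} s, about {hours:.1f} hours")
```

Under these assumptions, a single pass over the entire store would take a little over four hours.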

Recently, the scope of GPFS was extended to include the new Blue Gene machines. In this environment, GPFS provides high-bandwidth I/O to the Blue Gene compute nodes using daemons that relay I/O requests to the designated I/O nodes. The I/O nodes form a GPFS cluster that communicates in parallel with another (typically Linux) cluster outside the Blue Gene machine. The external cluster has the physical connections to the disk volumes and acts as a set of remote disk servers for the cluster within the Blue Gene. These systems can comprise thousands of I/O nodes and tens of thousands of compute nodes.

In addition to high-speed parallel file access, GPFS provides fault tolerance, including automatic recovery from disk and node failures. Its robust design and multi-node access have made GPFS the file system of choice for a number of commercial applications, such as large Web servers, data mining, digital libraries, file servers and online databases.



Technical Paper

GPFS: A Shared-Disk File System for Large Computing Clusters, Frank Schmuck and Roger Haskin, First USENIX Conference on File and Storage Technologies (FAST'02), Monterey, CA, January 28-30, 2002. (Acrobat PDF, 297KB)

GPFS News

GPFS multi-cluster use in StorCloud challenge at SuperComputing 2004

Terabyte Sort record reduced to 7 minutes, 17 seconds

Almaden researcher uses GPFS to smash Terabyte Sorting Record

GPFS on ASC White

GPFS with database

GPFS with HPSS

HPSS collaboration at SDSC

TeraGrid

Grid Demo at Supercomputing 2003

