pNFS

Project Description

pNFS, an integral part of NFSv4.1, promises to bridge the gap between the performance requirements of large, parallel applications and the interoperability and security requirements of modern Grid workflows. pNFS provides high-performance data access to large-scale storage systems in both LAN and WAN environments. In addition, pNFS decouples the tight bond between storage systems and their clients, enabling pNFS clients to directly access parallel file systems. Direct access reduces latency, allows full use of the available network bandwidth, and reduces the management overhead and storage space required to maintain copies of large data sets in multiple data centers.

pNFS splits the NFSv4.1 protocol into a control path and a data path. pNFS clients and state servers use the NFSv4.1 protocol to communicate control and file management operations, while all I/O along the data path is delegated to a storage-specific layout driver. A management protocol binds metadata servers with storage devices.

pNFS Architecture

pNFS with GPFS

Our goal is to leverage pNFS and GPFS to build the fastest and most scalable NAS system in the world. The nodes in the GPFS cluster chosen for pNFS access are divided into (possibly overlapping) groups of state servers and data servers. pNFS clients are distributed across all state servers in a round-robin fashion. Clients send metadata requests to their associated state server, while I/O is distributed across all of the data servers. Each state server acts as a fully functional NFSv4.1 metadata server, with GPFS maintaining correctness using an internal management protocol.
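The round-robin distribution of clients across state servers can be illustrated with a minimal sketch. The function name `assign_state_servers` and the client and server names are hypothetical, chosen only for illustration; the page does not describe how GPFS actually performs this assignment.

```python
from itertools import cycle


def assign_state_servers(clients, state_servers):
    """Map each client to a state server in round-robin order.

    Hypothetical helper: illustrates the distribution policy only,
    not the mechanism GPFS uses.
    """
    if not state_servers:
        raise ValueError("at least one state server is required")
    servers = cycle(state_servers)
    return {client: next(servers) for client in clients}


assignment = assign_state_servers(
    ["client-a", "client-b", "client-c", "client-d", "client-e"],
    ["state-1", "state-2"],
)
```

With two state servers, clients alternate between them, so metadata load grows evenly as clients are added.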

To perform direct and parallel I/O, a pNFS client first requests layout information from a state server. A layout contains the information required to access any byte range of a file. The pNFS client uses the layout information to translate data access requests into READ and WRITE operations to the correct data servers. For writes, once the I/O is complete, the client sends an NFSv4 COMMIT operation to its state server. This single COMMIT operation causes GPFS to acquire exclusive access to the file and flush the data to stable storage on every data server. The GPFS management protocol keeps NFSv4 state information current across servers.

Since each GPFS server exports the entire file system, the layout does not indicate the actual location of the data. Instead, the layout provides a mechanism to balance client load among the data servers. This gives GPFS a great deal of flexibility in how it generates the layout information. For example, GPFS can rebalance data across the disks without needing to recall layouts and generate new layout information for pNFS clients. In addition, layouts can be used to ensure that all I/O for a byte range of a file is sent to a single GPFS server, reducing lock contention and the number of read-modify-write sequences.
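As a rough sketch of how a client might use a layout to turn one request into per-server operations, the example below assumes a simple round-robin striping scheme. The `Layout` class, its `stripe_unit` field, and the server names are hypothetical; the page does not specify how GPFS generates its layouts, only that a given byte range can be steered to a single server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout: fixed-size stripes rotated over data servers."""

    stripe_unit: int      # bytes per stripe (assumed value, not from GPFS)
    data_servers: tuple   # ordered data server names

    def server_for(self, offset):
        # Every offset inside one stripe maps to the same server.
        stripe_index = offset // self.stripe_unit
        return self.data_servers[stripe_index % len(self.data_servers)]

    def split(self, offset, length):
        """Break a byte range into (server, offset, length) pieces."""
        pieces = []
        end = offset + length
        while offset < end:
            stripe_end = (offset // self.stripe_unit + 1) * self.stripe_unit
            chunk = min(end, stripe_end) - offset
            pieces.append((self.server_for(offset), offset, chunk))
            offset += chunk
        return pieces


layout = Layout(stripe_unit=1024, data_servers=("ds-1", "ds-2", "ds-3"))
pieces = layout.split(offset=512, length=2048)
```

Because the mapping depends only on the offset, every client holding the same layout sends I/O for a given byte range to the same data server, which is the property the text credits with reducing lock contention.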

Key Benefit

pNFS GPFS Architecture

Selected Publications

  • D. Hildebrand, A. Nisar, and R. Haskin, "pNFS, POSIX, and MPI-IO: A Tale of Three Semantics," in Proceedings of the 4th Petascale Data Storage Workshop, Portland, OR, 2009.
  • D. Hildebrand, P. Andrews, M. Eshel, R. Haskin, P. Kovatch, and J. White, "Deploying pNFS across the WAN: First Steps in HPC Grid Computing," in Proceedings of the 9th LCI International Conference on High-Performance Clustered Computing, Urbana, IL, 2008.

People