RAID technology has been used in storage systems as a means of ensuring reliability and
availability by protecting against disk drive failures. There are two primary sources of
data loss in disk drives: drive failures and uncorrectable media errors. Disk drive
capacities have been increasing at a rapid pace over the years -- between 60 percent and
100 percent per year (since 2004, the growth rate has slowed to around 25-35 percent
per year). However, the probability of uncorrectable read errors has been relatively
constant: around 1 in 10^15 bits. As a result, with increasing capacities, the probability
of a data loss due to an uncorrectable media error has been increasing to the point where
it is now a significant factor.
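
As a rough illustration of this trend, the sketch below estimates the chance of hitting at least one uncorrectable read error while reading a given amount of data, for example all surviving drives during a rebuild. It takes the quoted rate as one error per 10^15 bits and assumes bit errors are independent, which real drives only approximate. The function name and the capacities in the loop are illustrative, not taken from the project.

```python
import math

def p_unrecoverable_read(bytes_read: float, ber: float = 1e-15) -> float:
    """Probability of at least one uncorrectable bit error in bytes_read bytes.

    Assumes independent bit errors at a constant rate `ber` (an assumption).
    """
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, evaluated in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ber))

# Illustrative amounts of data read, in terabytes (hypothetical values).
for terabytes in (0.5, 2, 8, 32):
    p = p_unrecoverable_read(terabytes * 1e12)
    print(f"{terabytes:>5} TB read: {p:.3%}")
```

With the error rate held fixed, the probability grows almost linearly with the amount of data read, which is why larger drives increase the exposure.
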
The goal of the Advanced RAID project is to explore techniques that significantly reduce
the probabilities of data loss events. The primary considerations in developing RAID schemes
are storage efficiency, performance and reliability. Extensibility to higher levels of
reliability and the ability to run on existing hardware are among the other factors. We
developed a quantitative methodology to measure and compare the expected performance
characteristics of different erasure codes, under a variety of use cases and failure states
(e.g., host read while one or more disks have failed). We applied the general methodology
to a controller-based RAID system with a typical XOR hardware engine. The results identified
the strengths and weaknesses of certain known erasure codes when deployed in this environment.
An application of this general methodology to distributed storage systems is mentioned below.
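
To make the "host read while a disk has failed" case concrete, here is a minimal sketch of a degraded read on a single-parity (RAID 5 style) stripe. It is not the project's methodology or any of the codes it evaluated. It only shows why a failure state changes the cost of a host read: the missing block has to be rebuilt by XORing every surviving block in the stripe. The helper names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def degraded_read(stripe, failed):
    """Rebuild the block at index `failed` from the surviving blocks.

    `stripe` holds the data blocks followed by one XOR parity block.
    Returns the rebuilt block and the number of blocks that had to be read.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors), len(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]   # three data blocks plus parity

rebuilt, blocks_read = degraded_read(stripe, failed=1)
```

A healthy read of that block touches one disk, while the degraded read touches all three survivors. Counting I/Os and XOR operations per use case in this way is one simple kind of comparison across codes.
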
During 2004 and 2005, the Advanced RAID team explored techniques and technologies
to extend reliability beyond that offered by RAID 6. The group has completed a detailed
analysis of SPIDRE (Sector Protection through Intra-Drive REdundancy) technology. This
technology can be combined with a base RAID scheme such as RAID 5 or RAID 6 and reduces the
exposure to data loss due to uncorrectable sector errors by more than a factor of 100. This
improved reliability comes at a small performance overhead (about 10 percent)
and at high storage efficiency (typically around 90-95 percent). The basic premise of SPIDRE
is that while RAID 6 reduces the exposure to data loss due to multiple drive failures, it may
not meet enterprise reliability requirements regarding uncorrectable sector errors, especially
with large-capacity and/or lower-reliability drives such as ATA/SATA. SPIDRE coupled with
RAID 6 provides an effective method to improve reliability while not giving up significant
performance or storage efficiency.
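
The storage-efficiency arithmetic can be sketched as follows. The text does not describe the SPIDRE layout, so the segment shape used here (15 data sectors protected by 1 redundant sector) and the 8-drive RAID 6 array are assumptions chosen only to land in the quoted 90-95 percent range. The sketch also assumes that figure refers to the intra-drive share alone.

```python
from fractions import Fraction

def intra_drive_efficiency(data_sectors: int, redundant_sectors: int) -> Fraction:
    """Fraction of each drive's sectors left for data after intra-drive redundancy."""
    return Fraction(data_sectors, data_sectors + redundant_sectors)

def raid_efficiency(drives: int, parity_drives: int) -> Fraction:
    """Fraction of the array's drives that hold data (2 parity drives for RAID 6)."""
    return Fraction(drives - parity_drives, drives)

# Hypothetical configuration: 15+1 sectors per segment, 8-drive RAID 6.
inner = intra_drive_efficiency(15, 1)   # 15/16
outer = raid_efficiency(8, 2)           # 3/4
combined = inner * outer                # 45/64

print(f"intra-drive: {float(inner):.2%}, RAID 6: {float(outer):.2%}, "
      f"combined: {float(combined):.2%}")
```

Under these assumptions the intra-drive layer costs about 6 percent of capacity on top of what the base RAID scheme already spends.
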
We conducted an extensive exploration into highly fault-tolerant erasure codes. The WEAVER
codes (invented in our group) are new families of erasure codes that are well suited to storage
systems, particularly distributed storage systems such as Kybos/Intelligent Bricks. There
are constructions that tolerate as many as ten failed devices but have storage overhead equal to
that of simple mirroring. The search for WEAVER codes required the use of the Blue Gene/L
supercomputer. Some of the searches required billions of matrix reductions.
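
The text does not describe the WEAVER constructions or the search procedure. The sketch below shows only the generic kind of check in which matrix reductions arise for XOR-based codes. Each stored element is a bitmask over the data symbols it XORs together, and a set of failed devices is tolerable exactly when the surviving elements still have full rank over GF(2). The function names and the toy code are hypothetical.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in basis:
                basis[high] = row
                break
            row ^= basis[high]  # eliminate the leading bit and keep reducing
    return len(basis)

def fault_tolerance(devices, k):
    """Largest t such that every set of t failed devices is recoverable.

    `devices[d]` lists the elements stored on device d; each element is a
    bitmask over the k data symbols it is the XOR of.
    """
    n = len(devices)
    for t in range(1, n + 1):
        for failed in combinations(range(n), t):
            surviving = [row for d in range(n) if d not in failed
                         for row in devices[d]]
            if gf2_rank(surviving) < k:
                return t - 1
    return n

# Toy code: three data devices plus one XOR parity device.
single_parity = [[0b001], [0b010], [0b100], [0b111]]
# Toy code: three-way mirroring of a single symbol.
three_way_mirror = [[0b1], [0b1], [0b1]]
```

A search over candidate codes repeats a check like this for every candidate and every failure pattern, so the number of reductions grows quickly with the number of devices and the target fault tolerance.
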
In other work, a detailed analysis of reliability and performance considerations was
carried out for dRAID -- distributed Redundant Arrangements of Intelligent Devices. This
technology is relevant to distributed storage systems, such as Kybos/Intelligent Bricks, where
the controller may also be a point of failure, requiring redundancy to be distributed across
controllers and disk drives. In the reliability analysis work, the mean time to data loss for
an arbitrary configuration of dRAID fault tolerance with and without internal RAID (redundancy
within each brick) was determined as functions of the reliability of the nodes and disks within
the system. The performance analysis provided the means to determine effective algorithms for
the distribution of the I/O, XOR computations and message passing between the nodes of a distributed
system. A reliability analysis was also completed to determine the mean time to loss of spare
capacity of a dRAID system with a given initial spare capacity and, consequently, the sparing
needed to achieve a required mean time to loss of spare capacity.
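
For orientation, mean time to data loss as a function of component reliability can be sketched with a textbook birth-death Markov chain. This is not the dRAID model from the analysis above: it treats all units as identical, does not separate node failures from disk failures, ignores sector errors, and assumes exponential failure and repair times with one repair in progress at a time. All parameter names and values are illustrative.

```python
def mttdl(n_units: int, tolerance: int, mttf: float, mttr: float) -> float:
    """Mean time to data loss for n_units identical units tolerating `tolerance` failures.

    State i means i units are currently failed; data is lost on reaching
    tolerance + 1. Failure rate in state i is (n_units - i) / mttf and the
    repair rate is 1 / mttr for i >= 1. Requires tolerance < n_units.
    """
    lam = 1.0 / mttf
    mu = 1.0 / mttr
    total = 0.0
    step = 0.0  # expected time to first move from i failures to i + 1
    for i in range(tolerance + 1):
        fail_rate = (n_units - i) * lam
        repair_rate = mu if i > 0 else 0.0
        # First-passage recurrence: D_i = 1/f_i + (r_i / f_i) * D_{i-1}
        step = 1.0 / fail_rate + (repair_rate / fail_rate) * step
        total += step
    return total

# Hypothetical inputs: 8 units, 1,000,000 h MTTF, 24 h repair time.
for t in (1, 2, 3):
    print(f"tolerance {t}: {mttdl(8, t, 1e6, 24.0):.3e} hours")
```

For a tolerance of one, the recurrence reduces to the familiar closed form ((2N - 1)λ + μ) / (N(N - 1)λ²), with λ = 1/MTTF and μ = 1/MTTR.
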
IBM Almaden Research - Advanced Storage Systems
REO: A Generic RAID Engine and Optimizer,
James Lee Hafner, Fifth USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007
Notes on Reliability Models for Non-MDS Erasure Codes, James Lee Hafner and KK Rao, IBM Research Report RJ10391, 2006
HoVer Erasure Codes for Disk Arrays, James Lee Hafner, DSN-DCCS 2006 - International Conference on Dependable Systems and Networks, June 25-28, 2006.
Reliability for Networked Storage Nodes, KK Rao, James L. Hafner and Richard A. Golding,
DSN-DCCS 2006 - International Conference on Dependable Systems and Networks, June 25-28, 2006.
WEAVER Codes: Highly Fault Tolerant Erasure Codes for Storage Systems,
James Lee Hafner, Fourth USENIX Conference on File and Storage Technologies (FAST '05), December 13-16, 2005
Matrix Methods for Lost Data Reconstruction in Erasure Codes,
James Lee Hafner, Veera Deenadhayalan, KK Rao and John Tomlin, Fourth USENIX Conference on File and
Storage Technologies (FAST '05), December 13-16, 2005
Performance Metrics for Erasure Codes in Storage Systems, James Lee Hafner, Veera Deenadhayalan,
Tapas Kanungo and KK Rao, IBM Research Report RJ10321, 2004
R5X0: An Efficient High Distance Parity-Based Code with Optimal Update Complexity, Jeff R. Hartline,
Tapas Kanungo and James Lee Hafner, IBM Research Report RJ10322, 2004