(818) 701-7771

Globalstor Unleashes Unparalleled RAID 5 Over RAID 5 Data Redundancy

RAID 5/5 Expands Data Security Without Compromising Performance

LAS VEGAS, NV April 8, 2002:

A concept developed in the 1980s by the UC Berkeley computer science department, RAID (Redundant Array of Inexpensive or Independent Disks) has become a vital part of today’s critical enterprise and workgroup networks, providing crash-proof data accessibility for reliable 24 x 7 data access. The basic idea of RAID is to combine multiple inexpensive disk drives into an array of disk drives to obtain performance, capacity and reliability that exceed those of a single large drive. The array itself appears to the host computer as a single logical drive.

Advancements in technology have made hard disk drives faster, smaller, higher in capacity and less expensive, serving to further accelerate the popularity of RAID. These solutions protect against a disk drive failure through the incorporation of a variety of RAID levels, which implement parity recovery mechanisms and result in greater data accessibility. Because they prevent downtime due to a hardware failure, RAID environments can sustain multiple bad sectors or even whole disk failures without impeding data access for the end-user.

Until recently, beyond RAID 0 (a non-redundant array of disk drives), there have been five types of array architectures. RAID 1 through RAID 5 each provide disk fault-tolerance with varying degrees of feature and performance compromises. Today there is a new hybrid: RAID 5 over 5 (RAID 5/5). Developed by Globalstor Data Corporation, a leading provider of a broad range of high-end mass storage solutions, RAID 5/5 expands the data security capabilities of RAID 5 solutions by adding a further layer of parity redundancy over the multiple RAID 5 partitions. With RAID 5/5, environments could essentially lose an entire RAID array without losing access to data.

RAID Review

The basic idea behind RAID is to combine multiple, inexpensive disk drives into an array that results in a performance, capacity and reliability level exceeding the capabilities of any single large drive. As a distinct advantage to the end-user, the entire array of drives, regardless of the quantity, appears to the host computer as one single logical drive for faster, more convenient data access.

Fundamental to RAID technology is striping, or the combining of multiple drives into one logical storage unit. Each stripe can range from a single sector of 512 bytes up to several megabytes. The stripes are interleaved in a rotating sequence, combining alternating stripes from each available drive in the array.  Ultimately, it is the operating environment that determines whether large or small stripes should be used, but it is this striping that maximizes throughput for the disk subsystem by balancing the I/O load across the array. 

In a multiple-drive system without striping, the load is unbalanced, with only a few drives storing the frequently accessed data, causing undue wear, network bottlenecks and recurrent drive failure. When the drives in the array are striped so that each record falls entirely within one stripe, most records can be evenly distributed across all the drives in the array. As a result, all the drives work concurrently on various I/O operations, maximizing the number of simultaneous operations.
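The rotating interleave described above can be sketched as a simple address calculation. This is a minimal illustration only, assuming a plain round-robin layout with no parity; the function name and its parameters are hypothetical and do not describe any particular controller.

```python
from typing import Tuple


def locate_block(logical_block: int, num_drives: int, blocks_per_stripe: int) -> Tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes a simple round-robin layout: consecutive stripes are placed
    on consecutive drives, wrapping back to drive 0 after the last drive.
    """
    stripe_index = logical_block // blocks_per_stripe      # which stripe the block falls in
    offset_in_stripe = logical_block % blocks_per_stripe   # position inside that stripe
    drive = stripe_index % num_drives                      # stripes rotate across the drives
    row = stripe_index // num_drives                       # completed rotations so far
    return drive, row * blocks_per_stripe + offset_in_stripe


# Four drives, two blocks per stripe: blocks 0-1 land on drive 0,
# blocks 2-3 on drive 1, and block 8 wraps back around to drive 0.
for block in range(10):
    print(block, locate_block(block, num_drives=4, blocks_per_stripe=2))
```

A larger `blocks_per_stripe` keeps a whole record on one drive, leaving the other drives free for other requests; a smaller value spreads one long record across every drive.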

RAID Levels

RAID 0 — Utilizes striping without parity or data redundancy (illus. 1). These arrays require a minimum of two drives and can be configured with large stripes for multi-user environments or small stripes for single-user systems accessing long sequential records. When a large data set is read, the controller issues multiple read commands in parallel to reduce latency. RAID 0 arrays deliver the best data storage efficiency and performance; however, if one drive in a RAID 0 array fails, the entire array fails and all data is lost.

This level of RAID may be suitable for large, transient data sets that are analyzed and then either discarded or archived, for example in non-linear video editing or fast image processing.

RAID 1 — Also known as disk mirroring, RAID 1 (illus. 2) duplicates data from one drive to a second, while enabling the pair of disk drives to appear to the computer as a single drive. Striping is not used within a single mirrored drive pair, but environments incorporating multiple RAID 1 arrays can stripe the drives together, creating the appearance of a single large array consisting of pairs of mirrored drives. All writes must go to both drives of a mirrored pair so that the information on the drives is kept identical, but each individual drive can perform simultaneous independent read operations. Ultimately, mirroring doubles the read performance of a single non-mirrored drive without affecting the write performance.

RAID 1 is the best choice for performance-critical, fault-tolerant environments and is the only choice for fault-tolerance where no more than two drives are used.

RAID 2 — Sector-stripes data across groups of drives, with some drives assigned to store ECC (Error Correction [or Correcting] Code) information. ECC allows data that is being read or transmitted to be checked for errors and, when necessary, corrected on the fly, but the code itself requires additional storage capacity. However, because today’s disk drives embed ECC information within each sector, RAID 2 (illus. 3) is seldom used.

RAID 3 — This RAID level sector-stripes each block of data across groups of drives (like RAID 2), but dedicates one entire drive to storing just parity information. Records typically span all available drives in the array, resulting in optimal transfer rates; however, because each I/O request must access every drive in the array, RAID 3 (illus. 4) solutions can satisfy only one I/O request at a time. RAID 3 is best suited for single-user, single-tasking environments with long records, because it does not allow overlapping of multiple I/O operations and requires synchronized-spindle drives to avoid performance degradation with short records. This RAID level may be most suitable for applications in graphics and imaging.

RAID 4 — This RAID level (illus. 5), like RAID 3, also dedicates a drive to storing parity information, but large stripes are used, enabling records to be read from any individual drive in the array and allowing multiple simultaneous read operations to be overlapped. Unfortunately, since every write operation must update the single parity drive for redundancy, write operations cannot be overlapped in the same way.

RAID 4 is best suited for education or other environments that require multiple read access only.

RAID 5 — Similar to RAID 0 and sometimes called a Rotating Parity Array, RAID 5 (illus. 6) fully incorporates both data striping and parity, allowing disks to satisfy multiple read and write requests independently and resulting in higher read performance in a request-intensive environment. Since parity information is distributed across the entire array, a RAID 5 array can withstand a single disk failure without losing data or access to data. In addition, because the parity information is written over the entire array, and not just a single disk, RAID 5 also greatly reduces the bottlenecks often experienced with a single-parity-disk solution.

RAID 5 combines efficient, fault-tolerant data storage with good performance characteristics, but write performance and performance during drive failure can be slower than with RAID 1.  Rebuild operations also require more time than with RAID 1 because parity information must also be reconstructed. At least three drives are required for RAID 5 arrays.
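The single-failure tolerance of RAID 5 rests on XOR parity: the parity block of a stripe is the XOR of its data blocks, so any one missing block equals the XOR of the survivors. The sketch below illustrates that idea in memory only. The function names are hypothetical, and the rotation rule in `parity_drive` is just one possible layout, assumed here for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe_row: int, num_drives: int) -> int:
    """Drive holding parity for a given stripe row (assumed rotation rule)."""
    return (num_drives - 1 - stripe_row) % num_drives


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1] and rebuild it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True

# Parity moves to a different drive on each row, so no single drive
# becomes the parity bottleneck.
print([parity_drive(row, 4) for row in range(5)])  # [3, 2, 1, 0, 3]
```

Because every surviving block must be read to rebuild a lost one, degraded-mode reads and rebuilds touch all remaining drives in the array.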

When a drive in a RAID 5 array fails, the overall system I/O is degraded due to the additional reads and writes required to maintain the integrity of the distributed parity data. RAID 5 is appropriate for systems producing an I/O stream dominated by read requests and can also be used in database or server environments when the pseudo-random nature of the I/O stream effectively negates the advantages of a large system-level read cache. A controller-level cache can significantly improve RAID 5 performance by accelerating parity read and write updates.
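The extra parity reads and writes mentioned above can be illustrated with the common read-modify-write update, in which a single-block write reads the old data and old parity and then writes the new data and new parity. This is a hedged sketch of the arithmetic only; the function names are hypothetical and no real controller firmware is being described.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after overwriting one data block in a stripe.

    Removing the old data from the parity and adding the new data gives
    the same result as recomputing parity over the whole stripe, but it
    needs only two reads (old data, old parity) and two writes.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

new_block = b"\xaa\xbb"
new_parity = updated_parity(stripe[1], parity, new_block)

# Cross-check against a full recomputation over the updated stripe.
full = xor_bytes(xor_bytes(stripe[0], new_block), stripe[2])
print(new_parity == full)  # True
```

A controller cache helps here because the old data and old parity may already be in memory, which removes some of the extra disk reads.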

RAID 5 is suitable for applications in database query and transaction processing as well as most imaging applications.

RAID 6 — An extension of RAID 5, RAID 6 (illus. 7) provides additional fault tolerance through the incorporation of a second independent distributed parity scheme. With two-dimensional parity, not only is data striped on a block level across a set of drives as with RAID 5, but a second set of parity is also calculated and written across all available drives within the array. With RAID 6, environments have extremely high data fault tolerance and are capable of sustaining multiple, simultaneous drive failures. Unfortunately, RAID 6 requires a very complex controller design, the overhead to compute parity addresses is extremely high, and write performance is generally very poor. Because of the two-dimensional parity scheme, RAID 6 requires N+2 drives to implement: N drives' worth of capacity for data plus two for parity.

Overall, RAID 6 is a good solution for mission critical applications.
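The redundancy overhead of the levels covered so far comes down to simple arithmetic: RAID 0 gives up no capacity, mirroring gives up half, RAID 5 gives up one drive's worth and RAID 6 gives up two. The helper below is a hypothetical sketch of that calculation. The 100 GB drive size is an assumed figure chosen for round numbers, and the RAID 6 minimum of four drives is taken from the N+2 rule with N of at least two.

```python
def usable_capacity_gb(level: int, drive_count: int, drive_size_gb: int) -> int:
    """Usable capacity in GB for RAID 0, 1, 5 or 6 with identical drives."""
    if level == 0:
        min_drives, data_drives = 2, drive_count          # striping only, no redundancy
    elif level == 1:
        min_drives, data_drives = 2, drive_count // 2     # mirrored pairs
    elif level == 5:
        min_drives, data_drives = 3, drive_count - 1      # one drive's worth of parity
    elif level == 6:
        min_drives, data_drives = 4, drive_count - 2      # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    return data_drives * drive_size_gb


# Twelve hypothetical 100 GB drives under each level.
for level in (0, 1, 5, 6):
    print(f"RAID {level}: {usable_capacity_gb(level, 12, 100)} GB usable")
```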

RAID 7 — A registered trademark of Storage Computer Corporation, RAID 7 (illus. 8) provides 25% to 90% better overall write performance than single-spindle solutions. Host interfaces are scalable for connectivity or increased host transfer bandwidth, and small reads in multi-user environments typically experience very high cache hit rates, resulting in near-zero access times. Write performance can be improved by increasing the number of drives in the array. Read performance can also be improved with each increase in the number of actuators in the array. No additional data transfer is required for parity manipulation on RAID 7.

RAID 7, however, is a single vendor proprietary solution with an extremely high cost per MB and a very short warranty.  The array is not user serviceable, and the power supply must have a dedicated UPS to prevent cache data loss.

RAID 10 — Implemented as a striped array with RAID 1 segments, RAID 10 (illus. 9) offers users the same fault tolerance as RAID 1 paired with the same overhead for fault-tolerance as mirroring alone. The striped RAID 1 segments ensure high I/O rates, and the array can, under certain circumstances, sustain multiple simultaneous drive failures. Ultimately, RAID 10 is an excellent solution for sites considering RAID 1, but requiring additional performance.

The disadvantage of RAID 10 is that it is a very expensive solution with limited scalability and a high overhead. In addition, all the drives in the array must move in parallel, resulting in significantly lower sustained performance.

RAID 10 is best suited for applications such as database servers, where high performance and fault tolerance are required.
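The "certain circumstances" under which RAID 10 survives multiple drive failures can be made concrete: data remains available as long as no mirrored pair loses both of its drives. The following sketch enumerates every two-drive failure on a hypothetical four-drive array; the pairing of drives 0 with 1 and 2 with 3 is an assumption made for illustration.

```python
from itertools import combinations


def raid10_survives(failed_drives, mirrored_pairs) -> bool:
    """True if every mirrored pair still has at least one working drive."""
    return all(not (a in failed_drives and b in failed_drives)
               for a, b in mirrored_pairs)


mirrored_pairs = [(0, 1), (2, 3)]   # assumed layout: two mirrored pairs, striped together
two_drive_failures = list(combinations(range(4), 2))
survivable = [f for f in two_drive_failures
              if raid10_survives(set(f), mirrored_pairs)]

print(len(two_drive_failures))  # 6 possible two-drive failures
print(survivable)               # the 4 that leave one drive alive in each pair
```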

RAID 53 — A modified RAID 3, RAID 53 (illus. 10) incorporates a striped (RAID 0) array with RAID 3 segments. Offering the same fault tolerance and fault-tolerance overhead as RAID 3, RAID 53 combines the high data transfer rates of RAID 3 and the high I/O rates for small requests of RAID 0. RAID 53 is a good solution for sites considering RAID 3 implementation, but requiring additional performance. However, RAID 53 is very expensive, the disk spindles must all be synchronized (limiting drive choice), and the byte striping ultimately results in poor utilization of the drives’ formatted capacity.

RAID 0+1 — A mirrored array with RAID 0 segments, RAID 0+1 (illus. 11) offers the same fault tolerance as RAID 5, with the same overhead for fault-tolerance as mirroring alone. Although it is capable of achieving high I/O rates through the use of multiple stripe segments, a single drive failure will cause the entire array to become essentially a RAID 0 array, making RAID 0+1 an excellent solution for sites requiring high performance, but not maximum reliability. RAID 0+1 is also an expensive solution with limited scalability and a high overhead, and all drives in the array must move in parallel, resulting in significantly lower sustained performance.

RAID 0+1 is best suited for general fileserver and imaging applications.

RAID 5/5 — The newest level of RAID, 5/5 (illus. 7) incorporates the best features RAID has to offer, providing expanded data security without compromising performance. With RAID 5/5, each individual RAID 5 volume (of up to 12 drives or units) is given its own SCSI ID, resulting in all of the drives under that volume appearing to the end user as one large drive. This configuration allows up to 1.44TB of data storage per 4-U rack. Each 4-U rack is then daisy-chained under an additional external RAID controller or a PCI-based RAID controller, making the RAID 5 volumes appear not as multiple separate 1.44TB drives, but as one drive with virtually limitless capacity.

Ultimately, the RAID 5/5 environment will have multiple 12-drive arrays that appear to the host as one, with data spanned across each of the separate volumes so that it can be accessed transparently from the RAID 5/5 volume. If a hard drive were lost from an individual group volume, the individual volume would begin automatically rebuilding data. In the event an entire rack is lost, the remaining rack(s) can continue to supply enterprise demands without any data loss.
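The two layers of protection described above can be sketched with nested XOR parity. This is an illustration only, not Globalstor's implementation: it assumes the outer controller keeps one extra volume holding the XOR of the corresponding members of every inner RAID 5 volume, and all names and sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_raid5_volume(data_blocks):
    """Inner level: the data blocks plus one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_member(members, lost_index):
    """Rebuild one lost member as the XOR of all surviving members."""
    return xor_blocks([m for i, m in enumerate(members) if i != lost_index])


# Three inner RAID 5 volumes (racks), each with two data blocks plus parity.
inner = [make_raid5_volume([b"ab", b"cd"]),
         make_raid5_volume([b"ef", b"gh"]),
         make_raid5_volume([b"ij", b"kl"])]

# Outer level (assumed): a fourth volume holding parity across the racks.
outer_parity = [xor_blocks(column) for column in zip(*inner)]
outer = inner + [outer_parity]

# Case 1: one drive lost inside rack 0 is rebuilt from that rack alone.
print(rebuild_member(inner[0], 1))  # b'cd'

# Case 2: all of rack 1 is lost and is rebuilt from the other racks.
rebuilt_rack = [rebuild_member([volume[k] for volume in outer], 1)
                for k in range(len(inner[1]))]
print(rebuilt_rack == inner[1])  # True
```

Under this assumed layout, a single drive failure is handled entirely inside its own rack, and the outer parity is consulted only when a whole rack becomes unavailable.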

RAID 5/5 is ideally suited for enterprise environments including government, military, medical and film houses, for applications such as content preservation and distribution, medical image generation, and video archival, or for applications that require virtually unlimited scalability with ‘bulletproof’ redundancy for the highest level of data integrity.

RAID Implementation

Once an organization has decided to implement a RAID solution in its environment, it is necessary to choose between a hardware and a software implementation. Software RAID solutions use the host computer’s CPU and memory to implement the various RAID functions. Using software and the host’s internal processor, environments can utilize the large cache memory of the CPU for RAID (typically 0 and 1) operations. Higher RAID levels are usually implemented through the addition of hardware, either an internal RAID board or an external RAID processor or controller.

Internal RAID, like software RAID, is operating system dependent and usually requires a driver specifically designed to access and configure the RAID controller. Another problem with the internal design is that if the controller fails, the entire host computer must be shut down and all work relative to that array is stopped while the board is repaired or replaced. Another deficit of internal controllers is that expansion cards are not offered, meaning the size or capacity of the RAID cannot scale beyond the set limit. In addition, internal controllers reduce system redundancy because they depend on a single host and cannot communicate with multiple hosts. Failover on an internal controller is also difficult (at best) to configure.

Hardware RAID controllers, on the other hand, offer several advantages including the use of a dedicated CPU, which is needed to calculate parity and map the location of the files stored throughout the array. With an external RAID controller, the IT manager also has the option of turning off only the other devices shared on that one bus, allowing work to continue on the rest of the array while repairs are being made. In addition, external hardware controllers offer greater flexibility relative to the number of drives each can address. Internal RAID controllers, for their part, incorporate SCSI adapter functions, resulting in even faster communications between the controller and CPU than an external controller provides. With the integrated SCSI adapter functions, the data is able to avoid a communication layer. An internal RAID controller is also generally less expensive than a comparably equipped external controller.

The Importance of RAID

RAID systems, by design, can sustain several bad sectors, even whole disk failures, and continue running transparently to the end-user. Spare disk drives are often supported to speed the automatic rebuilding of failed drives, and removable disk canisters simplify the task of replacing failed disk drives. However, RAID systems not only increase reliability by preventing downtime due to hardware failure, they also increase available storage capacity, in some instances providing terabytes, or even petabytes, of scalability. RAID 5/5 enhances those key RAID characteristics, adding vital features such as doubled redundancy. With RAID 5/5, environments could essentially lose an entire RAID array without losing access to data.

Globalstor Data developed RAID 5/5 to improve enterprise efficiency while reducing overhead with dynamic, on-site scalability and providing a virtually unbreakable safety net. For greater flexibility, IT managers can configure the individual RAID volumes to any combination of RAID 0 through 5, allowing multiple departments within an organization to unify and share data throughout the enterprise without sacrificing one department’s requirement for fast RAID 3 data access for another’s RAID 5 redundancy.