Storing data is fundamental to programming. We often think of the task as something that just involves hardware, but we can take basic storage devices and use them in conjunction with clever algorithms to make the whole thing work better. Storage can be about software.
The idea of RAID
RAID (“Redundant Arrays of Inexpensive Disks”) is an idea whose time has more than come. When it was invented back in 1987, the “Inexpensive” part of the name was something of a joke. For a desktop machine, the cost of a single hard disk was only just affordable; the idea of using a whole set of them was out of the question. Today, of course, you can get a high-capacity, high-performance drive for less than $50, and putting multiple units together to make something much better than a single drive is a very attractive proposition.
So what is RAID?
How does it work, and how can you make use of it?
The RAID idea is very simple: take multiple drive units and connect them together so that they work as a single “virtual” storage device.
You treat a RAID system as if it were a single disk drive or storage volume, even though it might consist of multiple physical drives. This sounds obvious enough, but what does it get you in addition to more cost, more heat and more noise?
RAID enthusiasts often forget that the purpose of the whole idea isn’t obvious and so don’t bother to explain it.
There are three main reasons for wanting to implement RAID.
The first is simply reliability. A hard disk is an electromechanical device. It has moving parts and, as a result, even if it were manufactured perfectly, it would eventually wear out. Add to this the fact that failures can occur for entirely unpredictable reasons, and you can appreciate that disk storage needs to be made more reliable. If you reduce the dependence of the data store on the correct functioning of any single drive, then you have improved the reliability of the overall storage. This is the meaning of the “Redundant” in RAID, and it implies that you are using more physical drives than you strictly need to store the data.
The second reason is that, with a little careful design, a RAID system can be made faster than any single drive that makes it up. If you have two or more drives, each capable of transferring data at a fixed rate, and you can keep them busy at the same time, the combined transfer rate can be higher than that of any one of them. In most cases this is a secondary consideration, and in many RAID systems the performance can actually be lower than that of a single drive.
The third reason is the most obvious. An array of drives made to look like a single virtual drive or volume can provide not only more storage than a single drive but expandable storage. Of course, you can increase the storage available simply by adding drives without tying them together into a single virtual drive, but users prefer to see a single storage volume that contains all their data. Notice, however, that this reason can conflict with the first, in that some storage capacity is generally sacrificed to provide increased reliability.
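To put rough numbers on that trade-off, here is a minimal sketch comparing two of the schemes described below, striping (RAID 0) and mirroring (RAID 1), for an array of identical drives. The function name and the scheme labels are made up for illustration, and the model ignores details such as metadata overhead and drives of different sizes.

```python
def usable_capacity(scheme: str, num_drives: int, drive_size_gb: float) -> float:
    """Usable capacity in GB of an array of identical drives (toy model)."""
    if num_drives < 1:
        raise ValueError("need at least one drive")
    if scheme == "striping":
        # RAID 0: every block is stored exactly once, so nothing is sacrificed.
        return num_drives * drive_size_gb
    if scheme == "mirroring":
        # RAID 1: every drive holds a full copy, so capacity is that of one drive.
        return drive_size_gb
    raise ValueError(f"unknown scheme: {scheme!r}")


for scheme in ("striping", "mirroring"):
    print(scheme, usable_capacity(scheme, num_drives=2, drive_size_gb=2000))
```

With two 2000 GB drives, striping gives 4000 GB and mirroring 2000 GB: the mirror spends half of the raw capacity on redundancy.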
It's important to remember that whatever RAID scheme you choose, it is not a substitute for backup. A RAID system may be more reliable, but it can still fail. An earthquake could take out the building and your data. Much smaller disasters, such as a cascading power supply failure, can easily destroy all of the disks in an array, so backup is not optional, even with a RAID system.
Six basic types of RAID
The reasons for building RAID systems seem desirable enough, so how do we achieve them?
There are a number of types of RAID, differing in how well they satisfy each of these objectives.
RAID 0 - Striping
RAID 0 uses a technique called “striping”, which is also employed in other RAID versions.
The basic idea is that when you write a file to the array, it is split up into fixed-size blocks. The first block is written to the first drive in the array, the second to the second, and so on until we get back to the first drive again.
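The round-robin placement just described can be sketched in a few lines of Python. Each drive is modelled as a list of blocks, which is an assumption for illustration only (a real controller maps block addresses rather than copying data around), and the function names are hypothetical.

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive)."""
    if num_drives < 1:
        raise ValueError("need at least one drive")
    return logical_block % num_drives, logical_block // num_drives


def stripe(data: bytes, block_size: int, num_drives: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across drives."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for start in range(0, len(data), block_size):
        drive, _ = locate_block(start // block_size, num_drives)
        drives[drive].append(data[start:start + block_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the blocks back in the same round-robin order."""
    rows = max((len(d) for d in drives), default=0)
    return b"".join(d[row] for row in range(rows) for d in drives if row < len(d))


print(stripe(b"ABCDEFGHIJ", block_size=2, num_drives=3))
# [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
```

Note that every drive holds pieces of the file, which is exactly why losing any one of them breaks the whole thing.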
You can see that RAID 0 makes the array of drives look like a single virtual drive by spreading the data over all of them. As long as the drives can operate independently of one another, there can be a performance gain, but this also depends on choosing the right block or ‘stripe’ size to optimise performance.
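The effect of the stripe size can be made concrete by working out which drives a single read or write request touches. This is a simplified model that ignores seek time, caching and queueing, and the function name is hypothetical.

```python
def drives_touched(offset: int, length: int, stripe_size: int, num_drives: int) -> set[int]:
    """Return the set of drives hit by a request of `length` bytes at byte `offset`."""
    if length <= 0:
        return set()
    first_block = offset // stripe_size
    last_block = (offset + length - 1) // stripe_size
    # After num_drives consecutive blocks the pattern repeats, so stop there.
    last_needed = min(last_block, first_block + num_drives - 1)
    return {block % num_drives for block in range(first_block, last_needed + 1)}


KIB = 1024
# A 64 KiB read from the start of a 4-drive array:
print(sorted(drives_touched(0, 64 * KIB, 16 * KIB, 4)))   # [0, 1, 2, 3]
print(sorted(drives_touched(0, 64 * KIB, 128 * KIB, 4)))  # [0]
```

In this model, small stripes let one large transfer use every drive at once, while large stripes keep each request on a single drive and leave the others free to serve different requests at the same time. Which is better depends on the workload, which is why the stripe size matters.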
In practice, RAID 0 should be avoided at all costs, for the simple reason that the failure of any one drive in the array results in the loss of all the data, and data recovery is made much more difficult by the way the data is spread across the drives.
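The risk can be quantified with a simple model. Assume each drive fails independently with probability p over some period (p is a hypothetical figure here, not a measured failure rate). A RAID 0 array of n drives survives only if every drive survives, so it loses data with probability 1 - (1 - p)^n.

```python
def raid0_loss_probability(p_drive: float, num_drives: int) -> float:
    """Probability that a RAID 0 array loses data, assuming independent drive failures.

    p_drive is the (hypothetical) chance that one drive fails over the period
    of interest; the array survives only if every drive survives.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be a probability")
    return 1.0 - (1.0 - p_drive) ** num_drives


for n in (1, 2, 4, 8):
    print(n, round(raid0_loss_probability(0.02, n), 4))
```

With p = 0.02, a four-drive array has about a 7.8% chance of losing everything, nearly four times the risk of a single drive, and the risk keeps growing with every drive you add.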
RAID 1 - Mirroring
This is often referred to as mirroring.
Additional drives simply back up the original, i.e. they mirror the first drive. In most cases mirroring is used with two drives, as this provides the maximum benefit for the smallest cost.
It doesn’t increase the storage capacity, i.e. two drives provide the same capacity as one, but it does improve data reliability. If one of the drives fails, there is still a copy of all of the data on the working drive. You can replace the failed drive with a fresh one and the mirror copy will be rebuilt from the surviving drive.
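A toy model of a mirror makes the write-to-every-drive, read-from-any-survivor behaviour concrete. The class and method names are hypothetical, and each drive is simply a dictionary mapping block numbers to data.

```python
from typing import Optional


class MirrorSet:
    """Toy RAID 1: every block is written to every healthy drive."""

    def __init__(self, num_drives: int = 2) -> None:
        # Each drive maps block number -> data; None marks a failed drive.
        self.drives: list[Optional[dict[int, bytes]]] = [{} for _ in range(num_drives)]

    def write(self, block: int, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy drive has a complete copy, so use the first one found.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("all drives in the mirror have failed")

    def fail(self, index: int) -> None:
        self.drives[index] = None

    def rebuild(self, index: int) -> None:
        """Replace drive `index` with a fresh one and copy every block onto it."""
        source = next((d for d in self.drives if d is not None), None)
        if source is None:
            raise OSError("no surviving copy to rebuild from")
        self.drives[index] = dict(source)


mirror = MirrorSet(2)
mirror.write(0, b"payroll")
mirror.fail(0)         # the first drive dies
print(mirror.read(0))  # b'payroll', served by the surviving drive
mirror.rebuild(0)      # the fresh drive 0 is filled from drive 1
```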
Mirroring can also increase read performance if it is arranged so that data is read from alternate drives, but this isn’t really the main reason for using mirroring.
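Here is a sketch of the alternating-read idea. Writes still have to go to every drive, so the gain applies to reads only. The class name and the in-memory model are assumptions for illustration; a real controller could instead pick whichever drive is least busy rather than strictly alternating.

```python
import itertools
from typing import Optional


class RoundRobinReader:
    """Spread reads over the copies in a mirror, skipping failed drives."""

    def __init__(self, copies: list[Optional[dict[int, bytes]]]) -> None:
        # Each copy is one drive's contents; None marks a failed drive.
        self.copies = copies
        self._order = itertools.cycle(range(len(copies)))

    def read(self, block: int) -> tuple[int, bytes]:
        """Return (index of the drive used, data) for the requested block."""
        for _ in range(len(self.copies)):
            index = next(self._order)
            copy = self.copies[index]
            if copy is not None:
                return index, copy[block]
        raise OSError("no healthy copy available")


contents = {0: b"x", 1: b"y"}
reader = RoundRobinReader([dict(contents), dict(contents)])
print([reader.read(0)[0] for _ in range(4)])  # [0, 1, 0, 1]
```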
RAID 2 - Bit-level striping with error correction
RAID 2 sounds a bit like RAID 0 in that the data in a file is split and stored on all the drives in the array, but when you look at it in detail it is really a very different conception of how things should work.
In this case the data is striped into single-bit blocks and an error correction code is added. The idea is that the data is divided up into data words, error correction bits are added, and the result is stored one bit per drive. Typically RAID 2 needs lots of drives to make it work.
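The text does not name a particular code. RAID 2 is usually described with a Hamming code, so this sketch swaps in the smallest useful one, Hamming(7,4), which protects 4 data bits with 3 check bits and therefore needs 7 drives. Drives are modelled as lists of bits, and the function names are hypothetical.

```python
# Hamming(7,4): 4 data bits + 3 check bits, each codeword bit on its own drive.
DATA_POSITIONS = (3, 5, 6, 7)  # 1-based codeword positions holding data bits
CHECK_POSITIONS = (1, 2, 4)    # 1-based positions holding check bits
NUM_DRIVES = 7


def encode_nibble(nibble: int) -> list[int]:
    """Encode 4 data bits as 7 code bits; element i is written to drive i."""
    code = [0] * (NUM_DRIVES + 1)  # index 0 unused so indexes match positions
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for check in CHECK_POSITIONS:
        # Make the XOR over every position whose index has this bit set equal 0.
        for pos in range(1, NUM_DRIVES + 1):
            if pos & check and pos != check:
                code[check] ^= code[pos]
    return code[1:]


def decode_bits(bits: list[int]) -> int:
    """Correct up to one wrong bit and return the 4 data bits."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, NUM_DRIVES + 1):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # a non-zero syndrome is the position of the bad bit
    return sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))


def stripe_bits(data: bytes) -> list[list[int]]:
    """Split each byte into two nibbles, encode each, and put one bit per drive."""
    drives: list[list[int]] = [[] for _ in range(NUM_DRIVES)]
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            for drive, bit in zip(drives, encode_nibble(nibble)):
                drive.append(bit)
    return drives


def read_back(drives: list[list[int]]) -> bytes:
    """Read one bit from every drive per codeword, correcting errors as we go."""
    nibbles = [decode_bits([d[k] for d in drives]) for k in range(len(drives[0]))]
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


drives = stripe_bits(b"RAID")
drives[4] = [bit ^ 1 for bit in drives[4]]  # drive 4 returns garbage on every read
print(read_back(drives))  # b'RAID'
```

Because every codeword has exactly one bit on each drive, a single drive returning wrong data corrupts at most one bit per codeword, which the decoder fixes on the fly.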
For example, if the data in the file is naturally divided up into bytes and each byte is protected with a single-error-correcting Hamming code, which needs four check bits for eight data bits, then you need 8 plus 4, i.e. 12 drives to store the data. As you can imagine, this creates a highly reliable storage system that can withstand the simultaneous failure of more than one drive (exactly how many depends on the number of error correction bits added). It can also improve performance if suitable hardware is used to run the drives in parallel. In fact, you need special hardware to make this scheme work at all, and typically the drives need to spin in sync with one another so that the data can be written in parallel.
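The drive count for a Hamming-coded array follows from the standard condition for single-error correction: r check bits can protect m data bits when 2^r >= m + r + 1, because the r-bit syndrome must be able to name every one of the m + r bit positions plus the "no error" case. A minimal sketch of that calculation (the function name is ours):

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for m in (4, 8, 32):
    r = hamming_check_bits(m)
    print(f"{m} data bits + {r} check bits = {m + r} drives")
```

This gives 7, 12 and 38 drives respectively, so the check drives shrink as a share of the total as the data word grows (3 of 7, 4 of 12, 6 of 38). Adding one more overall parity bit, as is common, also lets the code detect double errors.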
In many senses RAID 2 is the technological pinnacle of RAID systems, but it is hardly ever used today, despite the falling cost of drives. The reason is most probably that it is overkill: drives are mostly reliable in the sense that they either work or they don’t. A complete drive failure is well handled by simpler RAID systems, mirroring for example, and the on-the-fly error correction provided by RAID 2 generally isn’t needed.