One term you may come across as you investigate digital asset management systems (or any other large computer system) is RAID. You wouldn’t be alone if you had no idea what that actually means.
The literal meaning of the acronym is Redundant Array of Inexpensive Disks. I bet that didn’t help much, did it? At its core, RAID is a way of combining several hard disks into a form that the computer system reads as a single disk. Benefits of using RAID include:
- Being able to create a single large drive out of several smaller ones. When drive size was a significant gating factor in creating large storage spaces, this was a huge benefit.
- Providing redundancy. In some flavors of RAID, data is written to multiple disks so that if one drive fails, the data is recoverable.
- Improving read and write speed. By spreading reads and writes to the disks over several physical drives, information can be written and accessed more quickly by the computer system.
There are a lot of different flavors of RAID. Each one does something slightly different, and I’ll just talk about the main three. I’m going to use an analogy of writing in a notebook as an example to explain them.
The first is RAID0 (often called striping without parity). In RAID0, data is written across multiple disks in what are called stripes. In our notebook example, this would be rather like writing an essay by putting one sentence in one notebook, the next in another and the next in another. Each sentence would be a “stripe” of your data. This is not actually redundant, as each piece of data is only written in one place. In fact, because every file is spread across all the drives, a single drive failure can affect the data held on every drive. The benefit that it offers is primarily speed. One person writing in this fashion would not be any faster, but five people each transcribing into their own notebook would be. The same is true for disks: if each drive is writing a piece of the data simultaneously, the whole gets written much more quickly. Since RAID0 is not redundant, it is not used for fault tolerance, but generally only for its performance boost. In RAID0, all of the drive space is available for storage.
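To make the striping idea a little more concrete, here is a minimal Python sketch. It is only an illustration: the names `stripe` and `unstripe` are made up for this example, each "disk" is just a Python list, and a real RAID controller works on fixed-size blocks at a much lower level.

```python
def stripe(data: bytes, num_disks: int, stripe_size: int) -> list[list[bytes]]:
    """Cut data into stripes and deal them out to the disks in turn."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        # Stripe 0 goes to disk 0, stripe 1 to disk 1, and so on, wrapping around.
        disks[(offset // stripe_size) % num_disks].append(chunk)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Read the stripes back in the same round-robin order they were written."""
    pieces = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


essay = b"One. Two. Three. Four. Five."
disks = stripe(essay, num_disks=3, stripe_size=5)
restored = unstripe(disks)  # needs every disk to be present
```

Notice that `unstripe` has to visit every disk to rebuild the essay. That is the lack of redundancy in a nutshell: take any one list away and the essay has holes in it.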
The next is RAID1 (often called drive mirroring). In RAID1, any data written to the array is written identically to all the disks in the array. In our notebook example, this would be like copying your essay into two, three or even more notebooks. If you spill water on one and ruin it, you have a second copy that you can use in place of the first. As long as one notebook (or drive) is readable, you have a perfect copy of your essay. The one significant downside to RAID1 is its disk consumption: you need at least twice as much raw disk space as you will have usable storage, since you are keeping at least two copies of everything.
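Mirroring can be sketched the same way. This is a toy model under obvious assumptions: the `Mirror` class and its methods are hypothetical names for this example, each disk is a dictionary of numbered blocks, and a failed disk is simply marked as `None`.

```python
class Mirror:
    """Toy RAID1: every write goes to every working disk."""

    def __init__(self, num_disks: int) -> None:
        # Each disk maps a block number to its contents; None marks a failed disk.
        self.disks: list[dict[int, bytes] | None] = [{} for _ in range(num_disks)]

    def write(self, block: int, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[block] = data  # identical copy on each working disk

    def fail(self, index: int) -> None:
        self.disks[index] = None  # simulate a dead drive

    def read(self, block: int) -> bytes:
        # Any surviving disk holds a complete copy, so use the first one found.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all mirrors have failed")


array = Mirror(num_disks=2)
array.write(0, b"my essay")
array.fail(0)              # spill water on the first notebook
survivor = array.read(0)   # the second notebook still has the whole essay
```

The cost shows up in `write`: two disks are used, but only one disk’s worth of essay is stored.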
The last is RAID5 (often called striping with parity). In RAID5, data is written across multiple drives as it is in RAID0, but in addition, a small piece of extra data (called parity) is written to another drive in the array. This is where my notebook analogy begins to be a stretch, since humans can’t replicate computer code and parity is calculated with a very small amount of data, but it would be like writing each sentence in a sequential notebook and then also recording a code, spread through all the other notebooks, that would allow you to recognize a missing sentence or fill in a missing word should something happen to one of the notebooks. RAID5 is redundant, although the array can only survive the loss of one drive at a time; if a second drive fails before the first has been replaced and rebuilt, the data is lost. It provides many of the same speed benefits as RAID0. In RAID5, you give up one drive’s worth of space, as that is used for the parity information.
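Parity is easier to see in code than in notebooks. RAID5 parity is computed with XOR, and the sketch below shows a single stripe rebuilt after one of its pieces goes missing. The helper name `xor_blocks` and the sample data are made up for this example, and the sketch leaves out how a real array rotates parity from drive to drive.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives each hold one piece of the stripe.
data_blocks = [b"The ", b"quic", b"k fo"]

# A fourth drive holds the parity for this stripe.
parity = xor_blocks(data_blocks)

# Suppose the second drive dies. XOR the survivors with the parity
# and the missing piece falls out.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

This works because XORing a value with itself cancels it out, so everything except the missing piece disappears. It also shows why only one loss is survivable: with two pieces gone, there is not enough left to cancel against.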
I hope you’ve found this introduction to RAID useful! If you are looking for some additional, deeper information, I would suggest that you start by checking out these Wikipedia articles: http://en.wikipedia.org/wiki/RAID and http://en.wikipedia.org/wiki/RAID-10#RAID_10_.28RAID_1.2B0.29
Posted by Jennifer Cox
Flickr photo by delaere