Author: Simon Kaczor
The evolution of computing is characterized by a vertiginous acceleration of speed and capacity. As we install more sophisticated applications and use computers in more creative ways, we push storage needs even further. You can improve your disk performance by using a RAID-enabled desktop system running common open source applications.
A typical hard drive is roughly 100 times slower than memory when reading sequential blocks of data. If the data is scattered across the disk, performance gets even worse, because the drive must move its heads between locations. Add multitasking to the equation, and a typical computer slows to a crawl. Fortunately, Linux makes Redundant Arrays of Independent Disks (RAID) accessible on the desktop, even for cost-conscious users.
To see how RAID can improve performance, I built a test rig from an average desktop machine with two identical Serial ATA hard drives. The test system's hardware and software are listed in the table below.
| Category | Component | Details |
|---|---|---|
| System | Processor | AMD BE-2400 (dual core, 2.3 GHz) |
| System | Motherboard | MSI K9MM-V (Micro ATX, 2 SATA 1.5 Gbit/s channels) |
| System | Memory | 2 x 1 GB, OCZ Dual Channel DDR2 800 |
| System | Hard drive | 2 x 160 GB, Hitachi Deskstar 7K250 |
| Software | OS/Desktop | Kubuntu 7.10 32-bit, KDE 3.5.8 |
| Software | Productivity | OpenOffice.org 2.3, Firefox 2.0.0.11 |
| Software | Development | KDevelop 3.5.0, GCC/G++ 4.1.3 |
| Software | Multimedia | Kino 1.1.0, Gwenview 1.4.1 |
To better understand the benefits of RAID, I measured the performance of a KDE-based Linux desktop in three different disk configurations:
- Single-disk configuration
- Two-disk striped setup (RAID 0)
- Two-disk mirrored setup (RAID 1)
The goal is to measure how much the two-disk configurations (RAID 0 and RAID 1) gain over the single-disk setup. For each of the three configurations, I performed a series of disk-intensive tasks and measured the total execution time. The selected tasks represent operations that three categories of users repeat on a regular basis:
- Productivity and multitasking
  - Start the system, including loading a KDE session (Web browser, instant messenger, email, music player, and file explorer)
  - Light multitasking: burn a CD and start OpenOffice.org Writer at the same time
  - Heavy multitasking: start Firefox under a heavy disk load (slocate indexing all files in the background)
- Software development
  - Run a full compilation of KStars
  - Start KDevelop
  - Debug KStars after modifying its code using KDevelop
  - Find a file in the KDE Education Project’s source tree
- Content creation and entertainment
  - Export a six-minute video clip using Kino’s raw DV format
  - Encode and create a six-minute DVD using Kino
  - View a year in pictures using Gwenview (1,300 digital photos displayed as 48×48 thumbnails)
Productivity and multitasking
In theory, a two-disk mirrored array can double sequential read throughput, and a striped array can double both sequential reads and writes. In practice, the real-world gain is much smaller in most areas. As the Productivity Results chart (Figure 1) shows, both dual-disk setups cut total system startup time by 12 seconds, which is welcome if you are in a rush to check the local weather before leaving for work in the morning.
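The theoretical gains mentioned above can be expressed as a small best-case model. This is a sketch under stated assumptions: the function name `theoretical_throughput` and the 60 MB/s per-disk rate are hypothetical, and the model ignores seek time, caching, and controller limits. It also assumes a mirror can split reads across both disks, which in practice mostly happens when several programs read at once.

```python
# Hypothetical best-case model; ignores seeks, caching, and controller limits.
def theoretical_throughput(per_disk_mb_s: float, level: str, disks: int = 2) -> dict:
    """Return idealized sequential read/write throughput in MB/s."""
    if level == "single":
        return {"read": per_disk_mb_s, "write": per_disk_mb_s}
    if level == "raid0":
        # Data is striped, so every disk contributes to both reads and writes.
        return {"read": per_disk_mb_s * disks, "write": per_disk_mb_s * disks}
    if level == "raid1":
        # Every write must go to all mirrors, so writes stay at single-disk speed;
        # reads can, in principle, be served by any mirror.
        return {"read": per_disk_mb_s * disks, "write": per_disk_mb_s}
    raise ValueError(f"unknown RAID level: {level}")


for level in ("single", "raid0", "raid1"):
    print(level, theoretical_throughput(60.0, level))  # 60 MB/s is an assumed rate
```

Because a mirror has to write the same data to both disks, the model leaves RAID 1 write speed at the single-disk level.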
The biggest improvement comes when you run more than one disk-intensive operation at the same time. In that situation, the performance gains are dramatic: rather than waiting almost a minute for the browser to start, you can have a Web browser up and running in 8 seconds while other programs are intensively searching the disk.
The mirrored array has a clear advantage for concurrent reads. This agrees with the theory: because the same data exists on two independent disks, the system can spread concurrent read operations across both of them, significantly improving overall performance.
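To illustrate why concurrent reads benefit from mirroring, the sketch below sends each pending read to whichever copy becomes free first and measures how long the whole batch takes. The `makespan` helper, the least-busy policy, and the 10 ms request times are assumptions for illustration; this is not the read-balancing logic of the Linux md driver, which uses its own heuristics.

```python
import heapq


def makespan(request_times_ms, disks):
    """Time until a batch of reads issued together finishes, when each read
    goes to the disk that becomes idle first (hypothetical least-busy policy)."""
    # Min-heap holding the time at which each disk becomes idle.
    free_at = [0.0] * disks
    heapq.heapify(free_at)
    for duration in request_times_ms:
        start = heapq.heappop(free_at)           # earliest-available disk
        heapq.heappush(free_at, start + duration)
    return max(free_at)


reads = [10.0] * 8  # eight hypothetical 10 ms reads issued at the same time
print("single disk:", makespan(reads, 1))
print("two-disk mirror:", makespan(reads, 2))
```

With every read able to use either copy, the two-disk mirror finishes the batch in half the time of a single disk. Writes get no such benefit, because both copies must be updated.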
Software development
Source code is rarely stored in large blocks of data; for readability, it is usually split across many small files. For that reason, there is little to gain from a striped setup, which works best for large sequential reads. Mirroring, on the other hand, could prove more useful for development tasks that run multiple operations in parallel.
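The following sketch shows how a striped array might map a read onto its member disks, assuming chunks are placed round-robin. The 64 KiB chunk size and the `disks_touched` helper are hypothetical; the real chunk size is chosen when the array is created. A small source file that fits inside one chunk is read from a single disk, so striping only helps it when other reads happen to land on the other disk.

```python
CHUNK = 64 * 1024  # hypothetical chunk (stripe unit) size in bytes


def disks_touched(offset, length, disks=2, chunk=CHUNK):
    """Return the set of member disks hit by a read of `length` bytes
    (length > 0) at `offset` in a RAID 0 array with round-robin chunks."""
    first_chunk = offset // chunk
    last_chunk = (offset + length - 1) // chunk
    # Chunk n lives on disk n % disks.
    return {c % disks for c in range(first_chunk, last_chunk + 1)}


print(disks_touched(0, 8 * 1024))      # small source file
print(disks_touched(0, 1024 * 1024))   # large sequential video read
```

An 8 KiB file at offset 0 touches only disk 0, while a 1 MiB sequential read spans both disks and can use their combined throughput.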
With the exception of the full compilation, none of the development tasks I ran is multithreaded, so the chart (Figure 2) shows only small performance gains. Even the full compilation doesn’t improve much, because it is CPU-intensive and accesses the disk only occasionally during the test.
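Amdahl’s law offers one way to see why a faster disk barely moves a CPU-bound job: only the share of runtime spent waiting on the disk gets faster. The I/O fractions below are hypothetical, not measured from these tests, and `overall_speedup` is an illustrative helper.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's law: only the disk-bound share of the runtime gets faster."""
    # io_fraction: share of total runtime spent waiting on the disk (0..1)
    # io_speedup: how much faster the disk work becomes (2.0 = twice as fast)
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


# Hypothetical workloads: a mostly CPU-bound compile vs. a mostly disk-bound task.
for label, fraction in (("compile", 0.1), ("file search", 0.8)):
    print(label, round(overall_speedup(fraction, 2.0), 2))
```

With 10% of the time spent on I/O, doubling disk speed yields only about a 5% overall gain; at 80%, the gain is about 67%.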
The results might have been different if several tasks had been run at once, such as loading files while a large project compiles. It is also possible that a quad-core CPU would benefit more from a dual-disk array during the full compilation.
Content creation and entertainment
Home video editing is a rewarding hobby, but because video in editing-friendly formats such as raw DV requires huge amounts of disk space, storage performance is critical to getting work done quickly. With inadequate disk throughput, videos take a long time to load and save, distracting you from the task at hand.
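A rough calculation shows how much data a six-minute clip involves. This sketch assumes PAL DV, where each frame occupies 144,000 bytes at 25 frames per second; the article does not state the video standard, and the sustained transfer rates are hypothetical round numbers, not measurements from the test rig. The helper names are illustrative.

```python
PAL_DV_FRAME_BYTES = 144_000  # bytes per PAL DV frame (assumed video standard)
PAL_FPS = 25                  # PAL frame rate


def dv_clip_bytes(seconds, frame_bytes=PAL_DV_FRAME_BYTES, fps=PAL_FPS):
    """Size of a raw DV clip of the given length, in bytes."""
    return seconds * fps * frame_bytes


def transfer_seconds(num_bytes, mb_per_s):
    """Best-case time to stream num_bytes at a sustained rate (1 MB = 10**6 bytes)."""
    return num_bytes / (mb_per_s * 1_000_000)


clip = dv_clip_bytes(6 * 60)  # six-minute clip
print("clip size (GB):", clip / 1e9)
for rate in (50, 100):        # hypothetical sustained rates in MB/s
    print(rate, "MB/s ->", round(transfer_seconds(clip, rate), 1), "s")
```

Under these assumptions the clip is about 1.3 GB, so each doubling of sustained throughput halves the minimum time spent just moving the data.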
In the content creation arena, the RAID 0 striped array is the clear winner. As the Multimedia Results chart (Figure 3) shows, videos save in half the time, and browsing a year of digital photos is much faster. The chart shows less benefit for RAID 0 when creating DVDs, because encoding is a CPU-intensive process. Still, once you are satisfied with the movie, you can relax while Kino creates and burns the DVD.
Conclusion
My tests show that a dual-disk setup significantly improves desktop performance, with larger gains in some areas, such as video editing. If you store large amounts of data and load it frequently, a RAID configuration will pay off for a small extra investment in hardware. Using identical disks in an array is strongly recommended, so for your next upgrade or new desktop purchase, order two hard drives instead of one.
When choosing a configuration, consider several factors. For valuable data, think twice before choosing a striped array. RAID 0 is less reliable than a single-disk setup: you lose your data if either disk fails. If striping works best for you, frequent backups are a must. You can also mix and match both configurations: for example, set up a mirrored array for the operating system and most of your user directory, and a striped array for temporary work folders that need speed.
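The reliability trade-off can be put in numbers with a simple model. This sketch assumes each disk fails independently with the same probability over some period and that no failed disk is replaced in time; the 3% figure and the `survival_probability` helper are hypothetical. Real failures are often correlated (for example, disks from the same batch), so treat the results as a rough comparison only.

```python
def survival_probability(p_fail: float, level: str) -> float:
    """Chance the data survives a period, assuming each of two disks fails
    independently with probability p_fail and nothing is repaired."""
    ok = 1.0 - p_fail
    if level == "single":
        return ok
    if level == "raid0":
        return ok * ok             # striping: both disks must survive
    if level == "raid1":
        return 1.0 - p_fail ** 2   # mirroring: data is lost only if both fail
    raise ValueError(f"unknown RAID level: {level}")


for level in ("single", "raid0", "raid1"):
    print(level, round(survival_probability(0.03, level), 4))  # 3% is assumed
```

Under these assumptions, striping roughly doubles the chance of losing data compared with a single disk, while mirroring reduces it sharply.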