Hard disk continuous-write measurements

3 May 1998

Collecting packet header traces with the Coral software requires writing to hard disk at high speed and in large chunks. Little head repositioning is needed, as Coral can feed data to the hard disk continuously, megabyte by megabyte. Writing to the raw hard disk is even an option to consider.

To verify hard disk write performance for the Coral environment, several tests were run with a simple program, dskwtst.c, which continually writes to hard disk in one-megabyte chunks and collects statistics after every 20 such writes (i.e., per 20MB written).
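The write loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original dskwtst.c: the function name `run_write_test`, its parameters, and the zero-filled payload are assumptions made for the example. The running average is assumed to be the mean of the per-sample rates, and one MB is taken as 10**6 bytes when computing MBps.

```python
import time

CHUNK_BYTES = 1024 * 1024   # one megabyte per write
CHUNKS_PER_SAMPLE = 20      # 20 writes between statistics lines


def run_write_test(out, samples, clock=time.perf_counter):
    """Write `samples` groups of 20 x 1MB to `out`; return one stats line per group.

    `out` is any binary file object (a regular file or a raw device opened "wb").
    `clock` is injectable so the timing can be faked in tests.
    """
    chunk = bytes(CHUNK_BYTES)  # zero-filled data of no useful content
    rates = []
    lines = []
    for _ in range(samples):
        start = clock()
        for _ in range(CHUNKS_PER_SAMPLE):
            out.write(chunk)
        out.flush()
        elapsed = max(clock() - start, 1e-9)   # guard against a zero interval
        nbytes = CHUNK_BYTES * CHUNKS_PER_SAMPLE
        rate = nbytes / elapsed / 1e6           # MBps, 1 MB = 10**6 bytes (assumed)
        rates.append(rate)
        avg = sum(rates) / len(rates)           # assumed: mean of per-sample rates
        lines.append(
            f"{nbytes} bytes in {elapsed:.6f} seconds, "
            f"{rate:.6f} MBps, {avg:.6f} MBps avg"
        )
    return lines
```

Calling `run_write_test(open(path, "wb"), samples)` with a hypothetical `path` and printing the returned lines to stderr would approximate the behavior described, although Python's own buffering and the OS cache mean the numbers would not be directly comparable with those in this writeup.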

dskwtst can write to either a mounted or a raw disk. E.g.:

   dskwtst > /usr/tmp/outputfile 2> ./log

would create a large file (of no useful content) in /usr/tmp/outputfile and log the results in ./log (assuming a shell is used that supports stderr redirection). In the case of:

   dskwtst > /dev/rsd0 2> ./log

it would instead write to the /dev/rsd0 raw device. This has to be done as root, and, if possible, overwriting a disk that holds still-needed data (such as the running operating system) should be avoided. Writing to the raw device makes the performance more predictable (no synchronous SYNCing, etc.).

All tests in this writeup were done to one or multiple raw disks, and care was taken so no partitioned disk was injured or killed during the making of this writeup. Furthermore, all tests were done using either a 300MHz or 400MHz Pentium II host processor. All disks were of type UltraWide SCSI.


The first graph (the one above) shows the performance of four individual runs of dskwtst on different kinds of disk drives, namely:

The input used here is the "average" output from the dskwtst program. E.g., if:

  20971520 bytes in 5.002443 seconds, 4.192256 MBps, 5.245273 MBps avg
was the individual output line, the 5.245273 figure was used, which is the running average of all the individual 20MB measurements up to that point.
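For reference, pulling the "avg" column out of such log lines could look like the sketch below. The function name `parse_stats_line` and the regular expression are assumptions for illustration, matched only against the sample line shown above. Note that the per-step figure in that line is consistent with one MB meaning 10**6 bytes: 20971520 bytes / 5.002443 seconds is about 4.192256 million bytes per second.

```python
import re

# Pattern modeled on the single sample line quoted in the text.
LINE_RE = re.compile(
    r"(?P<nbytes>\d+) bytes in (?P<seconds>[\d.]+) seconds, "
    r"(?P<rate>[\d.]+) MBps, (?P<avg>[\d.]+) MBps avg"
)


def parse_stats_line(line):
    """Return the fields of one dskwtst-style log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "nbytes": int(m["nbytes"]),
        "seconds": float(m["seconds"]),
        "rate": float(m["rate"]),   # performance of this 20MB step
        "avg": float(m["avg"]),     # running average, the value plotted
    }
```

Feeding each line of ./log through `parse_stats_line` and collecting the `"avg"` values would give the series used as input for the graph.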

While the three aforementioned lines (BLACK, RED, GREEN) were done with an Adaptec 2940 hard disk controller, the fourth (BLUE) line depicts the same averaged performance, but with two RAIDed (RAID 0) Seagate Cheetah drives, which appeared like a single drive of twice the size (due to the RAID). The RAID controller used was a DPT SmartCache IV. Besides the pairing of the drives, the RAID controller used default configuration values and had 32MB of local hardware cache installed, to which the initial high peak and the subsequent backoff can be attributed.


This graph repeats the dual-Cheetah data, but adds the individual performances of the 20MB steps. Again, the impact of the 32MB cache on the controller is quite visible.


Since the IBM drive was by far the best individual drive of the ones tested, the remaining tests focus on those drives.


This graph compares performances of the IBM drives based on how the DPT SmartCache IV controller was configured, with the BLACK and RED lines reflecting the performance of an unRAIDed single drive (while a pair of drives was connected), and the GREEN and BLUE lines depicting a RAID 0 situation where the two drives appeared like a single one.


For comparison, the RAID 0 situation was also tested in a Linux 2.0.33 environment. Obviously, the multiple caches (host/OS, hard disk controller, and possibly the ones on the disk drives) are trying their best to make things look good. In the end, while there is a short-term gain in the Linux version for several of the groups of 20 one-megabyte write operations due to the host cache, over time the average write performance is comparable to the FreeBSD 2.2.6 situation.


Obviously, the RAID solution, while doing its RAID thing, was less than totally impressive, performance wise. Hence more testing was done with the Adaptec 2940 controller.


This graph shows performances of an Adaptec 2940 controller with two of the IBM drives attached. Both drives individually (BLACK and RED lines) have very comparable performances. The big dip in the black line probably means that the processor had something better to do during that time (like my looking at the output file, or whatever).

The GREEN and BLUE lines reflect the performance of simultaneous writes with two dskwtst programs running, each writing to a different one of the individual disk drives. Hence, for system performance, the sum of the GREEN and BLUE lines has to be used, i.e., something around 17MBps.


This graph is similar to the previous one, but uses two Adaptec 2940 controllers, each with one drive attached, and two dskwtst tasks writing simultaneously to the two drives via the two controllers. The systemic performance is about 22MBps, a speed at which it takes about 45 seconds to write a gigabyte of data. The second controller appears to compensate for all the speed losses of writing to a second drive on a single controller.
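The 45-second figure follows from simple division, shown here as a small sketch. The helper name `seconds_to_write` is hypothetical, and MBps is taken as 10**6 bytes per second, as the dskwtst output suggests.

```python
def seconds_to_write(total_bytes, rate_mbps):
    """Time in seconds to write `total_bytes` at `rate_mbps` (1 MB = 10**6 bytes)."""
    return total_bytes / (rate_mbps * 1e6)


# A decimal gigabyte at the two-controller rate of about 22MBps.
two_controllers = seconds_to_write(1e9, 22)

# The same amount at the single-controller combined rate of about 17MBps.
one_controller = seconds_to_write(1e9, 17)
```

With a gigabyte counted as 2**30 bytes instead, the 22MBps case comes out closer to 49 seconds.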

Now that the deep dip happened a second time (RED line this time), I haven't the foggiest as to why that is.


After confirmation that using fwrite instead of write did not make much of a difference for the writing speed of large records, the above graph depicts the behavior of a version of dskwtst that uses write instead of fwrite, and also modulates the output record sizes. Measurement summaries were still done after completion of 20MB of data, on the first gigabyte of one of the IBM hard drives with an Adaptec 2940 controller. The data seen reflects the averages of those measurements for various "write" output record sizes:


Of further interest is the FreeBSD "ccd" device, which emulates a RAID in software. These tests were run on the two IBM drives.
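As background, the general idea behind striping (RAID 0 style interleaving) can be sketched as below. This is a generic round-robin model, not the ccd implementation: the function names and the 64KB interleave used in the usage note are assumptions chosen for illustration, and the interleave actually configured for these tests is not stated here.

```python
def stripe_location(offset, interleave, ndisks):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe, within = divmod(offset, interleave)
    disk = stripe % ndisks                         # stripes rotate across the disks
    disk_offset = (stripe // ndisks) * interleave + within
    return disk, disk_offset


def split_write(offset, length, interleave, ndisks):
    """Split one logical write into per-disk pieces: (disk, disk offset, byte count)."""
    pieces = []
    while length > 0:
        disk, disk_offset = stripe_location(offset, interleave, ndisks)
        # Never cross a stripe boundary within one piece.
        n = min(length, interleave - offset % interleave)
        pieces.append((disk, disk_offset, n))
        offset += n
        length -= n
    return pieces
```

Under these assumptions, `split_write(0, 1024 * 1024, 64 * 1024, 2)` breaks a one-megabyte write into 16 pieces that alternate between the two disks, so each disk receives half of the data and both can be kept busy at once.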

The two graphs below differ in that the first displays the actual per 20MB performance, while the second uses the running average.

Notes:

  1. The BLACK line is reflective of a real partition, which is mounted. Commands executed to derive the data were:

    which created an almost 18GB file:

           #nai[339]/F2 11:31 255: l
           total 34632340
           drwxr-xr-x   2 root  wheel          512 May  4 11:11 ./
           drwxr-xr-x  16 root  wheel          512 May  4 10:33 ../
           -rw-rw-r--   1 root  wheel  17723088896 May  4 11:31 x
           #nai[340]/F2 11:33 0: 
           

  2. The RED line was derived after:

    In other words, it is utilizing the raw partition to avoid some OS overhead. An issue was that it did not find an "end of hard disk" indication, and the program could not even be forcefully aborted (well, a reboot eventually helped; the operating system was running fine, just dskwtst had no intention of exiting).

  3. The GREEN line is utilizing two Adaptec 2940 controllers instead of one, and a:

    was redone after the reboot. The end-of-disk issue appeared of course in this environment as well.


    Obviously, getting to the advertised figures of 20-40MBps is still quite a stretch for more normal applications, and without special care and feeding. However, interleaved, but simultaneous, writes to two disk drives via two controllers yield good results for continuous writing of large data sets. The FreeBSD "ccd" device simplifies such strategies by making the RAIDing transparent to the application, while still yielding good performance.

    Note that other things can impact performance as well. For example, the SmartCache IV manual states that if the controller detects an external SCSI cable, it reduces the speed for all its SCSI attachments to 5MHz.

    Bottom line for the applications described here, to get good performance:


    3 September 1998 additions


    Original output data sets from the testing:


    Disclaimer: Results in this writeup only reflect the specific performance of the specific software and hardware environment tested here for a specific application. They may or may not be reflective of performance in other environments and/or with other applications.