On 5/4/2012 1:24 PM, Peter Tribble wrote:
On Thu, May 3, 2012 at 3:35 PM, Edward Ned Harvey
<opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
I think you'll get better, both performance & reliability, if you break each
of those 15-disk raidz3's into three 5-disk raidz1's.  Here's why:
Incorrect on reliability; see below.

Now, to put some numbers on this...
A single 1T disk can sustain (let's assume) 1.0 Gbit/sec read/write
sequential.  This means resilvering the entire disk sequentially, including
unused space (which is not what ZFS does), would require 2.2 hours.  In
practice, on my 1T disks, which are in a mirrored configuration, I find
resilvering takes 12 hours.  I would expect this to be ~4 days if I were
using 5-disk raidz1, and I would expect it to be ~12 days if I were using
15-disk raidz3.
Based on your use of "I would expect", I'm guessing you haven't
done the actual measurement.

I see ~12-16 hour resilver times on pools using 1TB drives in
raidz configurations. The resilver times don't seem to vary
with whether I'm using raidz1 or raidz2.
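
To redo the arithmetic, here's a rough back-of-envelope sketch (Python; the
1 Gbit/sec streaming rate, 128 KiB records, and ~100 random IOPS per drive
are illustrative assumptions, not measurements from any of these pools):

    # Rough resilver-time estimates; the drive numbers are assumptions.
    TB = 10**12                               # bytes in a marketing terabyte

    def sequential_hours(capacity_bytes, gbit_per_sec=1.0):
        """Time to stream the whole disk end to end."""
        seconds = capacity_bytes * 8 / (gbit_per_sec * 10**9)
        return seconds / 3600

    def iops_bound_hours(used_bytes, record_bytes=128 * 1024, iops=100):
        """Time if every record costs a random I/O on the rebuilding disk."""
        ios = used_bytes / record_bytes
        return ios / iops / 3600

    print(sequential_hours(1 * TB))           # ~2.2 h, the ideal streaming case
    print(iops_bound_hours(1 * TB))           # ~21 h if fully used, fully random
    print(iops_bound_hours(0.6 * TB))         # ~13 h for a 60% full disk

With these made-up numbers the streaming case reproduces the 2.2-hour figure
above, while the IOPS-bound estimates land in the same general range as the
12-16 hours actually observed.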

Suddenly the prospect of multiple failures overlapping doesn't seem so
unlikely.
Which is *exactly* why you need multiple-parity solutions. Put
simply, if you're using single-parity redundancy with 1TB drives
or larger (raidz1 or 2-way mirroring) then you're putting your
data at risk. I'm seeing - at a very low level, but clearly non-zero -
occasional read errors during rebuild of raidz1 vdevs, leading to
data loss. Usually just one file, so it's not too bad (and zfs will tell
you which file has been lost). And the observed error rates we're
seeing in terms of uncorrectable (and undetectable) errors from
drives are actually slightly better than you would expect from the
manufacturers' spec sheets.

So you definitely need raidz2 rather than raidz1; I'm looking at
going to raidz3 for solutions using current high-capacity (i.e. 3TB)
drives.
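
To put a rough number on that risk, here's a sketch using the 10^-14
unrecoverable-read-error rate that consumer spec sheets typically quote (an
assumption, and it treats errors as independent, which the observations
above suggest is a bit pessimistic):

    # Chance of hitting at least one unrecoverable read error (URE) while
    # reading the surviving disks to rebuild a single failed drive.
    def p_ure_during_rebuild(surviving_bytes, ber=1e-14):
        bits = surviving_bytes * 8
        return 1 - (1 - ber) ** bits

    # 5-disk raidz1 of full 1TB drives: rebuild reads ~4TB from the survivors.
    print(p_ure_during_rebuild(4 * 10**12))   # ~0.27
    # Same layout with 3TB drives: ~12TB to read.
    print(p_ure_during_rebuild(12 * 10**12))  # ~0.62

With raidz2 or raidz3, a single URE hit during a rebuild can still be
reconstructed from the remaining parity, which is the point of the extra
redundancy.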

(On performance, I know what the theory says about getting one
disk's worth of IOPS out of each vdev in a raidz configuration. In
practice we're finding that our raidz systems actually perform
pretty well when compared with dynamic stripes, mirrors, and
hardware raid LUNs.)



Really, guys: Richard, I, and several others have already covered how ZFS does resilvering (and disk reliability, a related issue), including very detailed calculations of the IOPS required and discussions of slabs, recordsize, and how disks behave with regard to seek/access times and OS caching.

Please search the archives, as it's not fruitful to repost the exact same thing repeatedly.


Short version: assuming identical drives and the exact same usage pattern and /amount/ of data, the time it takes the various ZFS configurations to resilver is N for ANY mirrored config and a bit less than N*M for a RAIDZ* with M data disks - thus a 6-drive (total) RAIDZ2 will have the same resilver time as a 5-drive (total) RAIDZ1. What N actually is depends entirely on the pattern in which the data was written to the drive. You're always going to be IOPS-bound on the disk being resilvered.
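
A trivial sketch of that rule of thumb (it only encodes the heuristic above;
N itself depends on your write pattern and has to be measured):

    # Relative resilver time: N for any mirror, roughly N * data disks for RAIDZ*.
    def mirror_resilver(N):
        return N                               # one disk's worth, any mirror width

    def raidz_resilver(N, total_disks, parity):
        data_disks = total_disks - parity
        return N * data_disks                  # "a bit less than" this, per above

    print(raidz_resilver(1, total_disks=6, parity=2))   # 4 data disks -> 4*N
    print(raidz_resilver(1, total_disks=5, parity=1))   # 4 data disks -> 4*N, same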

Which RAIDZ* config to use (assuming you have a fixed tolerance for data loss) depends entirely on what your data usage pattern does to resilver times; configurations needing very long resilver times had better have more redundancy. And remember, larger configs allow more data to be stored, which also increases resilver time.

Oh, and a RAIDZ* will /only/ ever get you slightly more than 1 disk's worth of IOPS (averaged over a reasonable time period). Caching may make it appear to give more IOPS in certain cases, but that's neither sustainable nor predictable, and the backing store is still only giving 1 disk's IOPS. The RAIDZ* may, however, give you significantly more throughput (in MB/s) than a single disk if you do a lot of sequential read or write.
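
As a minimal illustration of that IOPS-versus-throughput distinction (the
100 IOPS and 120 MB/s per-disk figures below are generic 7200rpm
assumptions, not measurements):

    # A raidz vdev gives ~1 disk of random IOPS but ~M data disks of streaming.
    def raidz_random_iops(per_disk_iops=100):
        return per_disk_iops                   # ~one disk's worth, regardless of width

    def raidz_sequential_mbps(data_disks, per_disk_mbps=120):
        return data_disks * per_disk_mbps      # sequential scales with data disks

    print(raidz_random_iops())                 # ~100 IOPS for the whole vdev
    print(raidz_sequential_mbps(4))            # ~480 MB/s for a 6-disk raidz2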

-Erik
