On Mar 22, 2012, at 3:03 AM, Jim Klimov wrote:

> 2012-03-21 22:53, Richard Elling wrote:
> ...
>>> This is why a single
>>> vdev's random-read performance is equivalent to the random-read
>>> performance of
>>> a single drive.
>> 
>> It is not as bad as that. The actual worst case number for a HDD with
>> zfs_vdev_max_pending
>> of one is:
>> average IOPS * ((D+P) / D)
>> where,
>> D = number of data vdevs
>> P = number of parity vdevs (1 for raidz, 2 for raidz2, 3 for raidz3)
>> total disks per set = D + P
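
To put rough numbers on that formula, here is a quick back-of-the-envelope
sketch in Python (the 100-IOPS-per-disk figure and the 6+2 layout are
purely illustrative assumptions, not measurements):

def worst_case_random_read_iops(per_disk_iops, data_disks, parity_disks):
    # Each random read of a block touches the D data disks, so the vdev
    # delivers roughly one disk's worth of IOPS scaled by (D + P) / D.
    return per_disk_iops * (data_disks + parity_disks) / data_disks

# e.g. an 8-disk raidz2 set (D=6, P=2) built from ~100-IOPS HDDs:
print(worst_case_random_read_iops(100, 6, 2))   # -> ~133, not 100
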
> 
> I wrote in this thread that, AFAIK, for small blocks (i.e. one
> sector's worth of data) P+1 sectors would be used to store the
> block. That is an even worse case, at least capacity-wise, and it
> also increases fragmentation => seeks, but it might occasionally
> allow parallel reads of different objects (tasks running on the
> disks not involved in storing that one data sector and, when
> required, its parities).
> 
> Is there any truth to this picture?

Yes, but it is a rare case for 512b sectors. It could be more common for 4KB
sector disks when ashift=12. However, in that case the performance increases
to the equivalent of mirroring, so there are some benefits.

FWIW, some people call this "RAID-1E"
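
For the curious, here is a minimal sketch of the space accounting in
Python. It assumes the usual raidz allocation rule (P parity sectors per
data row, with the allocation padded to a multiple of P+1 sectors), and
the 6+2 layout is just an example:

import math

def raidz_alloc_sectors(logical_bytes, sector_bytes, data_disks, parity):
    data = math.ceil(logical_bytes / sector_bytes)
    rows = math.ceil(data / data_disks)      # each row carries P parity sectors
    total = data + rows * parity
    return total + (-total) % (parity + 1)   # pad to a multiple of P+1

# A one-sector file on an 8-disk raidz2 (D=6, P=2, 512b sectors):
print(raidz_alloc_sectors(512, 512, 6, 2))   # -> 3 (1 data + 2 parity)
# The same file with ashift=12 still takes 3 sectors, now 4KB each:
print(raidz_alloc_sectors(512, 4096, 6, 2))  # -> 3

So a one-sector block ends up occupying P+1 sectors, i.e. (P+1)x its size,
which lines up with the mirror-like behaviour described above.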

> 
> Were there any research or tests regarding storage of many small
> files (1-sector sized or close to that) on different vdev layouts?

It is not a common case, so why bother?

> I believe that such files would use a single-sector-sized set of
> indirect blocks (dittoed at least twice), so one single-sector
> sized file would use at least 9 sectors in raidz2.

No. You can't account for the metadata that way. Metadata space is not 1:1 with
data space. Metadata tends to get written in 16KB chunks, compressed.

> 
> Thanks :)
> 
> 
>> We did many studies that verified this. More recent studies show
>> zfs_vdev_max_pending
>> has a huge impact on average latency of HDDs, which I also described in
>> my talk at
>> OpenStorage Summit last fall.
> 
> What about drives without (a good implementation of) NCQ/TCQ/whatever?

All HDDs I've tested suck. The form of the suckage is that the number of IOPS
stays relatively constant, but the average latency increases dramatically.  This
makes sense, due to the way elevator algorithms work.
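
The arithmetic behind that is just Little's Law (outstanding I/Os =
IOPS x latency). A tiny Python sketch, with ~150 IOPS assumed for a
7200rpm drive purely as a ballpark:

def avg_latency_ms(outstanding_ios, iops):
    # Little's Law: average latency = outstanding I/Os / throughput
    return outstanding_ios / iops * 1000.0

for depth in (1, 4, 10, 35):    # candidate zfs_vdev_max_pending values
    print(depth, round(avg_latency_ms(depth, 150), 1), "ms")
# depth 1 -> ~6.7 ms, depth 10 -> ~66.7 ms, depth 35 -> ~233.3 ms

Same IOPS, very different average latency, which is exactly what a deeper
queue does to an HDD.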

> Does ZFS in-kernel caching, queuing and sorting of pending requests
> provide a similar service? Is it controllable with the same switch?

There are many caches at play here, with many tunables. The analysis doesn't
fit in an email.

> 
> Or, alternatively, is it a kernel-only feature which does not depend
> on hardware *CQ? Are there any benefits to disks with *CQ then? :)

Yes, SSDs with NCQ work very well.
 -- richard

--
DTrace Conference, April 3, 2012, 
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422