> It would be interesting to have a zfs enabled HBA to offload the checksum
> and parity calculations. How much of zfs would such an HBA have to
> understand?
That's an interesting question.

For parity, it's actually pretty easy. One can envision an HBA that took a group of related write commands and computed the parity on the fly, using it for a final write command. This would, however, probably limit the size of a block that could be written to whatever amount of memory was available for buffering on the HBA. (Of course, memory is relatively cheap these days, but it's still not free, so the HBA might have only a few megabytes.)

The checksum is more difficult. If you're willing to delay writing an indirect block until all of its children have been written [*], then we can just compute the checksum for each block as it goes out, and that's easy [**] -- easier than the parity, in fact, since there's no buffering required beyond the checksum itself. ZFS does in fact delay this write at present. However, I've argued in the past that ZFS shouldn't delay it, but should instead write indirect blocks in parallel with the data blocks. It would be interesting to determine whether the performance improvement of doing checksums on the HBA would outweigh the potential benefit of writing indirect blocks in parallel. Maybe it would for larger writes.

Anyone got an FPGA programmer and an open-source SATA implementation? :-) (Unfortunately, storage protocols have a complex analog side, and except for 1394, I'm not aware of any implementations that separate the digital and analog parts, which makes prototyping a lot harder, at least without much more detailed documentation on the controllers than you're likely to find.)

-- Anton

[*] Actually, you don't need to delay until the writes have made it to disk, but since you want to compute the checksum as the data goes out to the disk rather than making a second pass over it, you'd need to wait until the data has at least been sent to the drive cache.

[**] For SCSI and FC, there's added complexity in that the drives can request data out of order.
You can disable this, but at the cost of some performance on high-end drives.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
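As a rough illustration of the idea discussed above -- not real HBA firmware, just a host-side sketch with invented names -- the following accumulates XOR parity across a stripe of same-sized data blocks while computing a Fletcher-style checksum for each block as it streams through, so no second pass over the data is needed. The checksum loop mimics the shape of ZFS's fletcher_4 (four 64-bit accumulators over 32-bit little-endian words); the `StripeAccumulator` class and its methods are hypothetical.

```python
def fletcher4(data: bytes) -> tuple:
    """Fletcher-4-style checksum: four 64-bit accumulators over 32-bit
    little-endian words (shape of ZFS fletcher_4; simplified here by
    zero-padding the input to a multiple of 4 bytes)."""
    a = b = c = d = 0
    data = data + b"\x00" * (-len(data) % 4)  # pad to word boundary
    mask = (1 << 64) - 1
    for i in range(0, len(data), 4):
        w = int.from_bytes(data[i:i + 4], "little")
        a = (a + w) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)


class StripeAccumulator:
    """Hypothetical model of a ZFS-aware HBA's per-stripe state:
    checksums each data block as it 'goes out' and folds it into a
    running XOR parity, emitting the parity block last."""

    def __init__(self, block_size: int):
        self.parity = bytearray(block_size)  # running XOR of all blocks
        self.checksums = []                  # one checksum per data block

    def write_block(self, data: bytes):
        assert len(data) == len(self.parity), "stripe blocks must match"
        # Checksum computed on the fly -- no buffering beyond the
        # checksum state itself.
        self.checksums.append(fletcher4(data))
        # Fold the block into the parity; only the parity buffer is held.
        for i, byte in enumerate(data):
            self.parity[i] ^= byte

    def finish(self) -> bytes:
        """Return the parity block, written once all data blocks are out."""
        return bytes(self.parity)
```

Because parity is plain XOR, any one missing block can be reconstructed by XOR-ing the parity with the surviving blocks -- which is also why the HBA only needs one block's worth of buffer per stripe, not the whole stripe.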