> It would be interesting to have a zfs enabled HBA to offload the checksum
> and parity calculations. How much of zfs would such an HBA have to
> understand?
That's an interesting question.

For parity, it's actually pretty easy. One can envision an HBA that took a group of related write commands and computed the parity on the fly, using it for a final write command. This would, however, probably limit the size of a block that could be written to whatever amount of memory was available for buffering on the HBA. (Of course, memory is relatively cheap these days, but it's still not free, so the HBA might have only a few megabytes.)

The checksum is more difficult. If you're willing to delay writing an indirect block until all of its children have been written [*], then we can just compute the checksum for each block as it goes out, and that's easy [**] -- easier than the parity, in fact, since there's no buffering required beyond the checksum itself. ZFS does in fact delay this write at present. However, I've argued in the past that ZFS shouldn't delay it, but should instead write indirect blocks in parallel with the data blocks. It would be interesting to determine whether the performance improvement of doing checksums on the HBA would outweigh the potential benefit of writing indirect blocks in parallel. Maybe it would for larger writes.

Anyone got an FPGA programmer and an open-source SATA implementation? :-) (Unfortunately, storage protocols have a complex analog side, and except for 1394, I'm not aware of any implementations that separate the digital and analog parts, which makes prototyping a lot harder, at least without much more detailed documentation on the controllers than you're likely to find.)

-- Anton

[*] Actually, you don't need to delay until the writes have made it to disk, but since you want to compute the checksum as the data goes out to the disk rather than making a second pass over it, you'd need to wait until the data has at least been sent to the drive cache.

[**] For SCSI and FC, there's added complexity in that the drives can request data out of order.
You can disable this, but at the cost of some performance on high-end drives.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
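As a rough illustration of the idea discussed above -- not real HBA firmware, just a host-side sketch with invented names -- the following accumulates XOR parity across a stripe of same-sized data blocks while computing a Fletcher-style checksum for each block as it streams through, so no second pass over the data is needed. The checksum loop mimics the shape of ZFS's fletcher_4 (four 64-bit accumulators over 32-bit little-endian words); the `StripeAccumulator` class and its methods are hypothetical.

```python
def fletcher4(data: bytes) -> tuple:
    """Fletcher-4-style checksum: four 64-bit accumulators over 32-bit
    little-endian words (shape of ZFS fletcher_4; simplified here by
    zero-padding the input to a multiple of 4 bytes)."""
    a = b = c = d = 0
    data = data + b"\x00" * (-len(data) % 4)  # pad to word boundary
    mask = (1 << 64) - 1
    for i in range(0, len(data), 4):
        w = int.from_bytes(data[i:i + 4], "little")
        a = (a + w) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)


class StripeAccumulator:
    """Hypothetical model of a ZFS-aware HBA's per-stripe state:
    checksums each data block as it 'goes out' and folds it into a
    running XOR parity, emitting the parity block last."""

    def __init__(self, block_size: int):
        self.parity = bytearray(block_size)  # running XOR of all blocks
        self.checksums = []                  # one checksum per data block

    def write_block(self, data: bytes):
        assert len(data) == len(self.parity), "stripe blocks must match"
        # Checksum computed on the fly -- no buffering beyond the
        # checksum state itself.
        self.checksums.append(fletcher4(data))
        # Fold the block into the parity; only the parity buffer is held.
        for i, byte in enumerate(data):
            self.parity[i] ^= byte

    def finish(self) -> bytes:
        """Return the parity block, written once all data blocks are out."""
        return bytes(self.parity)
```

Because parity is plain XOR, any one missing block can be reconstructed by XOR-ing the parity with the surviving blocks -- which is also why the HBA only needs one block's worth of buffer per stripe, not the whole stripe.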