>>>>> "j" == jkait <jkait...@gmail.com> writes:
j> [b] The Lustre team, I believe, is looking at porting the DMU,
j> **not** the entire zfs stack.

wow. That's even more awesome. In that case, since they are more or
less building their own filesystem, maybe it will be natural to
validate checksums on the clients.

j> http://www.enterprisestorageforum.com/continuity/news/article.php/3672651

meh, wake me when it's over.

Another thing that interests me, in light of recent discussion, is
checksums that break if write barriers are violated. With a single
checksum it's forever impossible to tell whether your data is
``up-to-date'': a checksum that is valid today will still be valid
tomorrow. But you can tell whether a bag of checksums are consistent
with each other, and perhaps be warned when the filesystem has
recovered to some new, seemingly-valid state through which, had it
been respecting fsync() barriers, it could never have passed before
the data loss. (There's a toy sketch of what I mean at the bottom of
this mail.)

With this feature, instead of just flagging the contents of
individual files as invalid, ZFS could put seals on whole datasets,
and we would see those checksum seals broken if we disabled the ZIL.
It could become meaningful to put a seal on a hierarchy of datasets,
a seal which would be broken if you mounted a tree of snapshots of
those datasets that were not taken atomically. This also becomes
more meaningful with filesystems like HAMMER that have infinite
snapshots, where you may want metadata checksums to seal the
filesystem's history, a history which could be broken if drives
write checksum-sized blocks but write them in the wrong order.

I don't see how raw storage can do anything but put checksums on
block-sized chunks, which is useful for data in flight but not that
useful to store. The stored checksum can prove ``this exact block
was once handed to me, and I was once told to write it to this LBA
on this LUN.'' So what? Yes, I agree that happened, but it might
have happened two years ago. That doesn't mean the block is what
belongs there _right now_: I could have overwritten that block 100
times since then. You need a metadata hierarchy to know that. (The
second sketch at the bottom shows the gap.)

What the SCSI extensions could do is extend the checksums that all
the big storage vendors are already doing over the FC/iSCSI SAN, and
thus stop ZFS advocates from pointing at weak TCP checksums, ancient
routers, and SAN bitflip gremlins whenever pools with single-LUN
vdevs become corrupt. The storage vendors' pitch about helping to
_find_ the corruption problems, that one I buy: ZFS is notoriously
poor at that job.

But I don't think the SCSI extension helps extend the halo of the
on-disk protection domain up through the filesystem and above it,
past a network filesystem. They can't do that by adding SCSI
commands. It's simply irrelevant to the task, unless SCSI is going
to become its own non-POSIX filesystem with snapshots and a virtual
clock, which it had better not. Lustre could do it, though,
especially if they are building their own filesystem from zpool
pieces right above the transactional layer, not just using ZFS as a
POSIX backing store.
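Here's the toy sketch of the seal idea, in Python. This is my own
illustration, not ZFS code; the txg numbers and root checksums are
made-up stand-ins for the real on-disk values. The point is that
hash-chaining each committed state makes the latest seal commit to
every state the filesystem has ever passed through:

    import hashlib

    def next_seal(prev_seal: bytes, txg: int, root_checksum: bytes) -> bytes:
        h = hashlib.sha256()
        h.update(prev_seal + txg.to_bytes(8, "big") + root_checksum)
        return h.digest()

    # Honest history: commit txgs 1..3 in order.
    s = b"\0" * 32
    for txg, root in [(1, b"A"), (2, b"B"), (3, b"C")]:
        s = next_seal(s, txg, root)
    honest = s

    # Barrier-violating "recovery": txg 2, which fsync() promised was
    # on stable storage, never made it to disk, so the recovered
    # chain skips it.
    s = b"\0" * 32
    for txg, root in [(1, b"A"), (3, b"C")]:
        s = next_seal(s, txg, root)
    recovered = s

    # Every block in the recovered state can still carry a valid
    # checksum, yet the seals disagree: the filesystem reached a
    # state it could never have passed through had it respected the
    # barrier.
    assert honest != recovered

A client that recorded the seal after an fsync() could later refuse,
or at least complain about, any state whose chain doesn't pass
through the recorded value.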
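And here's the second sketch, showing why a checksum stored with the
block can't tell you the block is current. Again my own toy code,
loosely in the spirit of a T10 DIF guard tag plus reference tag, not
anyone's real implementation:

    import hashlib

    def block_tag(data: bytes, lba: int) -> bytes:
        return hashlib.sha256(data + lba.to_bytes(8, "big")).digest()

    disk = {}

    def write(lba: int, data: bytes):
        disk[lba] = (data, block_tag(data, lba))

    def read(lba: int) -> bytes:
        data, tag = disk[lba]
        assert tag == block_tag(data, lba)  # per-block check passes
        return data

    write(7, b"version-1")
    stale = disk[7]          # the block as it stood two years ago
    write(7, b"version-2")
    disk[7] = stale          # lost write: the drive silently reverts

    # The stored checksum still verifies. ``This exact block was once
    # written to this LBA'' remains true; it just isn't current. Only
    # a parent holding the expected checksum of the child, i.e. a
    # metadata hierarchy, catches the staleness.
    assert read(7) == b"version-1"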