On Thu, Feb 19, 2009 at 12:36:22PM -0800, Brandon High wrote:
> On Thu, Feb 19, 2009 at 6:18 AM, Gary Mills <mi...@cc.umanitoba.ca> wrote:
> > Should I file an RFE for this addition to ZFS?  The concept would be
> > to run ZFS on a file server, exporting storage to an application
> > server where ZFS also runs on top of that storage.  All storage
> > management would take place on the file server, where the physical
> > disks reside.  The application server would still perform end-to-end
> > error checking but would notify the file server when it detected an
> > error.
>
> You could accomplish most of this by creating an iSCSI volume on the
> storage server, then using ZFS with no redundancy on the application
> server.

That's what I'd like to do, and what we do now.  The RFE is to take
advantage of the end-to-end checksums in ZFS in spite of having no
redundancy on the application server.  Having all of the disk
management in one place is a great benefit.

> You'll have two layers for checksums, one on the storage server's
> zpool and a second on the application server's filesystem.  The
> application server won't be able to notify the storage server that
> it's detected a bad checksum, other than through retries, but you can
> write a user-space monitor that watches for ZFS checksum errors and
> sends notification to the storage server.

The RFE is to enable the two instances of ZFS to exchange information
about checksum failures.

> To poke a hole in your idea: What if the app server does find an
> error?  What's the storage server to do at that point?  Provided that
> the storage server's zpool already has redundancy, the data written to
> disk should already be exactly what was received from the client.  If
> you want to have the ability to recover from errors on the app server,
> you should use a redundant zpool, either a mirror or a raidz.

Yes, if the two instances of ZFS disagree, we have a problem that
needs to be resolved: they need to cooperate in this endeavour.

> If you're concerned about data corruption in transit, then it sounds
> like something akin to T10 DIF (which others mentioned) would fit the
> bill.  You could also tunnel the traffic over a transit layer such as
> TLS or SSH that provides a measure of validation.  Latency should be
> fun to deal with however.

I'm mainly concerned that ZFS on the application server will detect a
checksum error and then be unable to preserve the data.  iSCSI already
has TCP checksums.  I assume that FC-AL has its own transport-level
checksums as well.  Using more reliable transport checksums has no
benefit if ZFS will still detect end-to-end checksum errors.
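
For reference, the user-space monitor Brandon suggests might look
roughly like the Python sketch below.  It is only an illustration under
stated assumptions: it expects the usual `zpool status` layout with a
NAME/STATE/READ/WRITE/CKSUM header and plain integer counters.  The
names `parse_cksum_errors`, `check_pool` and the `notify` callback are
hypothetical, and how a notification would actually reach the storage
server is left open.

```python
import subprocess


def parse_cksum_errors(status_text):
    """Return {device_name: cksum_count} for config rows with CKSUM > 0.

    Assumes rows of the form: NAME STATE READ WRITE CKSUM [notes...]
    between the "NAME" header line and the "errors:" line.
    """
    errors = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":        # header row starts the device table
            in_config = True
            continue
        if fields[0] == "errors:":     # summary line ends the device table
            in_config = False
            continue
        if not in_config or len(fields) < 5:
            continue
        try:
            cksum = int(fields[4])
        except ValueError:
            # Not a plain integer counter; this sketch skips such rows.
            continue
        if cksum > 0:
            errors[fields[0]] = cksum
    return errors


def check_pool(pool, notify):
    """Run `zpool status <pool>` and call notify(pool, device, count)
    for every row reporting checksum errors.  `notify` is a hypothetical
    callback supplied by the caller."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True, text=True, check=True,
    )
    for device, count in parse_cksum_errors(result.stdout).items():
        notify(pool, device, count)
```

Something like `check_pool("tank", notify)` could be run periodically
from cron on the application server.  Note that this only reports that
an error was seen; it does nothing to let the storage server repair the
block, which is the part the RFE is about.
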

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss