On 2/20/2009 9:33 AM, Gary Mills wrote:
On Thu, Feb 19, 2009 at 09:59:01AM -0800, Richard Elling wrote:
Gary Mills wrote:
Should I file an RFE for this addition to ZFS?  The concept would be
to run ZFS on a file server, exporting storage to an application
server where ZFS also runs on top of that storage.  All storage
management would take place on the file server, where the physical
disks reside.  The application server would still perform end-to-end
error checking but would notify the file server when it detected an
error.
Currently, this is done as a retry. But retries can suffer from cached
badness.

So, ZFS on the application server would retry the read from the
storage server.  This would be the same as it does from a physical
disk, I presume.  However, if the checksum failure persisted, it
would declare an error.  That's where the RFE comes in, because it
would then notify the file server to utilize its redundant data
source.  Perhaps this could be done as part of the retry, using
existing protocols.
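The retry-then-escalate flow described above could be sketched roughly as follows. This is a minimal illustration only, not ZFS internals: `read_block`, the retry count, and the use of SHA-256 as the checksum are all my own assumptions.

```python
import hashlib

def read_with_retry(read_block, expected_checksum, retries=3):
    """Read a block and verify its end-to-end checksum, retrying on mismatch.

    read_block: callable returning the raw bytes of the block
    expected_checksum: checksum recorded when the block was written
    Returns the block data, or raises IOError after exhausting retries.
    """
    for _ in range(retries):
        data = read_block()
        if hashlib.sha256(data).digest() == expected_checksum:
            return data  # checksum verified end to end
    # Persistent mismatch: this is the point where the proposed RFE
    # would notify the file server to fall back to its redundant copy.
    raise IOError("checksum mismatch after %d retries" % retries)
```

A transient error (different corruption on each retry) is recovered by the loop; only a persistent mismatch reaches the error path.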
I'm no expert, but I think not only would this have been taken care of by the retry, but if the error is being introduced by any hardware or software on the storage server's end, then the storage server will already be checking its own checksums.

The main place new errors could be introduced is after the data leaves ZFS's control: heading out the network interface, across the wires, and into the application server. While it's not impossible for the same error to creep in on every retry, I think that would be rarer than a different error each time, so the retries would have a very good chance of eventually getting good copies of every block.

Even if the application server could notify the storage server of the problem, there isn't anything more the storage server can do. If there were a problem that its redundancy could fix, its checksums would have identified it, and it would have fixed it even before the data was sent to the application server.
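That storage-side self-healing can be illustrated with a toy two-way mirror. This is a hedged sketch of the general technique, not ZFS code; the helper names and the simple "first good copy wins" policy are my own assumptions.

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

def read_from_mirror(copies, expected):
    """Return the first mirror copy whose checksum verifies,
    repairing any copies that fail (self-healing on read).

    copies: mutable list of byte strings (the mirror sides)
    expected: checksum recorded when the block was written
    """
    good = None
    for data in copies:
        if checksum(data) == expected:
            good = data
            break
    if good is None:
        # No valid copy remains: redundancy cannot help, so a
        # notification from the application server changes nothing.
        raise IOError("all mirror copies failed checksum")
    # Heal damaged copies before the data ever leaves the server.
    for i, data in enumerate(copies):
        if checksum(data) != expected:
            copies[i] = good
    return good
```

The point is that the repair happens entirely on the storage server, as part of the read itself, so by the time the block is sent to the application server the redundant data has already been consulted.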
There are several advantages to this configuration.  One current
recommendation is to export raw disks from the file server.  Some
storage devices, including, I assume, Sun's 7000 series, are unable to
do this.  Another is to build two RAID devices on the file server and
to mirror them with ZFS on the application server.  This is also
sub-optimal as it doubles the space requirement and still does not
take full advantage of ZFS error checking.  Splitting the
responsibilities works around these problems.
I'm not convinced, but here is how you can change my mind.

1. Determine which faults you are trying to recover from.

I don't think this has been clearly identified, except that they are
"those faults that are only detected by end-to-end checksums".

Adding ZFS on the appserver will add a new set of checksums for the data's journey over the wire and back again. Nothing on the storage server will be checking those checksums to see whether corruption happened to writes on the way there (which might be a place for improvement, though I'm not sure how that could even be done), but those same checksums will be sent back to the appserver on a read, so the appserver will be able to detect the problem then. Of course, if the corruption happened while sending the write, then no amount of retries will help; only ZFS redundancy on the app server can (currently) help with that.
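The write-path case above can be made concrete with a small script. This is again a toy model: the hand-flipped byte standing in for in-transit corruption is invented purely for illustration.

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

# The appserver computes the checksum before the block leaves its control,
# and keeps that checksum in its own metadata.
block = b"application data"
app_checksum = checksum(block)

# Corruption on the wire: the storage server receives and faithfully
# stores the already-damaged bytes. Its own checksums protect what it
# stored, so it has no way to know the bytes were wrong on arrival.
stored_block = b"applicatiom data"  # one byte flipped in transit

# Every read returns the same stored (bad) block, so retries never help;
# the appserver's checksum fails on each attempt.
for attempt in range(3):
    assert checksum(stored_block) != app_checksum
```

Since the stored copy itself is bad, only redundancy held by the application server could supply a good copy; this is the gap the discussion keeps circling.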

  -Kyle

2. Prioritize these faults based on their observability, impact,
and rate.

Perhaps the project should be to extend end-to-end checksums in
situations that don't have end-to-end redundancy.  Redundancy at the
storage layer would be required, of course.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
