On 2/20/2009 9:33 AM, Gary Mills wrote:
On Thu, Feb 19, 2009 at 09:59:01AM -0800, Richard Elling wrote:
Gary Mills wrote:
Should I file an RFE for this addition to ZFS?  The concept would be
to run ZFS on a file server, exporting storage to an application
server where ZFS also runs on top of that storage.  All storage
management would take place on the file server, where the physical
disks reside.  The application server would still perform end-to-end
error checking but would notify the file server when it detected an
error.
Currently, this is done as a retry. But retries can suffer from cached
badness.

So, ZFS on the application server would retry the read from the
storage server.  This would be the same as it does from a physical
disk, I presume.  However, if the checksum failure persisted, it
would declare an error.  That's where the RFE comes in, because it
would then notify the file server to utilize its redundant data
source.  Perhaps this could be done as part of the retry, using
existing protocols.
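The retry-then-escalate flow described above could be sketched roughly as follows. This is a minimal illustration only, not ZFS internals: `read_block`, the retry count, and the use of SHA-256 as the checksum are all my own assumptions.

```python
import hashlib

def read_with_retry(read_block, expected_checksum, retries=3):
    """Read a block and verify its end-to-end checksum, retrying on mismatch.

    read_block: callable returning the raw bytes of the block
    expected_checksum: checksum recorded when the block was written
    Returns the block data, or raises IOError after exhausting retries.
    """
    for _ in range(retries):
        data = read_block()
        if hashlib.sha256(data).digest() == expected_checksum:
            return data  # checksum verified end to end
    # Persistent mismatch: this is the point where the proposed RFE
    # would notify the file server to fall back to its redundant copy.
    raise IOError("checksum mismatch after %d retries" % retries)
```

A transient error (different corruption on each retry) is recovered by the loop; only a persistent mismatch reaches the error path.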
I'm no expert, but I think not only would this have been taken care of by the retry, but if the error is being introduced by any hardware or software on the storage server's end, then the storage server will already be checking its own checksums.

The main place new errors could be introduced is after the data leaves ZFS's control: heading out the network interface, across the wires, and into the application server. While it's not impossible for the same error to creep in on every retry, I think that would be rarer than a different error each time, so the retries would have a very good chance of eventually getting good copies of every block.

Even if the application server could notify the storage server of the problem, there isn't anything more the storage server can do. If there were a problem that its redundancy could fix, its checksums would have identified it, and it would have fixed it even before the data was sent to the application server.
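That storage-side self-healing can be illustrated with a toy two-way mirror. This is a hedged sketch of the general technique, not ZFS code; the helper names and the simple "first good copy wins" policy are my own assumptions.

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

def read_from_mirror(copies, expected):
    """Return the first mirror copy whose checksum verifies,
    repairing any copies that fail (self-healing on read).

    copies: mutable list of byte strings (the mirror sides)
    expected: checksum recorded when the block was written
    """
    good = None
    for data in copies:
        if checksum(data) == expected:
            good = data
            break
    if good is None:
        # No valid copy remains: redundancy cannot help, so a
        # notification from the application server changes nothing.
        raise IOError("all mirror copies failed checksum")
    # Heal damaged copies before the data ever leaves the server.
    for i, data in enumerate(copies):
        if checksum(data) != expected:
            copies[i] = good
    return good
```

The point is that the repair happens entirely on the storage server, as part of the read itself, so by the time the block is sent to the application server the redundant data has already been consulted.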
There are several advantages to this configuration.  One current
recommendation is to export raw disks from the file server.  Some
storage devices, including, I assume, Sun's 7000 series, are unable to
do this.  Another is to build two RAID devices on the file server and
to mirror them with ZFS on the application server.  This is also
sub-optimal as it doubles the space requirement and still does not
take full advantage of ZFS error checking.  Splitting the
responsibilities works around these problems.
I'm not convinced, but here is how you can change my mind.

1. Determine which faults you are trying to recover from.

I don't think this has been clearly identified, except that they are
"those faults that are only detected by end-to-end checksums".

Adding ZFS on the appserver will add a new set of checksums for the data's journey over the wire and back again. Nothing on the storage server will be checking those checksums to see whether corruption happened to writes on the way there (which might be a place for improvement, though I'm not sure how that could even be done), but those same checksums will be sent back to the appserver on a read, so the appserver will be able to detect the problem then. Of course, if the corruption happened while sending the write, then no amount of retries will help; only ZFS redundancy on the app server can (currently) help with that.
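The write-path case above can be made concrete with a small script. This is again a toy model: the hand-flipped byte standing in for in-transit corruption is invented purely for illustration.

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

# The appserver computes the checksum before the block leaves its control,
# and keeps that checksum in its own metadata.
block = b"application data"
app_checksum = checksum(block)

# Corruption on the wire: the storage server receives and faithfully
# stores the already-damaged bytes. Its own checksums protect what it
# stored, so it has no way to know the bytes were wrong on arrival.
stored_block = b"applicatiom data"  # one byte flipped in transit

# Every read returns the same stored (bad) block, so retries never help;
# the appserver's checksum fails on each attempt.
for attempt in range(3):
    assert checksum(stored_block) != app_checksum
```

Since the stored copy itself is bad, only redundancy held by the application server could supply a good copy; this is the gap the discussion keeps circling.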

  -Kyle

2. Prioritize these faults based on their observability, impact,
and rate.

Perhaps the project should be to extend end-to-end checksums in
situations that don't have end-to-end redundancy.  Redundancy at the
storage layer would be required, of course.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
