Re: Availability: ZFS needs to handle disk removal / driver failure better
>> A better option would be to not use this to perform FMA diagnosis, but
>> instead work into the mirror child selection code.  This has already
>> been alluded to before, but it would be cool to keep track of latency
>> over time, and use this to both a) prefer one drive over another when
>> selecting the child and b) proactively timeout/ignore results from one
>> child and select the other if it's taking longer than some historical
>> standard deviation.  This keeps away from diagnosing drives as faulty,
>> but does allow ZFS to make better choices and maintain response times.
>> It shouldn't be hard to keep track of the average and/or standard
>> deviation and use it for selection; proactively timing out the slow I/Os
>> is much trickier.
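
  To make the mean/standard-deviation idea concrete, here's a minimal
userspace sketch (all names and structures here are hypothetical, not
the actual vdev_mirror child-selection code): keep an exponentially
weighted mean and variance of each child's latency, prefer the child
with the lower mean, and flag a read as suspiciously slow once it
exceeds the mean plus three standard deviations.

/*
 * Sketch only: per-child latency statistics and selection.
 * Compile with: cc -o childsel childsel.c -lm
 */
#include <math.h>
#include <stdio.h>

typedef struct child_stats {
	double	cs_mean;	/* EWMA of observed latency (ms) */
	double	cs_var;		/* EWMA of squared deviation */
	double	cs_alpha;	/* smoothing factor, e.g. 0.1 */
} child_stats_t;

/* Fold one observed latency sample into the running statistics. */
static void
child_record_latency(child_stats_t *cs, double latency_ms)
{
	double delta = latency_ms - cs->cs_mean;

	cs->cs_mean += cs->cs_alpha * delta;
	cs->cs_var = (1.0 - cs->cs_alpha) *
	    (cs->cs_var + cs->cs_alpha * delta * delta);
}

/* Prefer the child whose mean latency is lowest. */
static int
child_select(const child_stats_t *c, int nchildren)
{
	int best = 0;

	for (int i = 1; i < nchildren; i++)
		if (c[i].cs_mean < c[best].cs_mean)
			best = i;
	return (best);
}

/* A sample is "suspiciously slow" if it exceeds mean + 3 stddev. */
static int
child_is_slow(const child_stats_t *cs, double latency_ms)
{
	return (latency_ms > cs->cs_mean + 3.0 * sqrt(cs->cs_var));
}

int
main(void)
{
	child_stats_t mirror[2] = {
		{ .cs_mean = 5.0, .cs_var = 1.0, .cs_alpha = 0.1 },
		{ .cs_mean = 5.0, .cs_var = 1.0, .cs_alpha = 0.1 },
	};

	/* Pretend child 1 is the far/slow side of the mirror. */
	for (int i = 0; i < 100; i++) {
		child_record_latency(&mirror[0], 4.0 + (i % 3));
		child_record_latency(&mirror[1], 20.0 + (i % 5));
	}

	printf("prefer child %d\n", child_select(mirror, 2));
	printf("40ms read on child 0 slow? %d\n",
	    child_is_slow(&mirror[0], 40.0));
	return (0);
}

As the original poster notes, proactively timing out the slow I/Os is
the hard part; the statistics themselves are cheap to maintain.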

  Interestingly, tracking latency has come under discussion in the
Linux world, too, as they start to develop resource management for
disks as well as for CPUs.

  In fact, there are two cases where you can use a feedback loop to
adjust disk behavior, and a third to detect problems. The first 
loop is the one you identified, for dealing with near/far and
fast/slow mirrors.

  The second is for resource management, where one throttles
disk-hog projects once latency is found to be growing without bound
as the disks saturate, and the third is for detecting faults other
than the above two.
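
  For the second loop, a toy illustration of the feedback (the
structure and constants are made up, not any actual Solaris
resource-management interface): measure a workload's average latency
each interval, cut its allowed queue depth multiplicatively when
latency is over target, and grow it back additively otherwise.

#include <stdio.h>

typedef struct workload {
	double	w_avg_latency_ms;	/* measured over the last interval */
	int	w_queue_limit;		/* max outstanding I/Os allowed */
} workload_t;

#define	LATENCY_TARGET_MS	50.0
#define	MIN_QUEUE_LIMIT		1
#define	MAX_QUEUE_LIMIT		64

/* Called once per sampling interval with a fresh latency measurement. */
static void
throttle_adjust(workload_t *w)
{
	if (w->w_avg_latency_ms > LATENCY_TARGET_MS) {
		/* Latency past target: halve the workload's queue depth. */
		if (w->w_queue_limit / 2 >= MIN_QUEUE_LIMIT)
			w->w_queue_limit /= 2;
	} else if (w->w_queue_limit < MAX_QUEUE_LIMIT) {
		/* Under target: recover slowly, one slot at a time. */
		w->w_queue_limit++;
	}
}

int
main(void)
{
	workload_t hog = { .w_avg_latency_ms = 400.0, .w_queue_limit = 64 };

	/* Simulate a few intervals of a saturating workload backing off. */
	for (int i = 0; i < 5; i++) {
		throttle_adjust(&hog);
		hog.w_avg_latency_ms /= 2.0;	/* less load, less latency */
		printf("interval %d: limit %d, latency %.0f ms\n",
		    i, hog.w_queue_limit, hog.w_avg_latency_ms);
	}
	return (0);
}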

  For the latter to work well, I'd like to see the resource
management and fast/slow mirror adaptation be something one turns on
explicitly, because then, when FMA discovers that you do in fact have
a fast/slow mirror or a Dr. Evil program saturating the array, the
"fix" could be to notify the sysadmin that they have a problem and
suggest the built-in tools to ameliorate it.

 
Ian Collins writes: 
> One solution (again, to be used with a remote mirror) is the three way 
> mirror.  If two devices are local and one remote, data is safe once the 
> two local writes return.  I guess the issue then changes from "is my 
> data safe" to "how safe is my data".  I would be reluctant to deploy a 
> remote mirror device without local redundancy, so this probably won't be 
> an uncommon setup.  There would have to be an acceptable window of risk 
> when local data isn't replicated.

  And in this case too, I'd prefer that the sysadmin tell ZFS what
she wants, and have the system adapt to it and report how big the
risk window is.

  This would effectively change the FMA behavior, you understand, so
as to have it report failures to complete the local writes within
time t0 and the remote write within time t1, much as the resource
management or fast/slow cases would need to be visible to FMA.
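
  By way of illustration only (hypothetical structures, not an actual
FMA interface), the check amounts to timing the two completion events
against the admin-supplied deadlines t0 and t1 and reporting the gap
during which only the local copies exist:

#include <stdio.h>

typedef struct write_record {
	double	wr_local_done;	/* seconds after the write was issued */
	double	wr_remote_done;	/* seconds after the write was issued */
} write_record_t;

static void
check_deadlines(const write_record_t *wr, double t0, double t1)
{
	if (wr->wr_local_done > t0)
		printf("local writes missed deadline t0 (%.1fs > %.1fs)\n",
		    wr->wr_local_done, t0);
	if (wr->wr_remote_done > t1)
		printf("remote write missed deadline t1 (%.1fs > %.1fs)\n",
		    wr->wr_remote_done, t1);
	printf("risk window (local-only copies): %.1fs\n",
	    wr->wr_remote_done - wr->wr_local_done);
}

int
main(void)
{
	write_record_t wr = { .wr_local_done = 0.2, .wr_remote_done = 3.5 };

	check_deadlines(&wr, 1.0, 2.0);	/* t0 = 1s local, t1 = 2s remote */
	return (0);
}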

--dave (at home) c-b

-- 
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
[EMAIL PROTECTED]                 |                      -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#