Bill Sommerfeld wrote:
> On Thu, 2008-08-28 at 13:05 -0700, Eric Schrock wrote:
>   
>> A better option would be to not use this to perform FMA diagnosis, but
>> instead work into the mirror child selection code.  This has already
>> been alluded to before, but it would be cool to keep track of latency
>> over time, and use this to both a) prefer one drive over another when
>> selecting the child and b) proactively timeout/ignore results from one
>> child and select the other if it's taking longer than some historical
>> standard deviation.  This keeps away from diagnosing drives as faulty,
>> but does allow ZFS to make better choices and maintain response times.
>> It shouldn't be hard to keep track of the average and/or standard
>> deviation and use it for selection; proactively timing out the slow I/Os
>> is much trickier.
>>     
>
> tcp has to solve essentially the same problem: decide when a response is
> "overdue" based only on the timing of recent successful exchanges in a
> context where it's difficult to make assumptions about "reasonable"
> expected behavior of the underlying network.
>
> it tracks both the smoothed round trip time and the variance, and
> declares a response overdue after (SRTT + K * variance).
>
> I think you'd probably do well to start with something similar to what's
> described in http://www.ietf.org/rfc/rfc2988.txt and then tweak based on
> experience.
>   

I think this is a good place to start. In general, we see a range of
about 3 orders of magnitude in service times for magnetic disk I/Os,
and 4 orders of magnitude for power-managed disks. With that range, I
don't expect the variance to be small, at least for magnetic disks.
SSDs will have a much smaller variance, in general. For lopsided
mirrors, such as a magnetic disk mirrored to an SSD, or Bob's Dallas
vs. New York paths, we should be able to automatically steer reads
towards the faster side.
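
To make the RFC 2988 suggestion concrete, here is a rough sketch (plain
C, not actual ZFS code; the vdev_latency_t struct and function names
are invented for illustration) of per-child latency tracking and
faster-side selection:

#include <stdint.h>
#include <stdbool.h>

/*
 * Per-child latency state in the style of RFC 2988: a smoothed
 * service time (SRTT) and a smoothed mean deviation (RTTVAR),
 * updated with the standard 1/8 and 1/4 gains.  Microseconds give
 * enough headroom for the 3-4 orders of magnitude between SSD and
 * power-managed magnetic disk service times.
 */
typedef struct vdev_latency {
	uint64_t vl_srtt;	/* smoothed service time (us) */
	uint64_t vl_rttvar;	/* smoothed mean deviation (us) */
	bool     vl_valid;	/* at least one sample recorded */
} vdev_latency_t;

/* Fold one completed I/O's service time into the running estimates. */
static void
vdev_latency_update(vdev_latency_t *vl, uint64_t sample_us)
{
	uint64_t diff;

	if (!vl->vl_valid) {
		/* RFC 2988 initialization on the first measurement. */
		vl->vl_srtt = sample_us;
		vl->vl_rttvar = sample_us / 2;
		vl->vl_valid = true;
		return;
	}

	diff = (sample_us > vl->vl_srtt) ?
	    sample_us - vl->vl_srtt : vl->vl_srtt - sample_us;

	/* RTTVAR <- 3/4 RTTVAR + 1/4 |SRTT - sample| */
	vl->vl_rttvar = (3 * vl->vl_rttvar + diff) / 4;
	/* SRTT <- 7/8 SRTT + 1/8 sample */
	vl->vl_srtt = (7 * vl->vl_srtt + sample_us) / 8;
}

/*
 * Pick the mirror child with the lowest predicted completion time,
 * SRTT + K * RTTVAR (K = 4, as in TCP).  A child with no history yet
 * is chosen first so that it gets measured at all.
 */
static int
vdev_latency_pick_child(const vdev_latency_t *children, int nchildren)
{
	uint64_t cost, best_cost = UINT64_MAX;
	int c, best = 0;

	for (c = 0; c < nchildren; c++) {
		if (!children[c].vl_valid)
			return (c);
		cost = children[c].vl_srtt + 4 * children[c].vl_rttvar;
		if (cost < best_cost) {
			best_cost = cost;
			best = c;
		}
	}
	return (best);
}

The 1/8 and 1/4 gains keep everything in integer arithmetic, and the
SRTT + 4 * RTTVAR cost should settle on the SSD or local side of a
lopsided mirror once a few samples have accumulated.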

However, a comprehensive solution must also deal with top-level vdev
usage, which can look very different from the physical vdevs. We can
use driver-level FMA for the physical vdevs, but ultimately ZFS will
need to be able to make decisions based on the response time across
the top-level vdevs. This can be implemented in two phases, of course.
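
The overdue test Bill describes (SRTT + K * variance) works the same
way at either layer, since the math doesn't care whether the samples
come from a physical vdev or a top-level vdev. A hypothetical check,
reusing the vdev_latency_t state from the sketch above, with an
arbitrary floor value:

/*
 * Decide whether an outstanding I/O is overdue.  The estimates could
 * be kept per physical vdev (phase 1) or per top-level vdev (phase 2).
 */
static bool
vdev_latency_overdue(const vdev_latency_t *vl, uint64_t elapsed_us)
{
	uint64_t deadline;

	if (!vl->vl_valid)
		return (false);		/* no history, nothing to compare */

	deadline = vl->vl_srtt + 4 * vl->vl_rttvar;

	/*
	 * Floor the deadline so a burst of cache hits doesn't make
	 * every subsequent media access look overdue; 10 ms is an
	 * arbitrary, tunable placeholder.
	 */
	if (deadline < 10000)
		deadline = 10000;

	return (elapsed_us > deadline);
}

An I/O flagged this way wouldn't be diagnosed as a fault; the mirror
would simply reissue the read to another child, which keeps this in
line with Eric's point about improving response times without
declaring drives faulty.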

I've got some lopsided mirror TNF data, so we could fairly easily try some
algorithms... I'll whip it into shape for further analysis.
 -- richard
