On Thu, 2008-08-28 at 13:05 -0700, Eric Schrock wrote:
> A better option would be to not use this to perform FMA diagnosis, but
> instead work into the mirror child selection code. This has already
> been alluded to before, but it would be cool to keep track of latency
> over time, and use this to both a) prefer one drive over another when
> selecting the child and b) proactively timeout/ignore results from one
> child and select the other if it's taking longer than some historical
> standard deviation. This keeps away from diagnosing drives as faulty,
> but does allow ZFS to make better choices and maintain response times.
> It shouldn't be hard to keep track of the average and/or standard
> deviation and use it for selection; proactively timing out the slow I/Os
> is much trickier.

TCP has to solve essentially the same problem: deciding when a response is
"overdue" based only on the timing of recent successful exchanges, in a
context where it's difficult to make assumptions about "reasonable"
expected behavior of the underlying network.

It tracks both the smoothed round-trip time (SRTT) and its variation
(RTTVAR, a smoothed mean deviation rather than a true variance), and
declares a response overdue after roughly SRTT + K * RTTVAR, with K = 4
in the RFC.

I think you'd probably do well to start with something similar to what's
described in http://www.ietf.org/rfc/rfc2988.txt and then tweak based on
experience.

- Bill

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
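
To make the RFC 2988 suggestion concrete, here is a minimal sketch of that estimator applied to per-child I/O latency. It is an illustration, not ZFS code: the class name `LatencyEstimator`, its methods, and the `granularity` default are all hypothetical, and the TCP-specific rules from the RFC (the 1-second minimum RTO, the 3-second initial RTO, and backoff on retransmission) are deliberately left out because they would need different values for disks.

```python
from dataclasses import dataclass
from typing import Optional

# Constants as given in RFC 2988.
ALPHA = 1 / 8  # gain applied to new samples when updating SRTT
BETA = 1 / 4   # gain applied to new samples when updating RTTVAR
K = 4          # multiplier on RTTVAR when computing the timeout


@dataclass
class LatencyEstimator:
    """Hypothetical per-child latency tracker modeled on RFC 2988."""

    granularity: float = 0.001    # assumed clock granularity G, in seconds
    srtt: Optional[float] = None    # smoothed latency
    rttvar: Optional[float] = None  # smoothed mean deviation of latency

    def observe(self, sample: float) -> None:
        """Fold one completed I/O's latency (in seconds) into the estimate."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the old SRTT, as the RFC requires.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample

    def overdue_after(self) -> Optional[float]:
        """Return the latency beyond which an I/O counts as overdue."""
        if self.srtt is None:
            return None  # no history yet, so no basis for a timeout
        return self.srtt + max(self.granularity, K * self.rttvar)


if __name__ == "__main__":
    est = LatencyEstimator()
    for latency in (0.010, 0.012, 0.009, 0.011):
        est.observe(latency)
    print(f"srtt={est.srtt:.4f}s overdue_after={est.overdue_after():.4f}s")
```

One estimator per mirror child would give both halves of Eric's proposal: comparing `srtt` across children covers the "prefer one drive" case, and `overdue_after()` gives a threshold for the trickier proactive-timeout case. The gains and K are TCP's defaults and would almost certainly need tuning for disk workloads.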