On Aug 25, 2009, at 9:38 PM, Tristan Ball wrote:

What I’m worried about is that time period where the pool is resilvering to the hot spare. For example: one half of a mirror has failed completely, and the mirror is being rebuilt onto the spare – if I get a read error from the remaining half of the mirror, then I’ve lost data. If the RE drive returns an error for a request that a consumer drive would have (eventually) returned data for, then in this specific case I would have been better off with the consumer drive.

The difference is the error detection time. In general, you'd like errors to be detected quickly. The beef with consumer drives and the Solaris + ZFS architecture is that the drives do not return an error, they just keep retrying internally. So you have to wait for the sd (or other) driver to time out the request; with the default sd per-command timeout of 60 seconds and multiple retries, that is on the order of minutes. Meanwhile, ZFS is patiently awaiting a status on the request. Enterprise-class drives limit the number of internal retries before reporting an error, so you can expect a response, success or failure, in about 10 seconds or less. Once the error is detected, ZFS can do something about it.

All of this can be tuned, of course. Sometimes the defaults are fine,
sometimes not. Until recently, the biggest gripe was against the iSCSI
client, which had a hard-wired 3-minute error-detection timeout. For current
builds you can tune these things without recompiling.
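For illustration, a minimal sketch of that kind of tuning on the sd driver side via /etc/system; the tunable names and values shown here are assumptions that vary across Solaris/OpenSolaris builds, so verify them against your build before applying:

    * Sketch only: shorten the sd per-command timeout and retry count so a
    * misbehaving drive is reported in seconds rather than minutes. Defaults
    * are commonly cited as 60 seconds and 5 retries; confirm for your build.
    set sd:sd_io_time = 10
    set sd:sd_retry_count = 3

A reboot is needed for /etc/system changes to take effect; the values currently in use can be inspected on a live system with mdb -k, e.g. echo 'sd_io_time/D' | mdb -k.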
 -- richard

