On Aug 25, 2009, at 9:38 PM, Tristan Ball wrote:
What I’m worried about is the time period when the pool is
resilvering to the hot spare. For example: one half of a mirror has
failed completely, and the mirror is being rebuilt onto the spare.
If I get a read error from the remaining half of the mirror, then
I’ve lost data. If the RE drives return an error for a request
that a consumer drive would have (eventually) satisfied, then in this
specific case I would have been better off with the consumer drive.
The difference is the error-detection time. In general, you'd like
errors to be detected quickly. The beef with consumer drives under the
Solaris + ZFS architecture is that the drives never give up and report
an error; they just keep retrying. So you have to wait for the sd (or
other) driver to time out the request, which by default takes on the
order of minutes. Meanwhile, ZFS is patiently awaiting a status on the
request. Enterprise-class drives, by contrast, make only a limited
number of retries before reporting an error, so you can expect a
response, success or failure, within about 10 seconds. Once the error
is detected, ZFS can do something about it.
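The arithmetic behind "on the order of minutes" versus "10 seconds or
less" can be sketched as below. The 60-second per-attempt timeout and
5 retries are commonly cited Solaris sd driver defaults, not values
stated in this thread; treat them as assumptions:

```python
# Rough worst-case error-detection time before ZFS hears back.
# Assumed values: sd driver defaults of sd_io_time = 60 s per attempt and
# sd_retry_count = 5; ~10 s for an enterprise (TLER/ERC) drive's own limit.

SD_IO_TIME = 60      # seconds the sd driver waits per attempt (assumed default)
SD_RETRY_COUNT = 5   # driver retries before failing the request (assumed default)
TLER_LIMIT = 10      # seconds an enterprise drive retries before reporting an error

def consumer_worst_case():
    """Consumer drive never gives up, so the OS driver must time out every attempt."""
    return SD_IO_TIME * SD_RETRY_COUNT

def enterprise_worst_case():
    """Enterprise drive reports the error itself, well inside one driver timeout."""
    return TLER_LIMIT

print(consumer_worst_case())    # minutes of blocking before ZFS can react
print(enterprise_worst_case())  # seconds, after which ZFS can repair from redundancy
```

With these assumed defaults the consumer path blocks ZFS for roughly
five minutes per failed request, while the enterprise drive answers in
about ten seconds.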
All of this can be tuned, of course. Sometimes the defaults are fine,
sometimes not. Until recently, the biggest gripe was against the iSCSI
client, which had a hard-wired 3-minute error-detection timeout. In
current builds you can tune these things without recompiling.
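As one concrete illustration of that tuning: the sd driver's
per-attempt timeout can be shortened via /etc/system. The tunable name
comes from the Solaris Tunable Parameters documentation, not from this
thread, and the value is an example only; check your release's
documentation before applying it:

```
* /etc/system -- example only; verify against your release's
* Tunable Parameters Reference Manual before use.
* Fail an outstanding request after 10 seconds instead of the
* 60-second default:
set sd:sd_io_time = 10
```

A reboot is required for /etc/system changes to take effect.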
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss