Re: [zfs-discuss] Understanding when (and how) ZFS will use spare disks

2009-09-04 Thread Scott Meilicke
This sounds like the same behavior as OpenSolaris 2009.06. I had several disks 
recently go UNAVAIL, and the spares did not take over. But as soon as I 
physically removed a disk, the spare started replacing the removed disk. It 
seems UNAVAIL is not the same as the disk not being there. I wish the spare 
*would* take over in these cases, since the pool is degraded.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Understanding when (and how) ZFS will use spare disks

2009-09-04 Thread Chris Siebenmann
 We have a number of shared spares configured in our ZFS pools, and
we're seeing weird issues where spares don't get used under some
circumstances.  We're running Solaris 10 U6 using pools made up of
mirrored vdevs, and what I've seen is:

* if ZFS detects enough checksum errors on an active disk, it will
  automatically pull in a spare.
* if the system reboots without some of the disks available (so that
  half of the mirrored pairs drop out, for example), spares will *not*
  get used. ZFS recognizes that the disks are not there; they are marked
  as UNAVAIL and the vdevs (and pools) as DEGRADED, but it doesn't try to
  use spares.
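
For what it's worth, you can pull a spare in by hand when ZFS won't do it
itself. A rough sketch of what I mean (the pool and device names here are
made up, so substitute your own from 'zpool status'):

```shell
# See which disks are UNAVAIL and which spares are still AVAIL
# (pool and device names below are hypothetical).
zpool status tank

# Manually put a spare in place of the UNAVAIL disk; ZFS treats
# this like any other replacement and starts a resilver.
zpool replace tank c3t2d0 c5t1d0

# Later: detaching the spare returns it to the AVAIL spare list,
# while detaching the original disk instead makes the spare the
# permanent replacement.
zpool detach tank c5t1d0
```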

(This is in a SAN environment where one side of each mirror comes from
one controller and the other side comes from a second controller.)

 All of this makes me think that I don't understand how ZFS spares
really work, and under what circumstances they'll get used. Does
anyone know if there's a writeup of this somewhere?

(What I've gathered so far from reading zfs-discuss archives is that
ZFS spares are not handled automatically in the kernel code but are
instead deployed to pools by a fmd ZFS management module[*], doing more
or less 'zpool replace pool failing-dev spare' (presumably through
an internal code path, since 'zpool history' doesn't seem to show spare
deployment). Is this correct?)
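
(One thing I haven't been able to confirm on a live system: plain
'zpool history' only shows administrator-run commands, but the -i flag
is supposed to also show internally logged events, which might include
the retire agent's spare activation. If someone can check, something
like this, with a hypothetical pool name:)

```shell
# -i includes internally logged ZFS events in addition to the
# commands an administrator typed; spare activation by the fmd
# retire agent may show up here even when plain history is silent.
zpool history -i tank
```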

 Also, searching turns up some old zfs-discuss messages suggesting that
not bringing in spares in response to UNAVAIL disks was a bug that's now
fixed in at least OpenSolaris. If so, does anyone know if the fix has
made it into S10 U7 (or is planned or available as a patch)?

 Thanks in advance.

- cks
[*: http://blogs.sun.com/eschrock/entry/zfs_hot_spares suggests that
it is 'zfs-retire', which is separate from 'zfs-diagnosis'.]
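
(If you want to check which of those fmd agents a given machine
actually has loaded, and what fmd has concluded about your UNAVAIL
disks, something like this should work; the exact module list varies
by release:)

```shell
# List the fault-management modules fmd has loaded; this should
# include zfs-diagnosis and, if present, zfs-retire.
fmadm config | grep zfs

# Show outstanding faults fmd knows about, which may include what
# the ZFS diagnosis engine decided about the missing disks.
fmadm faulty
```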