This is Solaris 10U3 w/127111-05.

It appears that one of the disks in my zpool died yesterday. I got
several SCSI errors, finally ending with 'disk not responding to
selection'. That seems to be all well and good: ZFS figured it out and
the pool is degraded:

maxwell /var/adm >zpool status
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool1        DEGRADED     0     0     0
          raidz1     DEGRADED     0     0     0
            c0t9d0   ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   UNAVAIL  1.88K 17.98     0  cannot open

errors: No known data errors


My question is: why does ZFS keep attempting to open the dead device? At
least, that's what I assume is happening. About every minute, I get eight
of these entries in the messages log:

Feb 12 10:15:54 maxwell scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32):
Feb 12 10:15:54 maxwell         disk not responding to selection

I also got a number of these thrown in for good measure:

Feb 11 22:21:58 maxwell scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32):
Feb 11 22:21:58 maxwell         SYNCHRONIZE CACHE command failed (5)


Since the disk died last night (at about 11:20pm EST), I now have over
15K similar entries in my log. What gives? Is this expected behavior?
If ZFS knows the device is having problems, why doesn't it just leave
it alone and wait for user intervention?
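To confirm the "eight entries about every minute" rate without eyeballing the log, a small shell sketch like the one below could tally the warnings per minute. The function name is hypothetical; it assumes the standard syslog line format shown above ("Mon DD HH:MM:SS host ...") and that the log is in chronological order, since `uniq -c` only merges adjacent identical lines.

```shell
# Hypothetical helper: count "disk not responding to selection" warnings
# per minute in a syslog-format file (defaults to /var/adm/messages).
selection_errors_per_minute() {
    log="${1:-/var/adm/messages}"
    # Keep only the warning-text lines, then reduce each to "Mon DD HH:MM"
    # and let uniq -c count consecutive lines that fall in the same minute.
    grep 'disk not responding to selection' "$log" |
        awk '{ print $1, $2, substr($3, 1, 5) }' |
        uniq -c
}
```

Running `selection_errors_per_minute` with no argument reads `/var/adm/messages`; each output line is a count followed by the minute it covers.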

Also, I noticed that the 'action' says to attach the device and 'zpool 
online' it. Am I correct in assuming that a 'zpool replace' is what 
would really be needed, as the data on the disk will be outdated?

Thanks,
-Brian

-- 
---------------------------------------------------
Brian H. Nelson         Youngstown State University
System Administrator   Media and Academic Computing
              bnelson[at]cis.ysu.edu
---------------------------------------------------

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss