I've got a zpool with 4 raidz2 vdevs of 4 disks each (750GB), plus 4 spares. 
At one point 2 disks failed (in different vdevs). The messages in 
/var/adm/messages for those disks were 'device busy too long'. Then the fault 
manager (fmd) logged this message:

Nov 23 04:23:51 x.x.com EVENT-TIME: Fri Nov 23 04:23:51 EST 2007
Nov 23 04:23:51 x.x.com PLATFORM: Sun Fire X4200 M2, CSN: 0734BD159F, HOSTNAME: x.x.com
Nov 23 04:23:51 x.x.com SOURCE: zfs-diagnosis, REV: 1.0
Nov 23 04:23:51 x.x.com EVENT-ID: bb0f6d83-0c12-6f0f-d121-99d72f7de981
Nov 23 04:23:51 x.x.com DESC: A ZFS device failed.  Refer to 
http://sun.com/msg/ZFS-8000-D3 for more information.
Nov 23 04:23:51 x.x.com AUTO-RESPONSE: No automated response will occur.
Nov 23 04:23:51 x.x.com IMPACT: Fault tolerance of the pool may be compromised.
Nov 23 04:23:51 x.x.com REC-ACTION: Run 'zpool status -x' and replace the bad 
device.

Interestingly, ZFS reported the failure but did not bring two of the spare 
disks online to temporarily replace the failed ones.
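
In case it helps: as far as I understand it, hot-spare activation is handled 
by the fmd retire agent, so one way to check whether fmd ever actually 
faulted the devices would be something like this (using the EVENT-ID from the 
syslog message above):

ROOT $ fmadm faulty
ROOT $ fmdump -v -u bb0f6d83-0c12-6f0f-d121-99d72f7de981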

Here's the zpool history output so you can see what happened after the 
failures (from Nov 26 on):

2007-11-21.20:56:47 zpool create tank raidz2 c5t22d0 c5t30d0 c5t23d0 c5t31d0
2007-11-21.20:57:07 zpool add tank raidz2 c5t24d0 c5t32d0 c5t25d0 c5t33d0
2007-11-21.20:57:17 zpool add tank raidz2 c5t26d0 c5t34d0 c5t27d0 c5t35d0
2007-11-21.20:57:35 zpool add tank raidz2 c5t28d0 c5t36d0 c5t29d0 c5t37d0
2007-11-21.20:57:44 zpool scrub tank
2007-11-23.02:15:38 zpool scrub tank
2007-11-26.12:16:41 zpool online tank c5t23d0
2007-11-26.12:17:48 zpool online tank c5t23d0
2007-11-26.12:18:59 zpool add tank spare c5t17d0
2007-11-26.12:29:32 zpool offline tank c5t29d0
2007-11-26.12:32:08 zpool online tank c5t29d0
2007-11-26.12:32:35 zpool scrub tank
2007-11-26.12:34:15 zpool scrub -s tank
2007-11-26.12:34:22 zpool export tank
2007-11-26.12:43:42 zpool import tank tank.2
2007-11-26.12:45:45 zpool export tank.2
2007-11-26.12:46:32 zpool import tank.2
2007-11-26.12:47:02 zpool scrub tank.2
2007-11-26.12:48:11 zpool add tank.2 spare c5t21d0 c4t17d0 c4t21d0
2007-11-26.14:02:08 zpool scrub -s tank.2
2007-11-27.01:56:35 zpool clear tank.2
2007-11-27.01:57:02 zfs set atime=off tank.2
2007-11-27.01:57:07 zfs set checksum=fletcher4 tank.2
2007-11-27.01:57:45 zfs create tank.2/a
2007-11-27.01:57:46 zfs create tank.2/b
2007-11-27.01:57:47 zfs create tank.2/c
2007-11-27.01:59:39 zpool scrub tank.2
2007-12-05.15:31:51 zpool online tank.2 c5t23d0
2007-12-05.15:32:02 zpool online tank.2 c5t29d0
2007-12-05.15:36:58 zpool online tank.2 c5t23d0
2007-12-05.16:24:56 zpool replace tank.2 c5t23d0 c5t17d0
2007-12-05.21:52:43 zpool replace tank.2 c5t29d0 c5t21d0
2007-12-06.16:12:24 zpool online tank.2 c5t29d0
2007-12-11.13:08:13 zpool online tank.2 c5t23d0
2007-12-11.19:52:38 zpool online tank.2 c5t29d0

You can see that I manually attached two of the spare devices to the pool. 
The resilver finished fairly quickly (probably within five hours).
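
Concretely, those were the two zpool replace lines in the history above:

ROOT $ zpool replace tank.2 c5t23d0 c5t17d0    # spare c5t17d0 stands in for c5t23d0
ROOT $ zpool replace tank.2 c5t29d0 c5t21d0    # spare c5t21d0 stands in for c5t29d0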

Here is what the pool status looks like right now:

  pool: tank.2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 11 19:58:17 2007
config:

        NAME           STATE     READ WRITE CKSUM
        tank.2         DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c5t22d0    ONLINE       0     0     0
            c4t30d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t23d0  UNAVAIL      0     0     0  cannot open
              c5t17d0  ONLINE       0     0     0
            c4t31d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t24d0    ONLINE       0     0     0
            c4t32d0    ONLINE       0     0     0
            c5t25d0    ONLINE       0     0     0
            c4t33d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t26d0    ONLINE       0     0     0
            c4t34d0    ONLINE       0     0     0
            c5t27d0    ONLINE       0     0     0
            c4t35d0    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            c5t28d0    ONLINE       0     0     0
            c4t36d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t29d0  UNAVAIL      0     0     0  cannot open
              c5t21d0  ONLINE       0     0     0
            c4t37d0    ONLINE       0     0     0
        spares
          c5t17d0      INUSE     currently in use
          c5t21d0      INUSE     currently in use
          c4t17d0      AVAIL
          c4t21d0      AVAIL

errors: No known data errors

The disks 'failed' because they were temporarily detached; they have since 
been brought back. We can verify that the OS can actually read data from them:

ROOT $ dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000
^Z
[1]+  Stopped                 dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000

ROOT $ bg
[1]+ dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000 &

ROOT $ iostat -xn c5t29d0 1
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    3.8    0.1    4.1    0.0  0.0  0.0    0.0    0.2   0   0 c5t29d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 5018.1    2.0 5031.1    0.0  0.0  0.8    0.0    0.2   5  80 c5t29d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 5180.4    0.0 5180.4    0.0  0.0  0.8    0.0    0.2   5  78 c5t29d0
                    extended device statistics              
^C

ROOT $ 1000000+0 records in
1000000+0 records out

We performed a similar test to make sure that data can be written to the disk 
without any problems, so the device is clearly reachable from the OS. We have 
also rebooted the server just to make sure.

Now when I try to bring the devices back online, I just get a one-line 
message telling me that the device is being brought online (no error 
messages):

ROOT $ zpool online tank.2 c5t29d0
Bringing device c5t29d0 online

zpool status then tells me that a resilver is in progress. But if I run 
zpool iostat -v 1, you can see that it is actually resilvering the spare 
devices again! Here is the zpool status output:

ROOT $ zpool status tank.2
  pool: tank.2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver in progress, 0.01% done, 11h5m to go
config:

        NAME           STATE     READ WRITE CKSUM
        tank.2         DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c5t22d0    ONLINE       0     0     0
            c4t30d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t23d0  UNAVAIL      0     0     0  cannot open
              c5t17d0  ONLINE       0     0     0
            c4t31d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t24d0    ONLINE       0     0     0
            c4t32d0    ONLINE       0     0     0
            c5t25d0    ONLINE       0     0     0
            c4t33d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t26d0    ONLINE       0     0     0
            c4t34d0    ONLINE       0     0     0
            c5t27d0    ONLINE       0     0     0
            c4t35d0    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            c5t28d0    ONLINE       0     0     0
            c4t36d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t29d0  UNAVAIL      0     0     0  cannot open
              c5t21d0  ONLINE       0     0     0
            c4t37d0    ONLINE       0     0     0
        spares
          c5t17d0      INUSE     currently in use
          c5t21d0      INUSE     currently in use
          c4t17d0      AVAIL
          c4t21d0      AVAIL

errors: No known data errors

And then here is some of the output of zpool iostat -v 1:

                  capacity     operations    bandwidth
pool           used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
tank.2        5.01T  5.86T    283     99  25.0M  1.11M
  raidz2      1.25T  1.47T    114     26  13.3M   294K
    c5t22d0       -      -    103     20  6.64M   124K
    c4t30d0       -      -     85     21  5.39M   147K
    spare         -      -      0    136      0  6.74M
      c5t23d0     -      -      0      0      0      0
      c5t17d0     -      -      0    135      0  6.74M
    c4t31d0       -      -     72     21  4.40M   148K
  raidz2      1.25T  1.47T     21     16  46.1K   212K
    c5t24d0       -      -      9     16  14.4K   108K
    c4t32d0       -      -     10     15  14.6K   108K
    c5t25d0       -      -     11     15  16.6K   108K
    c4t33d0       -      -     10     15  15.2K   107K
  raidz2      1.25T  1.47T     28     23  57.7K   250K
    c5t26d0       -      -     11     21  16.6K   127K
    c4t34d0       -      -     10     20  15.2K   127K
    c5t27d0       -      -     15     21  23.0K   127K
    c4t35d0       -      -     16     21  24.8K   126K
  raidz2      1.25T  1.47T    119     33  11.6M   377K
    c5t28d0       -      -    109     22  5.79M   151K
    c4t36d0       -      -     93     22  4.76M   151K
    spare         -      -      0    137      0  5.95M
      c5t29d0     -      -      0      0      0      0
      c5t21d0     -      -      0    136      0  5.95M
    c4t37d0       -      -     74     23  3.86M   190K
-------------  -----  -----  -----  -----  -----  -----

Notice that there is zero disk traffic on the disk we are trying to bring 
online (c5t29d0), but there is write traffic on its spare AND on the other 
in-use spare. So it looks like it is resilvering both spares again. Why would 
it need to do that?

So I tried using the replace command instead of the online command, hoping 
it would bring the disk back in and resilver only what has changed since it 
went away. But it complains that the disk is already part of the same pool 
(presumably because it is reading the old, still-valid on-disk labels for 
that disk):

ROOT $ zpool replace tank.2 c5t29d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c5t29d0s0 is part of active ZFS pool tank.2. Please see zpool(1M).

I could add -f to force it, but I want it to resilver only the parts that 
have changed.

I tried detaching the spare from the mirror in the hope that ZFS would 
recognize that c5t29d0 is available again:

ROOT $ zpool detach tank.2 c5t21d0

However, running zpool status again shows only that the spare has been 
removed; nothing else changes. When I immediately reattach the spare device, 
the resilver process begins again (and, judging from zpool iostat and iostat 
-xn, it is again resilvering both of the attached spares, not just the one I 
reattached). This resilver also takes quite a long time, as if it has to 
resilver everything all over again rather than just the changes. Does the 
resilver logic work differently when a spare is involved?
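
For completeness, the detach/reattach cycle was essentially the following 
(the reattach command is my reconstruction; I assume it was the same replace 
form used on Dec 5):

ROOT $ zpool detach tank.2 c5t21d0             # return the in-use spare to the spares list
ROOT $ zpool replace tank.2 c5t29d0 c5t21d0    # put the spare back in front of c5t29d0
ROOT $ zpool status -v tank.2                  # resilver restarts, touching both spares again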

Any idea what is going wrong here? It seems that ZFS should be able to 
online the disks, since the OS can read from and write to them perfectly 
well. And even if the online fails, it should not trigger a resilver of both 
attached spares.

You will notice that the pool was renamed with 'zpool export tank' followed 
by 'zpool import tank tank.2'. Could this be confusing ZFS when the device is 
brought online?

We are willing to try zpool replace -f on the disks that need to be brought 
back online over the weekend to see what happens.
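
A minimal sketch of what we plan to run, assuming the forced replace is the 
right approach:

ROOT $ zpool replace -f tank.2 c5t23d0         # replace the disk with itself, forcing past the label check
ROOT $ zpool replace -f tank.2 c5t29d0
ROOT $ zpool status -v tank.2                  # watch the resilver and check for errors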

Here is the system info:
ROOT $ uname -a
SunOS x.x.com 5.10 Generic_120012-14 i86pc i386 i86pc

Will send showrev -p output if desired.

Thanks,
Kevin
 
 