Hello all,

I tried to test how a zpool recovers after one drive is removed, with strange results.

Setup: SunFire V240 with 4Gig RAM, Solaris10u5, fully patched (last week); one 3510 with 12x 140Gig FC drives, configured as 12 LUNs (every drive is one LUN). I don't want to use the hardware RAID; ZFS should do everything. One pool with 5x2 disks and 2 spares (details below).

After pulling drive 2 it took about two minutes for the system to recognise the situation. The output of zpool status and of zpool iostat 1 is very slow: some lines appear quickly, then it stalls for about 30-60 seconds, but the commands do complete eventually. The resilver has started but is VERY slow and shows strange data: the "% done" value goes up and down all the time. I don't think it is working correctly. zpool iostat 1 (when it works) shows many reads but very few writes. I would have expected roughly equal read and write rates, reading from the intact side of the mirror and writing to the spare disk.

Most of the time during the resilver the machine is 99% idle, with at most 10% kernel load for short periods.
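
To document the fluctuating progress, the value could be logged at each poll. Below is a minimal Python sketch, not a tested tool: it assumes the `scrub:` line has exactly the form printed by zpool status on this system ("resilver in progress, 11.56% done, 0h37m to go"), and the function names are made up for illustration. Other releases may print the line differently.

```python
import re

# Format of the 'scrub:' line as printed by zpool status in this mail, e.g.
#   scrub: resilver in progress, 11.56% done, 0h37m to go
# This pattern is an assumption based on that one sample.
RESILVER_RE = re.compile(
    r"resilver in progress,\s*([\d.]+)% done,\s*(\d+)h(\d+)m to go"
)


def parse_resilver(status_text):
    """Return (percent_done, minutes_to_go), or None if no resilver line is found."""
    m = RESILVER_RE.search(status_text)
    if m is None:
        return None
    percent = float(m.group(1))
    minutes = int(m.group(2)) * 60 + int(m.group(3))
    return percent, minutes


def went_backwards(samples):
    """Return index pairs (i, i + 1) where the percentage dropped between two polls."""
    return [
        (i, i + 1)
        for i in range(len(samples) - 1)
        if samples[i + 1] < samples[i]
    ]


if __name__ == "__main__":
    line = "scrub: resilver in progress, 11.56% done, 0h37m to go"
    print(parse_resilver(line))
```

Feeding the output of repeated zpool status calls through parse_resilver and collecting the percentages would give a list that went_backwards can check for drops.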
Now I have waited for more than one day, but nothing is getting better. I did not put a new drive in; I wanted to see one spare being put to use.

Snip of zpool iostat 1:

tank         337G   343G    313      2  37.4M  19.3K
tank         337G   343G    240      5  29.0M  38.6K
tank         337G   343G    355      6  44.4M  45.0K
tank         337G   343G    336      8  41.6M  57.9K
tank         337G   343G    422      0  46.0M      0
tank         337G   343G    415     10  49.4M  70.8K
tank         337G   343G    358      0  43.3M      0
tank         337G   343G    340     10  42.6M  70.8K
tank         337G   343G    323      5  38.1M  38.6K
tank         337G   343G    315      0  35.0M      0
tank         337G   343G    336      0  40.0M  6.43K
tank         337G   343G    388     10  46.8M  70.8K
tank         337G   343G    351      4  43.9M  32.2K
tank         337G   343G      5      5   620K   285K

There is nothing useful (at least for me) in messages. After a grep -v of these two lines

date+time nftp scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],700000/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],1 (ssd48):
date+time nftp  drive offline

only these entries remain:

Aug 27 13:04:22 nftp    i/o to invalid geometry
Aug 27 13:04:32 nftp    i/o to invalid geometry
Aug 27 13:04:37 nftp    i/o to invalid geometry
Aug 27 13:04:37 nftp    i/o to invalid geometry
Aug 27 13:04:47 nftp    i/o to invalid geometry
Aug 27 13:04:52 nftp    i/o to invalid geometry
Aug 27 13:05:23 nftp fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 27 13:05:23 nftp EVENT-TIME: Wed Aug 27 13:05:22 CEST 2008
Aug 27 13:05:23 nftp PLATFORM: SUNW,Sun-Fire-V240, CSN: -, HOSTNAME: nftp
Aug 27 13:05:23 nftp SOURCE: zfs-diagnosis, REV: 1.0
Aug 27 13:05:23 nftp EVENT-ID: ea01afff-c58e-6b32-e345-81da8bf43146
Aug 27 13:05:23 nftp DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
Aug 27 13:05:23 nftp AUTO-RESPONSE: No automated response will occur.
Aug 27 13:05:23 nftp IMPACT: Fault tolerance of the pool may be compromised.
Aug 27 13:05:23 nftp REC-ACTION: Run 'zpool status -x' and replace the bad device.
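
The read/write imbalance in the zpool iostat samples can be put into numbers. The following Python sketch sums the two bandwidth columns. It assumes that each sample line has seven whitespace-separated fields (pool, used, avail, read ops, write ops, read bandwidth, write bandwidth) and that the K/M/G suffixes are powers of 1024; the helper names are hypothetical.

```python
# Suffix multipliers. K and M appear in the samples; G is included only for
# completeness. Treating them as powers of 1024 is an assumption.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(field):
    """Convert a field such as '37.4M', '285K' or '0' to a number of bytes."""
    if field[-1] in UNITS:
        return float(field[:-1]) * UNITS[field[-1]]
    return float(field)


def bandwidth_totals(lines, pool="tank"):
    """Sum read and write bandwidth over all sample lines for the given pool."""
    read_total = 0.0
    write_total = 0.0
    for line in lines:
        cols = line.split()
        # Skip headers, separators and anything that is not a 7-column pool line.
        if len(cols) != 7 or cols[0] != pool:
            continue
        read_total += to_bytes(cols[5])
        write_total += to_bytes(cols[6])
    return read_total, write_total


if __name__ == "__main__":
    samples = [
        "tank 337G 343G 313 2 37.4M 19.3K",
        "tank 337G 343G 422 0 46.0M 0",
    ]
    reads, writes = bandwidth_totals(samples)
    print("read bytes/s total: ", reads)
    print("write bytes/s total:", writes)
```

For a resilver onto a spare one would expect the two totals to be of similar size; in the samples in this mail the reads are larger by roughly three orders of magnitude.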
uname -a
SunOS nftp 5.10 Generic_137111-04 sun4u sparc SUNW,Sun-Fire-V240

########################################################################################
before pulling drive:

sccli> show disk
Ch     Id      Size   Speed  LD     Status     IDs                      Rev
----------------------------------------------------------------------------
2(3)    0  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY602V300007412  WWNN 2000000C505EB811
2(3)    1  136.73GB   200MB  ld1    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JX400007412  WWNN 2000000C505EB885
2(3)    2  136.73GB   200MB  ld2    ONLINE     SEAGATE ST3146807FC      0006
           S/N 3HY62EGZ00007443  WWNN 2000000C50D76130
2(3)    3  136.73GB   200MB  ld3    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JKG00007411  WWNN 2000000C505EB815
2(3)    4  136.73GB   200MB  ld4    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY60YHX00007410  WWNN 2000000C505EBCBB
2(3)    5  136.73GB   200MB  ld5    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61FQ000007412  WWNN 2000000C505E98B9
2(3)    6  136.73GB   200MB  ld6    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61F2E00007411  WWNN 2000000C505E8DB7
2(3)    7  136.73GB   200MB  ld7    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY60Y1100007412  WWNN 2000000C505E98BB
2(3)    8  136.73GB   200MB  ld8    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61D0A00007411  WWNN 2000000C505E6A56
2(3)    9  136.73GB   200MB  ld9    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61AQ200007411  WWNN 2000000C505EC2B4
2(3)   10  136.73GB   200MB  ld10   ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JP900007412  WWNN 2000000C505EB712
2(3)   11  136.73GB   200MB  ld11   ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JZC00007412  WWNN 2000000C505EB9B2
sccli>

[EMAIL PROTECTED]:/>zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Aug 21 17:22:16 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d0   ONLINE       0     0     0
            c2t40d1   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d2   ONLINE       0     0     0
            c2t40d3   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d4   ONLINE       0     0     0
            c2t40d5   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d6   ONLINE       0     0     0
            c2t40d7   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d8   ONLINE       0     0     0
            c2t40d9   ONLINE       0     0     0
        spares
          c2t40d10    AVAIL
          c2t40d11    AVAIL

errors: No known data errors
[EMAIL PROTECTED]:/>

########################################################################################
after pulling drive:

sccli> show disk
Ch     Id      Size   Speed  LD     Status     IDs                      Rev
----------------------------------------------------------------------------
2(3)    0  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY602V300007412  WWNN 2000000C505EB811
2       1       0MB     0MB  NONE   MISSING    SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JX400007412
2(3)    2  136.73GB   200MB  ld2    ONLINE     SEAGATE ST3146807FC      0006
           S/N 3HY62EGZ00007443  WWNN 2000000C50D76130
2(3)    3  136.73GB   200MB  ld3    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JKG00007411  WWNN 2000000C505EB815
2(3)    4  136.73GB   200MB  ld4    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY60YHX00007410  WWNN 2000000C505EBCBB
2(3)    5  136.73GB   200MB  ld5    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61FQ000007412  WWNN 2000000C505E98B9
2(3)    6  136.73GB   200MB  ld6    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61F2E00007411  WWNN 2000000C505E8DB7
2(3)    7  136.73GB   200MB  ld7    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY60Y1100007412  WWNN 2000000C505E98BB
2(3)    8  136.73GB   200MB  ld8    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61D0A00007411  WWNN 2000000C505E6A56
2(3)    9  136.73GB   200MB  ld9    ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61AQ200007411  WWNN 2000000C505EC2B4
2(3)   10  136.73GB   200MB  ld10   ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JP900007412  WWNN 2000000C505EB712
2(3)   11  136.73GB   200MB  ld11   ONLINE     SEAGATE ST314680FSUN146G 0407
           S/N 3HY61JZC00007412  WWNN 2000000C505EB9B2
sccli>

[EMAIL PROTECTED]:/>zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver in progress, 11.56% done, 0h37m to go
config:

        NAME            STATE     READ WRITE CKSUM
        tank            DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c2t40d0     ONLINE       0     0     0
            spare       DEGRADED     0     0     0
              c2t40d1   UNAVAIL      0     0     0  cannot open
              c2t40d10  ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d2     ONLINE       0     0     0
            c2t40d3     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d4     ONLINE       0     0     0
            c2t40d5     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d6     ONLINE       0     0     0
            c2t40d7     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d8     ONLINE       0     0     0
            c2t40d9     ONLINE       0     0     0
        spares
          c2t40d10      INUSE     currently in use
          c2t40d11      AVAIL

errors: No known data errors
[EMAIL PROTECTED]:/>

[EMAIL PROTECTED]:/>/usr/sbin/fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 27 13:05:22 ea01afff-c58e-6b32-e345-81da8bf43146  ZFS-8000-D3    Major

Fault class : fault.fs.zfs.device
Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for
              more information.
Response    : No automated response will occur.
Impact      : Fault tolerance of the pool may be compromised.
Action      : Run 'zpool status -x' and replace the bad device.
[EMAIL PROTECTED]:/>

########################################################################################

I have no idea what is going wrong here. Please give me some advice on how to proceed. Or should I rather open a service call?

Thanks in advance,
thomas

--
Dr. Thomas Bleek, Netzwerkadministrator
Helmholtz-Zentrum Potsdam
Deutsches GeoForschungsZentrum
Telegrafenberg G261
D-14473 Potsdam
Tel.: +49 331 288- 1818/1681 Fax.: 1730
Mobil: +49 172 1543233
E-Mail: [EMAIL PROTECTED]
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss