Re: [zfs-discuss] ZFS Honesty after a power failure
On Tue, 24 Mar 2009, Dennis Clarke wrote:

> However, I have repeatedly run into problems when I need to boot after a
> power failure. I see vdevs being marked as FAULTED regardless of whether
> any hard errors are actually reported by the on-disk SMART firmware. I am
> able to remove these FAULTed devices temporarily, re-insert the same disk,
> and then run fine for months. Until the next long power failure.

In spite of the huge detail, you failed to describe to us the technology used to communicate with these disks. The interface adaptors, switches, and wiring topology could make a difference.

> Is there *really* a severe fault in that disk ?
>
> # luxadm -v display 2118625d599d

This sounds like some sort of fibre channel.

> Transport protocol: IEEE 1394 (SBP-2)

Interesting that it mentions the protocol used by FireWire.

If you are using fibre channel, the device names in the pool specification suggest that Solaris multipathing is not being used (I would expect something long like c4t600A0B800039C9B50A9C47B4522Dd0). If multipathing is not used, then you either have simplex connectivity, or two competing simplex paths to each device. Multipathing is recommended if you have redundant paths available.

If the disk itself is not aware of its severe faults, that suggests a transient problem with communicating with the disk. The problem could be in a device driver, adaptor card, FC switch, or cable. If the disk drive also lost power, perhaps the disk is unusually slow at spinning up.

It is easy to blame ZFS for problems. On my system I was experiencing system crashes overnight while running 'zfs scrub' via cron job. The fibre channel card was locking up. Eventually I learned that it was due to a bug in VirtualBox's device driver. If VirtualBox was not left running overnight, then the system would not crash.
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
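One way to check Bob's point about the transport without eyeballing the whole luxadm listing is to grep saved output for the transport line. A minimal sketch over an illustrative snippet (the sample text below stands in for real `luxadm -v display` output and is not Dennis's actual listing):

```shell
# Save `luxadm -v display <WWN>` output to a file, then inspect it.
# The here-doc below is an illustrative stand-in for that output.
cat > /tmp/luxadm.out <<'EOF'
DEVICE PROPERTIES for disk: 2118625d599d
  Vendor:               SEAGATE
  Transport protocol:   IEEE 1394 (SBP-2)
EOF

# Pull out the transport line; an FC-attached disk should not report SBP-2.
grep 'Transport protocol' /tmp/luxadm.out
```

The same grep works over output captured from any number of disks, which makes it easy to spot one device reporting an unexpected transport.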
Re: [zfs-discuss] ZFS Honesty after a power failure
> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>> However, I have repeatedly run into problems when I need to boot after a
>> power failure. I see vdevs being marked as FAULTED regardless of whether
>> any hard errors are actually reported by the on-disk SMART firmware. I am
>> able to remove these FAULTed devices temporarily, re-insert the same
>> disk, and then run fine for months. Until the next long power failure.
>
> In spite of the huge detail, you failed to describe to us the technology
> used to communicate with these disks. The interface adaptors, switches,
> and wiring topology could make a difference.

Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the back of A5200's. Simple really.

>> Is there *really* a severe fault in that disk ?
>>
>> # luxadm -v display 2118625d599d
>
> This sounds like some sort of fibre channel.
>
>> Transport protocol: IEEE 1394 (SBP-2)
>
> Interesting that it mentions the protocol used by FireWire.

I have no idea where that is coming from.

> If you are using fibre channel, the device names in the pool specification
> suggest that Solaris multipathing is not being used (I would expect
> something long like c4t600A0B800039C9B50A9C47B4522Dd0). If multipathing is
> not used, then you either have simplex connectivity, or two competing
> simplex paths to each device. Multipathing is recommended if you have
> redundant paths available.

Yes, I have another machine that has mpxio in place. However, a power failure also trips phantom faults there.

> If the disk itself is not aware of its severe faults, that suggests a
> transient problem with communicating with the disk.

You would think so, eh? But a transient problem that only occurs after a power failure?

> The problem could be in a device driver, adaptor card, FC switch, or
> cable. If the disk drive also lost power, perhaps the disk is unusually
> slow at spinning up.

All disks were up at boot; you can see that when I ask for a zpool status at boot time in single-user mode. No errors and no faults.

The issue seems to be when fmadm starts up, or perhaps some other service that can throw a fault. I'm not sure.

> It is easy to blame ZFS for problems.

It is easy to blame a power failure for problems, as well as a nice shiny new APC Smart-UPS XL 3000VA RM 3U unit with external extended-runtime battery that doesn't signal a power failure. I never blame ZFS for anything.

> On my system I was experiencing system crashes overnight while running
> 'zfs scrub' via cron job. The fibre channel card was locking up.
> Eventually I learned that it was due to a bug in VirtualBox's device
> driver. If VirtualBox was not left running overnight, then the system
> would not crash.

VirtualBox? This is a Solaris 10 machine. Nothing fancy. Okay, sorry, nothing way out in the field fancy like VirtualBox.

Dennis
Re: [zfs-discuss] ZFS Honesty after a power failure
On Tue, 24 Mar 2009, Dennis Clarke wrote:

> You would think so, eh? But a transient problem that only occurs after a
> power failure?

Transient problems are most common after a power failure or during initialization.

Bob
Re: [zfs-discuss] ZFS Honesty after a power failure
> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>> You would think so, eh? But a transient problem that only occurs after a
>> power failure?
>
> Transient problems are most common after a power failure or during
> initialization.

Well, the issue here is that power was on for ten minutes before I tried to do a boot from the ok prompt. Regardless, the point is that the zpool shows no faults at boot time and then shows phantom faults *after* I go to init 3. That does seem odd.

Dennis
Re: [zfs-discuss] ZFS Honesty after a power failure
Dennis Clarke wrote:
>>> However, I have repeatedly run into problems when I need to boot after
>>> a power failure. I see vdevs being marked as FAULTED regardless of
>>> whether any hard errors are actually reported by the on-disk SMART
>>> firmware. I am able to remove these FAULTed devices temporarily,
>>> re-insert the same disk, and then run fine for months. Until the next
>>> long power failure.
>>
>> In spite of the huge detail, you failed to describe to us the technology
>> used to communicate with these disks. The interface adaptors, switches,
>> and wiring topology could make a difference.
>
> Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the
> back of A5200's. Simple really.

Run away! Run away! Save yourself a ton of grief and replace the A5200.

>>> Is there *really* a severe fault in that disk ?
>>>
>>> # luxadm -v display 2118625d599d
>>
>> This sounds like some sort of fibre channel.
>>
>>> Transport protocol: IEEE 1394 (SBP-2)
>>
>> Interesting that it mentions the protocol used by FireWire.
>
> I have no idea where that is coming from.
>
>> If you are using fibre channel, the device names in the pool
>> specification suggest that Solaris multipathing is not being used (I
>> would expect something long like c4t600A0B800039C9B50A9C47B4522Dd0). If
>> multipathing is not used, then you either have simplex connectivity, or
>> two competing simplex paths to each device. Multipathing is recommended
>> if you have redundant paths available.
>
> Yes, I have another machine that has mpxio in place. However, a power
> failure also trips phantom faults there.
>
>> If the disk itself is not aware of its severe faults, that suggests a
>> transient problem with communicating with the disk.
>
> You would think so, eh? But a transient problem that only occurs after a
> power failure?
>
>> The problem could be in a device driver, adaptor card, FC switch, or
>> cable. If the disk drive also lost power, perhaps the disk is unusually
>> slow at spinning up.
> All disks were up at boot; you can see that when I ask for a zpool status
> at boot time in single-user mode. No errors and no faults.
>
> The issue seems to be when fmadm starts up, or perhaps some other service
> that can throw a fault. I'm not sure.

The following will help you diagnose where the error messages are generated from. I doubt it is a problem with the disk, per se, but you will want to double-check your disk firmware to make sure it is up to date (I've got scars):

  fmadm faulty
  fmdump -eV

-- richard
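Once `fmdump -v` names a faulted vdev, the affected pool and vdev GUID can be pulled out of the saved report mechanically. A minimal sketch over an illustrative report (the here-doc mirrors the format of an `fmdump -v` fault report; run the same sed over real saved output):

```shell
# Save a `fmdump -v <uuid>` report to a file; the sample below is an
# illustrative stand-in with that report's layout.
cat > /tmp/fmdump-v.out <<'EOF'
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=444604062b426970
EOF

# Extract "pool=... vdev=..." from the "Problem in:" line.
sed -n 's!.*Problem in: zfs://pool=\([^/]*\)/vdev=\(.*\)!pool=\1 vdev=\2!p' /tmp/fmdump-v.out
# -> pool=fibre0 vdev=444604062b426970
```

Using `!` as the sed delimiter avoids having to escape the slashes in the `zfs://` scheme.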
Re: [zfs-discuss] ZFS Honesty after a power failure
Hey, Dennis -

I can't help but wonder if the failure is a result of zfs itself finding some problems post restart... Is there anything in your FMA logs? fmstat for a summary, and fmdump for a summary of the related errors, eg:

drteeth:/tmp # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 13:57:29.4190 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 ZFS-8000-D3
Nov 03 13:57:29.9921 916ce3e2-0c5c-e335-d317-ba1e8a93742e ZFS-8000-D3
Nov 03 14:04:58.8973 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d ZFS-8000-CS
Mar 05 18:04:40.7116 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-4M Repaired
Mar 05 18:04:40.7875 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-6U Resolved
Mar 05 18:04:41.0052 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-4M Repaired
Mar 05 18:04:41.0760 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-6U Resolved

then, for example,

fmdump -vu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

and

fmdump -Vvu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

will show more and more information about the error. Note that some of it might seem like rubbish. The important bits should be obvious though - things like what the SUNW message ID is (like ZFS-8000-D3), which can be pumped into sun.com/msg to see what exactly it's going on about.

Note also that there should be something interesting in the /var/adm/messages log to match any 'faulted' devices.

You might also find fmdump -e and fmdump -eV to be interesting - this is the *error* log as opposed to the *fault* log. (Every 'thing that goes wrong' is an error; only those that are diagnosed are considered a fault.) Note that in all of these fm[dump|stat] commands, you are really only looking at two sets of data: the errors - that is, the telemetry incoming to FMA - and the faults. If you include a -e, you view the errors; otherwise, you are looking at the faults.

By the way - sun.com/msg has a great PDF on it about the predictive self-healing technologies in Solaris 10 and will offer more interesting information.
Would be interesting to see *why* ZFS / FMA is feeling the need to fault your devices.

I was interested to see on one of my boxes that I have actually had a *lot* of errors, which I'm now going to have to investigate... Looks like I have a dud rocket in my system... :)

Oh - and I saw this:

Nov 03 14:04:31.2783 ereport.fs.zfs.checksum

Score one more for ZFS! This box has a measly 300GB mirrored, and I have already seen dud data. (heh... It's also got non-ECC memory... ;)

Cheers!
Nathan.

Dennis Clarke wrote:
>> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>> You would think so, eh? But a transient problem that only occurs after
>>> a power failure?
>>
>> Transient problems are most common after a power failure or during
>> initialization.
>
> Well, the issue here is that power was on for ten minutes before I tried
> to do a boot from the ok prompt. Regardless, the point is that the zpool
> shows no faults at boot time and then shows phantom faults *after* I go
> to init 3. That does seem odd.
>
> Dennis

--
Nathan Kroenert              nathan.kroen...@sun.com
Systems Engineer             Phone: +61 3 9869-6255
Sun Microsystems             Fax: +61 3 9869-6288
Level 7, 476 St. Kilda Road  Mobile: 0419 305 456
Melbourne 3004, Victoria, Australia
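Nathan's error-log/fault-log distinction suggests a quick way to see whether the incoming telemetry arrives as one burst or a steady trickle: bucket the ereports by second. A sketch over illustrative `fmdump -e` lines (save real output to the file instead):

```shell
# The here-doc is an illustrative stand-in for saved `fmdump -e` output.
cat > /tmp/fmdump-e.out <<'EOF'
TIME                 CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io
EOF

# Keep only ereport lines, drop sub-second digits, count per second.
awk '$NF ~ /^ereport/ { split($3, t, "."); print $1, $2, t[1] }' /tmp/fmdump-e.out |
  sort | uniq -c
```

A tall spike in one or two seconds points at a transport hiccup (all paths blinking at once) rather than a slow, steady accumulation of media errors.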
Re: [zfs-discuss] ZFS Honesty after a power failure
> Hey, Dennis -
>
> I can't help but wonder if the failure is a result of zfs itself finding
> some problems post restart...

Yes, yes, this is what I am feeling also, but I need to find the data and then I can sleep at night. I am certain that ZFS does not just toss out faults on a whim; there must be a deterministic, logical, code-based reason for those faults that occur *after* I go to init 3.

> Is there anything in your FMA logs?

Oh God yes, brace yourself :-)

http://www.blastwave.org/dclarke/zfs/fmstat.txt

[ I edited the whitespace here for clarity ]

# fmstat
module             ev_recv ev_acpt wait    svc_t  %w  %b  open solve memsz bufsz
cpumem-diagnosis         0       0  0.0      2.7   0   0     3     0  4.2K  1.1K
cpumem-retire            0       0  0.0      0.2   0   0     0     0     0     0
disk-transport           0       0  0.0     45.7   0   0     0     0   40b     0
eft                      0       0  0.0      0.7   0   0     0     0  1.2M     0
fabric-xlate             0       0  0.0      0.7   0   0     0     0     0     0
fmd-self-diagnosis       3       0  0.0      0.2   0   0     0     0     0     0
io-retire                0       0  0.0      0.2   0   0     0     0     0     0
snmp-trapgen             2       0  0.0      1.7   0   0     0     0   32b     0
sysevent-transport       0       0  0.0     75.4   0   0     0     0     0     0
syslog-msgs              2       0  0.0      1.4   0   0     0     0     0     0
zfs-diagnosis          296     252  2.0 236719.7  98   0     1     2  176b  144b
zfs-retire               4       0  0.0     27.4   0   0     0     0     0     0

zfs-diagnosis svc_t=236719.7 ?
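For scale on that number: fmstat's svc_t column is the module's average service time in milliseconds (per the fmstat(1M) man page), so the zfs-diagnosis figure works out to minutes per event. A quick conversion:

```shell
# svc_t is reported in milliseconds; convert the zfs-diagnosis figure
# quoted above into seconds and minutes.
echo 236719.7 | awk '{ printf "%.1f seconds (%.1f minutes) per event\n", $1/1000, $1/60000 }'
# -> 236.7 seconds (3.9 minutes) per event
```

Roughly four minutes of average service time per event is consistent with the diagnosis engine stalling on I/O during the fault storm, not with normal operation.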
> fmstat for a summary, and fmdump for a summary of the related errors

http://www.blastwave.org/dclarke/zfs/fmdump.txt

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Dec 05 21:31:46.1069 aa3bfcfa-3261-cde4-d381-dae8abf296de ZFS-8000-D3
Mar 07 08:46:43.6238 4c8b199b-add1-c3fe-c8d6-9deeff91d9de ZFS-8000-FD
Mar 07 19:37:27.9819 b4824ce2-8f42-4392-c7bc-ab2e9d14b3b7 ZFS-8000-FD
Mar 07 19:37:29.8712 af726218-f1dc-6447-f581-cc6bb1411aa4 ZFS-8000-FD
Mar 07 19:37:30.2302 58c9e01f-8a80-61b0-ffea-ded63a9b076d ZFS-8000-FD
Mar 07 19:37:31.6410 3b0bfd9d-fc39-e7c2-c8bd-879cad9e5149 ZFS-8000-FD
Mar 10 19:37:08.8289 aa3bfcfa-3261-cde4-d381-dae8abf296de FMD-8000-4M Repaired
Mar 23 23:47:36.9701 2b1aa4ae-60e4-c8ef-8eec-d92a18193e7a ZFS-8000-FD
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD

# fmdump -vu 3780a2dd-7381-c053-e186-8112b463c2b7
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=444604062b426970
           Affects: zfs://pool=fibre0/vdev=444604062b426970
               FRU: -
          Location: -

# fmdump -vu 146dad1d-f195-c2d6-c630-c1adcd58b288
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=23e4d7426f941f52
           Affects: zfs://pool=fibre0/vdev=23e4d7426f941f52
               FRU: -
          Location: -

> will show more and more information about the error. Note that some of
> it might seem like rubbish. The important bits should be obvious though -
> things like what the SUNW message ID is (like ZFS-8000-D3), which can be
> pumped into sun.com/msg

like so:

http://www.sun.com/msg/ZFS-8000-FD

or see http://www.blastwave.org/dclarke/zfs/ZFS-8000-FD.txt

Article for Message ID: ZFS-8000-FD

Too many I/O errors on ZFS device

Type: Fault
Severity: Major
Description: The number of I/O errors associated with a ZFS device exceeded acceptable levels.
Automated Response: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Impact: The fault tolerance of the pool may be affected.

Yep, I agree, that is what I saw.

> Note also that there should be something interesting in the
> /var/adm/messages log to match any 'faulted' devices. You might also find
> fmdump -e

a spooky long list of events:

TIME                 CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:47:28.5592 ereport.fs.zfs.io
Mar 23 23:47:28.5593 ereport.fs.zfs.io
.
.
.
Mar 23 23:47:28.5622 ereport.fs.zfs.io
Mar 23 23:47:28.5560 ereport.fs.zfs.io
Mar 23 23:47:28.5658 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io

http://www.blastwave.org/dclarke/zfs/fmdump_e.txt

Ouch, that is a nasty long list, all in a few seconds.

> and fmdump -eV

a very detailed, verbose long list, with such entries as:

Mar 23 2009 23:48:41.595757900 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena =