Hey, Dennis -
I can't help but wonder if the failure is a result of ZFS itself
finding some problems post-restart...
Is there anything in your FMA logs?
fmstat
for a summary of FMA module activity, and
fmdump
for a listing of the related faults
eg:
drteeth:/tmp # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 13:57:29.4190 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 ZFS-8000-D3
Nov 03 13:57:29.9921 916ce3e2-0c5c-e335-d317-ba1e8a93742e ZFS-8000-D3
Nov 03 14:04:58.8973 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d ZFS-8000-CS
Mar 05 18:04:40.7116 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-4M Repaired
Mar 05 18:04:40.7875 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-6U Resolved
Mar 05 18:04:41.0052 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-4M Repaired
Mar 05 18:04:41.0760 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-6U Resolved
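For what it's worth, fmstat on its own just summarises per-module
activity. It looks something like this (the numbers below are made up,
purely to show the shape of the output - yours will differ):
drteeth:/tmp # fmstat
module           ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
zfs-diagnosis          4       4  0.0    1.2   0   0     0     2   4.0K      0
zfs-retire             2       0  0.0    0.3   0   0     0     0      0      0
The zfs-diagnosis and zfs-retire modules are the ones to keep an eye on
for ZFS-related problems.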
Coming back to the fmdump output above - then, for example,
fmdump -vu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7
and
fmdump -Vvu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7
will show progressively more information about that particular fault.
Note that some of it might seem like rubbish. The important bits should
be obvious though - things like the SUNW message ID (ZFS-8000-D3, for
example), which can be pumped into
sun.com/msg
to see exactly what it's going on about.
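If you want the verbose detail for every fault in one hit, a quick loop
over the UUIDs works too. This is just a sketch - the UUID is column 4
of the default fmdump output above, so adjust the awk if your output is
laid out differently:
drteeth:/tmp # for u in `fmdump | awk 'NR > 1 { print $4 }' | sort -u`
> do
>   fmdump -Vvu $u
> done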
Note also that there should be something interesting in the
/var/adm/messages log to match any 'faulted' devices.
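Something along the lines of
egrep -i 'fault|degraded|checksum' /var/adm/messages
should dig those out - adjust the strings to taste.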
You might also find
fmdump -e
and
fmdump -eV
interesting - this is the *error* log, as opposed to the *fault* log.
(Every 'thing that goes wrong' is an error; only those that are
diagnosed are considered faults.)
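fmdump can also filter on the event class with -c, which narrows the
error log down to just the ZFS telemetry (and if your build doesn't
like -c, piping through grep gets you much the same thing):
drteeth:/tmp # fmdump -e -c 'ereport.fs.zfs.*'
drteeth:/tmp # fmdump -e | grep zfs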
Note that in all of these fm[dump|stat] commands, you are really only
looking at two sets of data: the errors - that is, the telemetry coming
in to FMA - and the faults. If you include -e, you view the errors;
otherwise, you are looking at the faults.
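The fault side also has a 'current state' view, which comes from fmadm
rather than fmdump:
drteeth:/tmp # fmadm faulty
shows anything FMA still considers broken right now, and
drteeth:/tmp # fmadm repair <uuid>
(with one of the UUIDs from fmdump - the <uuid> is just a placeholder)
tells it you have dealt with the problem yourself.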
By the way - sun.com/msg has a great PDF about the predictive
self-healing technologies in Solaris 10, which offers more interesting
information.
Would be interesting to see *why* ZFS / FMA is feeling the need to fault
your devices.
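If you can catch it, grabbing both views right after a fault shows up
should line them up nicely, eg:
drteeth:/tmp # zpool status -xv
drteeth:/tmp # fmdump -eV
zpool status -x only shows the pools with problems, so it keeps the
noise down.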
I was interested to see on one of my boxes that I have actually had a
*lot* of errors, which I'm now going to have to investigate... Looks
like I have a dud rocket in my system... :)
Oh - And I saw this:
Nov 03 14:04:31.2783 ereport.fs.zfs.checksum
Score one more for ZFS! This box has a measly 300GB mirrored, and I have
already seen dud data. (heh... It's also got non-ECC memory... ;)
Cheers!
Nathan.
Dennis Clarke wrote:
On Tue, 24 Mar 2009, Dennis Clarke wrote:
You would think so eh?
But a transient problem that only occurs after a power failure?
Transient problems are most common after a power failure or during
initialization.
Well the issue here is that power was on for ten minutes before I tried
to do a boot from the ok prompt.
Regardless, the point is that the ZPool shows no faults at boot time and
then shows phantom faults *after* I go to init 3.
That does seem odd.
Dennis
--
//////////////////////////////////////////////////////////////////
// Nathan Kroenert nathan.kroen...@sun.com //
// Systems Engineer Phone: +61 3 9869-6255 //
// Sun Microsystems Fax: +61 3 9869-6288 //
// Level 7, 476 St. Kilda Road Mobile: 0419 305 456 //
// Melbourne 3004 Victoria Australia //
//////////////////////////////////////////////////////////////////
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss