To update everyone: I ran a complete zfs scrub and it generated no errors
in iostat.  I have 4.8T of data on the filesystem, so it was a fairly
lengthy test.  The machine has also shown no sign of instability.  If I
were to start copying a lot of data to the filesystem again, though, I'm
sure it would generate errors and crash again.

Chad


On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote:
> Well, ok, the msi=0 thing didn't help after all.  A few minutes after my
> last message a few errors showed up in iostat, and then in a few minutes
> more the machine was locked up hard...  Maybe I will try just doing a
> scrub instead of my rsync process and see how that does.
> 
> Chad
> 
> 
> On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
> > I don't think the hardware has any problems; it only started having
> > errors when I upgraded OpenSolaris.  It's working fine again now after
> > a reboot.  Actually, I reread one of your earlier messages.  When you
> > said "non-Sun JBOD", I didn't realize at first that this didn't apply
> > to me (in regard to the msi=0 fix), because I didn't realize JBOD was
> > shorthand for an external expander device.  Since I'm just using bare
> > metal and passive backplanes, I think the msi=0 fix should apply to me
> > based on what you wrote earlier.  Anyway, I've now put
> >     set mpt:mpt_enable_msi = 0
> > in /etc/system and rebooted, as was suggested earlier.  I've resumed my
> > rsync, and so far there have been no errors, but it's only been 20
> > minutes or so.  I should have a good idea by tomorrow whether this
> > definitely fixed the problem (since even when the machine was not
> > crashing it was tallying up iostat errors fairly rapidly).
> > 
> > Thanks again for your help.  Sorry for wasting your time if the
> > previously posted workaround fixes things.  I'll let you know tomorrow
> > either way.
> > 
> > Chad
> > 
> > On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
> > > Chad Cantwell wrote:
> > > >After another crash I checked the syslog and there were some
> > > >different errors than the ones I saw previously during operation:
> > > ...
> > > 
> > > >Nov 30 20:59:13 the-vault       LSI PCI device (1000,ffff) not supported.
> > > ...
> > > >Nov 30 20:59:13 the-vault       mpt_config_space_init failed
> > > ...
> > > >Nov 30 20:59:15 the-vault       mpt_restart_ioc failed
> > > ....
> > > 
> > > >Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
> > > >PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
> > > >Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
> > > >Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
> > > >System-Serial-Number, HOSTNAME: the-vault
> > > >Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
> > > >Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
> > > >Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid 
> > > >request.
> > > >Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R 
> > > >for more information.
> > > >Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances 
> > > >may be disabled
> > > >Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the 
> > > >device instances associated with this fault
> > > >Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and 
> > > >patches are installed. Otherwise schedule a repair procedure to replace 
> > > >the affected device(s).  Use fmadm faulty to identify the devices or 
> > > >contact Sun for support.
> > > 
> > > 
> > > Sorry to have to tell you, but that HBA is dead. Or at
> > > least dying horribly. If you can't init the config space
> > > (that's the pci bus config space), then you've got about
> > > 1/2 the nails in the coffin hammered in. Then the failure
> > > to restart the IOC (io controller unit) == the rest of
> > > the lid hammered down.
> > > 
> > > 
> > > best regards,
> > > James C. McPherson
> > > --
> > > Senior Kernel Software Engineer, Solaris
> > > Sun Microsystems
> > > http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
