To update everyone, I did a complete zfs scrub, and it generated no errors in iostat. I have 4.8T of data on the filesystem, so it was a fairly lengthy test. The machine has also shown no evidence of instability. If I were to start copying a lot of data to the filesystem again, though, I'm sure it would generate errors and crash again.

Chad

On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote:
> Well, ok, the msi=0 thing didn't help after all. A few minutes after my last
> message a few errors showed up in iostat, and then in a few minutes more the
> machine was locked up hard... Maybe I will try just doing a scrub instead of
> my rsync process and see how that does.
>
> Chad
>
>
> On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
> > I don't think the hardware has any problems, it only started having errors
> > when I upgraded OpenSolaris. It's still working fine again now after a
> > reboot. Actually, I reread one of your earlier messages, and I didn't
> > realize at first when you said "non-Sun JBOD" that this didn't apply to me
> > (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand
> > for an external expander device. Since I'm just using baremetal, and
> > passive backplanes, I think the msi=0 fix should apply to me based on what
> > you wrote earlier, anyway I've put
> >     set mpt:mpt_enable_msi = 0
> > now in /etc/system and rebooted as it was suggested earlier. I've resumed
> > my rsync, and so far there have been no errors, but it's only been 20
> > minutes or so. I should have a good idea by tomorrow if this definitely
> > fixed the problem (since even when the machine was not crashing it was
> > tallying up iostat errors fairly rapidly)
> >
> > Thanks again for your help. Sorry for wasting your time if the previously
> > posted workaround fixes things. I'll let you know tomorrow either way.
> >
> > Chad
> >
> > On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
> > > Chad Cantwell wrote:
> > > >After another crash I checked the syslog and there were some different
> > > >errors than the ones I saw previously during operation:
> > > ...
> > >
> > > >Nov 30 20:59:13 the-vault LSI PCI device (1000,ffff) not supported.
> > > ...
> > > >Nov 30 20:59:13 the-vault mpt_config_space_init failed
> > > ...
> > > >Nov 30 20:59:15 the-vault mpt_restart_ioc failed
> > > ....
> > >
> > > >Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
> > > >Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
> > > >Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault
> > > >Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
> > > >Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
> > > >Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request.
> > > >Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information.
> > > >Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled
> > > >Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault
> > > >Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmadm faulty to identify the devices or contact Sun for support.
> > >
> > >
> > > Sorry to have to tell you, but that HBA is dead. Or at
> > > least dying horribly. If you can't init the config space
> > > (that's the pci bus config space), then you've got about
> > > 1/2 the nails in the coffin hammered in. Then the failure
> > > to restart the IOC (io controller unit) == the rest of
> > > the lid hammered down.
> > >
> > >
> > > best regards,
> > > James C. McPherson
> > > --
> > > Senior Kernel Software Engineer, Solaris
> > > Sun Microsystems
> > > http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss