I recently upgraded a box to Solaris 10 U8.
I've been getting more timeouts, and I suspect the Adaptec card, which possibly can't 
keep up and so issues bus resets at times.  It has apparently corrupted some files on 
the pool; zpool status -v showed 2 files and one dataset corrupt.  I was initially able 
to bring the pool up and salvage some of the files, but it would not let me remove the 
files listed, giving a "Bad exchange descriptor" error.  So I figured I'd salvage and 
remove those two datasets and try again.
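
For context, the cleanup I was attempting follows the usual pattern; the file path 
below is only a placeholder, not the actual path zpool status -v reported ("Bad 
exchange descriptor" is EBADE, which is what the rm step returns here):

# zpool status -v pool0           (lists the files flagged as having errors)
# rm /pool0/some/damaged/file     (placeholder path; this is the step that fails with EBADE)
# zpool clear pool0               (clear the error counters once the files are gone)
# zpool scrub pool0               (re-scrub to confirm the pool is clean)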

So while trying to salvage what I could, I apparently stressed the card too much (it 
was constantly at 98-100% busy); eventually the service times climbed high enough that 
it failed with timeouts and another bus reset.

Then it crashed with the following:

panic[cpu2]/thread=c603adc0: assertion failed: 0 == dmu_buf_hold_array(os, object, 
offset, size, FALSE, FTAG, &numbufs, &dbp), file:
../../common/fs/zfs/dmu.c, line: 591

c603abec genunix:assfail+51 (edf9094c, edf90930,)
c603ac34 zfs:dmu_write+150 (c5aa3a20, 86, 0, b5)
c603ac9c zfs:space_map_sync+2ed (c6fde4cc, 1, c6fde3)
c603acec zfs:metaslab_sync+245 (c6fde340, 904f005, )
c603ad14 zfs:vdev_sync+a8 (c0bad040, 904f005, )
c603ad5c zfs:spa_sync+38e (c23196c0, 904f005, )
c603ada8 zfs:txg_sync_thread+22c (c1016600, 0)
c603adb8 unix:thread_start+8 ()

syncing file systems... [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] 
[1] [1] [1] [1] [1] [1] [1] done (not all i/o completed)
dumping to /dev/dsk/c0t0d0s1, offset 215547904, content: kernel
WARNING: This system contains a SCSI HBA card/driver that doesn't support 
software reset. This means that memory being used by the HBA for DMA based 
reads could have been updated after we panic'd.


And then it would not boot anymore; it just went into a panic loop.  I hopped in the car 
and went to the data center.  I managed to boot off CD, mount the root file system, and 
move /etc/zfs/zpool.cache out of the way, so now I can boot the OS again.  If I try to 
import the pool, I get the panic above.  If I just enter "zpool import", I get 
the following:

 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        pool0       ONLINE
          c2t4d0    ONLINE
          c2t4d2    ONLINE

So the pool appears to still be there, but I can't import it.
The two devices are actually hardware RAID volumes of 750 GB each, so I don't 
have redundancy at the ZFS level, only within the hardware RAIDs.
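
For completeness, the recovery steps so far looked roughly like this; the /a mount 
point and the root slice are just what I'd expect from the rescue environment, so 
adjust as appropriate:

# mount /dev/dsk/c0t0d0s0 /a      (root slice is a guess, based on the dump device being s1)
# mv /a/etc/zfs/zpool.cache /a/etc/zfs/zpool.cache.bad
# reboot

# zpool import                    (lists the pool, output as above)
# zpool import pool0              (this is the step that panics)

I assume that importing with -R to an alternate root (zpool import -R /mnt pool0) would 
at least keep zpool.cache from being rewritten, so a failed attempt shouldn't put the 
box back into the panic-on-boot loop, but I haven't confirmed that.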

I'm not too sure what to do with zdb to see anything useful.
Any ideas as to what I can do to recover the rest of the data?
There are still some database files on there that I need.
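
The only zdb invocations I'm aware of for a pool that isn't imported are roughly these 
(the s0 slice is a guess at where the labels live, and I'm assuming zdb, being a 
read-only userland tool, won't panic the box the way the import does):

# zdb -l /dev/dsk/c2t4d0s0        (dump the vdev labels from one of the devices)
# zdb -e pool0                    (examine the pool without importing it)
# zdb -e -d pool0                 (list the datasets/objects, if it gets that far)

If any of that output would help diagnose this, let me know and I'll post it.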

Thanks,

Brian



