Erik Gulliksson wrote:
> Hi Victor,
>
> Thanks for the prompt reply. Here are the results from your suggestions.
>
>> Panic stack would be useful.
>
> I'm sorry, I don't have this available and I don't want to cause
> another panic :)
It should be saved in system messages on your Solaris 10 machine (unless
its power was removed abruptly).

>> It is apparently blocked somewhere in the kernel. Try to do something
>> like this from another window to get a better idea:
>>
>> echo "<pid of zpool>::pid2proc|::walk thread|::findstack -v" | mdb -k
>> echo "::threadlist -v" | mdb -k
>
> window 1:
> bash-3.2# zpool import -f data1
>
> window 2:
> bash-3.2# ps -fel | grep zpool
> 0 S root 897 874 0 40 20 ? 1262 ? 15:03:23 pts/3 0:00 zpool import -f data1
> bash-3.2# echo "0t897::pid2proc|::walk thread|::findstack -v" | mdb -k
> stack pointer for thread ffffff01b57ab840: ffffff00083709f0
> [ ffffff00083709f0 _resume_from_idle+0xf1() ]
>   ffffff0008370a30 swtch+0x17f()
>   ffffff0008370a60 cv_wait+0x61(ffffff01b3bd71ca, ffffff01b3bd7188)
>   ffffff0008370ab0 txg_wait_synced+0x81(ffffff01b3bd7000, 299ee597)
>   ffffff0008370b10 spa_config_update_common+0x79(ffffff01b42a8a80, 0, 0)
>   ffffff0008370bb0 spa_import_common+0x36e(ffffff01b5ad4000, ffffff01b5325310, 0, 0, 0)
>   ffffff0008370be0 spa_import+0x1e(ffffff01b5ad4000, ffffff01b5325310, 0)
>   ffffff0008370c30 zfs_ioc_pool_import+0xad(ffffff01b5ad4000)
>   ffffff0008370cb0 zfsdev_ioctl+0x10d(b600000000, 5a02, 80424f0, 100003, ffffff01b3f181a0, ffffff0008370e9c)
>   ffffff0008370cf0 cdev_ioctl+0x48(b600000000, 5a02, 80424f0, 100003, ffffff01b3f181a0, ffffff0008370e9c)
>   ffffff0008370d30 spec_ioctl+0x86(ffffff01adf41d00, 5a02, 80424f0, 100003, ffffff01b3f181a0, ffffff0008370e9c, 0)
>   ffffff0008370db0 fop_ioctl+0x7b(ffffff01adf41d00, 5a02, 80424f0, 100003, ffffff01b3f181a0, ffffff0008370e9c, 0)
>   ffffff0008370ec0 ioctl+0x174(3, 5a02, 80424f0)
>   ffffff0008370f10 _sys_sysenter_post_swapgs+0x14b()
>
> bash-3.2# echo "::threadlist -v" | mdb -k
> Output is a bit too long to post here. Is there anything in particular
> I should look for in this output?
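For what it's worth, one way to narrow the long `::threadlist -v` output down to the interesting threads is to filter it for the ZFS sync-path function names seen in this thread. This is only a sketch: the file name is hypothetical, and the sample records below are stand-ins modeled on the stacks posted above (run the real mdb command on the affected machine and substitute its output).

```shell
# Stand-in for: echo "::threadlist -v" | mdb -k > /tmp/threadlist.out
# Each thread's stack is a blank-line-separated record, so awk's
# paragraph mode (RS="") can keep or drop whole stacks at a time.
cat > /tmp/threadlist.out <<'EOF'
ffffff01b57ab840 zpool/1
  swtch+0x17f()
  cv_wait+0x61()
  txg_wait_synced+0x81()

ffffff01b57ac000 sched/1
  swtch+0x17f()
  cv_wait+0x61()
  txg_sync_thread+0x1a3()
EOF

# Print only the stacks that mention ZFS sync-path entry points.
awk 'BEGIN { RS=""; ORS="\n\n" } /txg_sync_thread|spa_sync|zio_wait|txg_wait_synced/' /tmp/threadlist.out
```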
Well, since we are talking about ZFS, any thread somewhere in the ZFS
module is interesting, and there should not be too many of them. Though
in this case it is clear: it is trying to update the config object and
waits for the update to sync. There should be another thread with a
stack similar to this:

  genunix:cv_wait()
  zfs:zio_wait()
  zfs:dbuf_read()
  zfs:dmu_buf_will_dirty()
  zfs:dmu_write()
  zfs:spa_sync_nvlist()
  zfs:spa_sync_config_object()
  zfs:spa_sync()
  zfs:txg_sync_thread()
  unix:thread_start()

It waits due to a checksum error detected while reading the old config
object from disk (the dbuf_read() call in the stack above). It means
that all ditto blocks of the config object got corrupted. On Solaris 10
there's no

>> Some useful information may be logged in FMA; try to see what is
>> available there with
>>
>> fmdump -eV
>
> I get a few ereport.fs.zfs.checksum reports like this:
>
> bash-3.2# fmdump -eV
> Aug 22 2008 15:03:23.203687016 ereport.fs.zfs.checksum
> nvlist version: 0
>         class = ereport.fs.zfs.checksum
>         ena = 0x1a77b8287a00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>                 vdev = 0x79871af1e1f39de1
>         (end detector)
>
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         vdev_guid = 0x79871af1e1f39de1
>         vdev_type = disk
>         vdev_path = /dev/dsk/c2t2d0s0
>         vdev_devid = id1,[EMAIL PROTECTED]/a
>         parent_guid = 0xe2bba51ab8c26b53
>         parent_type = root
>         zio_err = 50
>         zio_offset = 0x1e800416c00
>         zio_size = 0x400
>         zio_objset = 0x0
>         zio_object = 0xb
>         zio_level = 0
>         zio_blkid = 0x0
>         __ttl = 0x1
>
> Aug 22 2008 15:03:23.203687247 ereport.fs.zfs.data
> nvlist version: 0
>         class = ereport.fs.zfs.data
>         ena = 0x1a77b8287a00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>         (end detector)
>
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         zio_err = 50
>         zio_objset = 0x0
>         zio_object = 0xb
>         zio_level = 0
>         zio_blkid = 0x0
>         __ttl = 0x1
>         __tod = 0x48aeb91b 0xc24054f
>
> Aug 22 2008 15:03:23.207225717 ereport.fs.zfs.io_failure
> nvlist version: 0
>         class = ereport.fs.zfs.io_failure
>         ena = 0x1a77ee27cb00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>         (end detector)
>
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         __ttl = 0x1
>         __tod = 0x48aeb91b 0xc5a0375
>
>> On Nevada you can try the following (the same option repeated several
>> times increases verbosity):
>>
>> zdb -e -bb data1
>> zdb -e -dddd data1
>
> bash-3.2# zdb -e -bb data1
> Traversing all blocks to verify nothing leaked ...
> out of memory -- generating core dump
> Abort
>
> Seems I need to get a machine with more RAM to do this :) This can be
> arranged on Monday.

This is not needed now.

> bash-3.2# zdb -e -dddd data1
> Dataset mos [META], ID 0, cr_txg 4, 210M, 189 objects, rootbp [L0 DMU
> objset] 400L/200P DVA[0]=<0:56800000200:200> DVA[1]=<0:3000020200:200>
> DVA[2]=<0:48800001800:200> fletcher4 lzjb LE contiguous
> birth=698279317 fill=189
> cksum=89744d6d8:36e7cf71f81:b1d06b2acd36:1850b4cc5621f3
>
>     Object  lvl  iblk  dblk  lsize  asize  type
>          0    2   16K   16K  96.0K  94.5K  DMU dnode
>
> <====== lots of output skipped ======>
>
>     Object  lvl  iblk  dblk  lsize  asize  type
>         11    1   16K   16K    16K     2K  packed nvlist
>              8 bonus packed nvlist size
>
> Assertion failed: 0 == dmu_read(os, object, 0, nvsize, packed), file
> ../zdb.c, line 216, function dump_packed_nvlist

Here zdb crashes on the dmu_read() of the config object, since it
expects it to return no error.

>> Btw, why does the timestamp on your uberblock show July 1?
>
> Well, this is about the time when the crash happened. The clock on the
> server is correct.

Wow! Why did you wait almost two months?
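Two quick offline checks tie the ereports and the zdb crash together. This is a minimal sketch, assuming the ereport listing has been saved with `fmdump -e > /tmp/fmdump.out` on the affected machine; the file name is hypothetical and the sample records below are stand-ins modeled on the ones in this thread.

```shell
# Stand-in for: fmdump -e > /tmp/fmdump.out
# In fmdump -e output the ereport class is the last field on each line.
cat > /tmp/fmdump.out <<'EOF'
Aug 22 15:03:23.2036 ereport.fs.zfs.checksum
Aug 22 15:03:23.2036 ereport.fs.zfs.data
Aug 22 15:03:23.2072 ereport.fs.zfs.io_failure
EOF

# 1) Tally ereports by class to see which error type dominates.
awk '{ count[$NF]++ } END { for (c in count) print count[c], c }' /tmp/fmdump.out | sort -rn

# 2) The checksum ereport names zio_objset 0x0 (the MOS) and zio_object
#    0xb; zdb prints object numbers in decimal, so this is the same
#    "packed nvlist" object 11 the zdb assertion fires on.
printf '%d\n' 0xb    # prints 11
```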
Victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss