Erik Gulliksson wrote:
> Hi Victor,
> 
> Thanks for the prompt reply. Here are the results from your suggestions.
> 
>> Panic stack would be useful.
> I'm sorry I don't have this available and I don't want to cause another panic 
> :)

It should be saved in the system messages on your Solaris 10 machine 
(unless its power was removed abruptly).
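For example, assuming the default syslog configuration (so kernel 
messages land in /var/adm/messages), something like this should show it:

grep -i panic /var/adm/messages*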

>> It is apparently blocked somewhere in kernel. Try to do something like this
>> from another window to get better idea:
>>
>> echo "<pid of zpool>::pid2proc|::walk thread|::findtsack -v" | mdb -k
>> echo "::threadlist -v" | mdb -k
>>
> window 1:
> bash-3.2# zpool import -f data1
> window 2:
> bash-3.2# ps -fel | grep zpool
>  0 S     root   897   874   0  40 20        ?   1262        ? 15:03:23 pts/3       0:00 zpool import -f data1
> bash-3.2# echo "0t897::pid2proc|::walk thread|::findstack -v" | mdb -k
> stack pointer for thread ffffff01b57ab840: ffffff00083709f0
> [ ffffff00083709f0 _resume_from_idle+0xf1() ]
>   ffffff0008370a30 swtch+0x17f()
>   ffffff0008370a60 cv_wait+0x61(ffffff01b3bd71ca, ffffff01b3bd7188)
>   ffffff0008370ab0 txg_wait_synced+0x81(ffffff01b3bd7000, 299ee597)
>   ffffff0008370b10 spa_config_update_common+0x79(ffffff01b42a8a80, 0, 0)
>   ffffff0008370bb0 spa_import_common+0x36e(ffffff01b5ad4000, ffffff01b5325310,
>   0, 0, 0)
>   ffffff0008370be0 spa_import+0x1e(ffffff01b5ad4000, ffffff01b5325310, 0)
>   ffffff0008370c30 zfs_ioc_pool_import+0xad(ffffff01b5ad4000)
>   ffffff0008370cb0 zfsdev_ioctl+0x10d(b600000000, 5a02, 80424f0, 100003,
>   ffffff01b3f181a0, ffffff0008370e9c)
>   ffffff0008370cf0 cdev_ioctl+0x48(b600000000, 5a02, 80424f0, 100003,
>   ffffff01b3f181a0, ffffff0008370e9c)
>   ffffff0008370d30 spec_ioctl+0x86(ffffff01adf41d00, 5a02, 80424f0, 100003,
>   ffffff01b3f181a0, ffffff0008370e9c, 0)
>   ffffff0008370db0 fop_ioctl+0x7b(ffffff01adf41d00, 5a02, 80424f0, 100003,
>   ffffff01b3f181a0, ffffff0008370e9c, 0)
>   ffffff0008370ec0 ioctl+0x174(3, 5a02, 80424f0)
>   ffffff0008370f10 _sys_sysenter_post_swapgs+0x14b()
> 
> bash-3.2# echo "::threadlist -v" | mdb -k
> Output is a bit too long to post here. Is there anything in particular I
> should look for in this output?

Well, since we are talking about ZFS, any threads somewhere in the ZFS 
module are interesting, and there should not be too many of them. In 
this case, though, it is already clear - it is trying to update the 
config object and is waiting for the update to sync. There should be 
another thread with a stack similar to this:

genunix:cv_wait()
zfs:zio_wait()
zfs:dbuf_read()
zfs:dmu_buf_will_dirty()
zfs:dmu_write()
zfs:spa_sync_nvlist()
zfs:spa_sync_config_object()
zfs:spa_sync()
zfs:txg_sync_thread()
unix:thread_start()
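
If you don't want to wade through the whole ::threadlist output, a quick 
way to check for such a thread is to filter the listing for the relevant 
function names (or, on bits whose mdb has the ::stacks dcmd, let it 
summarize the threads by module):

echo "::threadlist -v" | mdb -k | egrep 'txg_sync_thread|spa_sync|zio_wait'
echo "::stacks -m zfs" | mdb -k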

That thread is waiting due to a checksum error detected while reading 
the old config object from disk (the dbuf_read() call above). It means 
that all ditto blocks of the config object got corrupted. On Solaris 10 there's no

>> Some useful information may be logged in FMA, try to see what is available
>> there with
>>
>> fmdump -eV
> 
> - I get a few ereport.fs.zfs.checksum reports like this
> bash-3.2# fmdump -eV
> Aug 22 2008 15:03:23.203687016 ereport.fs.zfs.checksum
> nvlist version: 0
>         class = ereport.fs.zfs.checksum
>         ena = 0x1a77b8287a00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>                 vdev = 0x79871af1e1f39de1
>         (end detector)
> 
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         vdev_guid = 0x79871af1e1f39de1
>         vdev_type = disk
>         vdev_path = /dev/dsk/c2t2d0s0
>         vdev_devid = id1,[EMAIL PROTECTED]/a
>         parent_guid = 0xe2bba51ab8c26b53
>         parent_type = root
>         zio_err = 50
>         zio_offset = 0x1e800416c00
>         zio_size = 0x400
>         zio_objset = 0x0
>         zio_object = 0xb
>         zio_level = 0
>         zio_blkid = 0x0
>         __ttl = 0x1
> 
> Aug 22 2008 15:03:23.203687247 ereport.fs.zfs.data
> nvlist version: 0
>         class = ereport.fs.zfs.data
>         ena = 0x1a77b8287a00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>         (end detector)
> 
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         zio_err = 50
>         zio_objset = 0x0
>         zio_object = 0xb
>         zio_level = 0
>         zio_blkid = 0x0
>         __ttl = 0x1
>         __tod = 0x48aeb91b 0xc24054f
> 
> Aug 22 2008 15:03:23.207225717 ereport.fs.zfs.io_failure
> nvlist version: 0
>         class = ereport.fs.zfs.io_failure
>         ena = 0x1a77ee27cb00001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0xe2bba51ab8c26b53
>         (end detector)
> 
>         pool = data1
>         pool_guid = 0xe2bba51ab8c26b53
>         pool_context = 0
>         pool_failmode = wait
>         __ttl = 0x1
>         __tod = 0x48aeb91b 0xc5a0375
> 
>> On Nevada you can try the following to  (same option repeated several times
>> increases verbosity):
>>
>> zdb -e -bb data1
>> zdb -e -dddd data1
>>
> 
> 
> bash-3.2# zdb -e -bb data1
> Traversing all blocks to verify nothing leaked ...
> out of memory -- generating core dump
> Abort
> 
> Seems I need to get a machine with more RAM to do this :) This can be
> arranged on Monday.

This is not needed now.

> bash-3.2# zdb -e -dddd data1
> Dataset mos [META], ID 0, cr_txg 4, 210M, 189 objects, rootbp [L0 DMU
> objset] 400L/200P DVA[0]=<0:56800000200:200> DVA[1]=<0:3000020200:200>
> DVA[2]=<0:48800001800:200> fletcher4 lzjb LE contiguous
> birth=698279317 fill=189
> cksum=89744d6d8:36e7cf71f81:b1d06b2acd36:1850b4cc5621f3
> 
>     Object  lvl   iblk   dblk  lsize  asize  type
>          0    2    16K    16K  96.0K  94.5K  DMU dnode
> 

<====== lots of output skipped ======>

> 
>     Object  lvl   iblk   dblk  lsize  asize  type
>         11    1    16K    16K    16K     2K  packed nvlist
>                                    8  bonus  packed nvlist size
> 
> Assertion failed: 0 == dmu_read(os, object, 0, nvsize, packed), file
> ../zdb.c, line 216, function dump_packed_nvlist

Here zdb dies on the assertion around dmu_read() of the config object, 
since it expects the read to return no error.
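For the record, if the failed assertion left a core file behind (usually 
in the directory zdb was run from), the stack it died on can be checked 
the same way, e.g.:

echo '$C' | mdb core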

>> Btw, why does timestamp on your uberblock show July 1?
> Well, this is about the time when the crash happened. The clock on the
> server is correct.

Wow! Why did you wait almost two months?

Victor
