Re: [zfs-discuss] Intermittent ZFS hang

Charles J. Knipe Mon, 13 Sep 2010 09:29:04 -0700

> Charles,
> 
> Just like UNIX, there are several ways to drill down on the problem.  I
> would probably start with a live crash dump (savecore -L) when you see
> the problem.  Another method would be to grab multiple "stats" commands
> during the problem to see where you can drill down later.  I would
> probably use this method if the problem lasts for a while, and drill
> down with dtrace based on what I saw.  But each method is going to
> depend on your skill when looking at the problem.
> 
> Dave


Dave,

After running clean since my last post, the problem occurred again today.  This 
time I was able to gather some data while it was going on.  The only thing that 
jumps out at me so far is the output of "echo ::zio_state | mdb -k".

Under normal operation it usually looks like this:

ADDRESS                                  TYPE  STAGE            WAITER
ffffff090eb69328                         NULL  OPEN             -
ffffff090eb69c88                         NULL  OPEN             -

Here are a couple samples while the issue was happening:

ADDRESS                                  TYPE  STAGE            WAITER
ffffff0bfe8c59b0                         NULL  CHECKSUM_VERIFY  ffffff003e2f2c60
ffffff090eb69328                         NULL  OPEN             -
ffffff090eb69c88                         NULL  OPEN             -

ADDRESS                                  TYPE  STAGE            WAITER
ffffff09bb12a040                         NULL  CHECKSUM_VERIFY  ffffff003d6acc60
ffffff0bfe8c59b0                         NULL  CHECKSUM_VERIFY  ffffff003e2f2c60
ffffff090eb69328                         NULL  OPEN             -
ffffff090eb69c88                         NULL  OPEN             -

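For anyone comparing several captures, here is a minimal sketch of how the rows with a waiter could be pulled out of saved ::zio_state output.  It assumes the four-column layout shown above with each row on a single line; the names SAMPLE, ROW and waiting_zios are made up for this illustration and are not part of mdb.

```python
import re

# Text copied from the second capture above (one row per line).
SAMPLE = """\
ADDRESS                                  TYPE  STAGE            WAITER
ffffff09bb12a040                         NULL  CHECKSUM_VERIFY  ffffff003d6acc60
ffffff0bfe8c59b0                         NULL  CHECKSUM_VERIFY  ffffff003e2f2c60
ffffff090eb69328                         NULL  OPEN             -
ffffff090eb69c88                         NULL  OPEN             -
"""

# Assumed row shape: hex address, type, stage, waiter ("-" when nobody waits).
ROW = re.compile(r"^\s*([0-9a-f]{8,16})\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def waiting_zios(text):
    """Return (zio_address, stage, waiter) for every row that has a waiter."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank line, or a layout this sketch does not know
        addr, _zio_type, stage, waiter = m.groups()
        if waiter != "-":
            rows.append((addr, stage, waiter))
    return rows


if __name__ == "__main__":
    for addr, stage, waiter in waiting_zios(SAMPLE):
        print(f"zio {addr} in stage {stage}, waiter {waiter}")
```

Running this over each saved capture gives the list of waiter addresses to look up in the thread list.
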
Operating under the assumption that the WAITER column references kernel 
threads, I went looking for those addresses in the thread list.  Here are the 
::threadlist -v entries for ffffff003d6acc60 and ffffff003e2f2c60 from the 
example directly above, taken at about the same time as that output:

ffffff003d6acc60 ffffff0930d8c700 ffffff09172f9de0   2   0 ffffff09bb12a348
  PC: _resume_from_idle+0xf1    CMD: zpool-pool0
  stack pointer for thread ffffff003d6acc60: ffffff003d6ac360
  [ ffffff003d6ac360 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    dbuf_read+0x1e8()
    dmu_buf_hold+0x93()
    zap_get_leaf_byblk+0x56()
    zap_deref_leaf+0x78()
    fzap_length+0x42()
    zap_length_uint64+0x84()
    ddt_zap_lookup+0x4b()
    ddt_object_lookup+0x6d()
    ddt_lookup+0x115()
    zio_ddt_free+0x42()
    zio_execute+0x8d()
    taskq_thread+0x248()
    thread_start+8()

ffffff003e2f2c60 fffffffffbc2dbb0                0   0  60 ffffff0bfe8c5cb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ffffff003e2f2c60: ffffff003e2f2a40
  [ ffffff003e2f2a40 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    spa_sync+0x40c()
    txg_sync_thread+0x24a()
    thread_start+8()

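The lookup described above can be sketched as a small filter over saved ::threadlist -v output.  This is only an illustration: it assumes each thread entry starts on an unindented line whose first field is the thread address and that stack frames are indented lines of the form name+offset(), as in the two entries above.  The names SAMPLE_THREADS, FRAME, stacks_for and waiting_caller are hypothetical.

```python
import re

# The two thread entries quoted above, copied verbatim.
SAMPLE_THREADS = """\
ffffff003d6acc60 ffffff0930d8c700 ffffff09172f9de0   2   0 ffffff09bb12a348
  PC: _resume_from_idle+0xf1    CMD: zpool-pool0
  stack pointer for thread ffffff003d6acc60: ffffff003d6ac360
  [ ffffff003d6ac360 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    dbuf_read+0x1e8()
    dmu_buf_hold+0x93()
    zap_get_leaf_byblk+0x56()
    zap_deref_leaf+0x78()
    fzap_length+0x42()
    zap_length_uint64+0x84()
    ddt_zap_lookup+0x4b()
    ddt_object_lookup+0x6d()
    ddt_lookup+0x115()
    zio_ddt_free+0x42()
    zio_execute+0x8d()
    taskq_thread+0x248()
    thread_start+8()

ffffff003e2f2c60 fffffffffbc2dbb0                0   0  60 ffffff0bfe8c5cb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ffffff003e2f2c60: ffffff003e2f2a40
  [ ffffff003e2f2a40 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    spa_sync+0x40c()
    txg_sync_thread+0x24a()
    thread_start+8()
"""

# A frame line such as "zio_wait+0x5d()" or "thread_start+8()".
FRAME = re.compile(r"^([A-Za-z_]\w*)\+(?:0x)?[0-9a-f]+\(\)$")


def stacks_for(threadlist_text, wanted):
    """Map each wanted thread address to the function names in its stack."""
    stacks = {}
    current = None
    for line in threadlist_text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        if not line[0].isspace():
            # Unindented line: start of an entry, first field is the address.
            addr = line.split()[0]
            current = addr if addr in wanted else None
            if current is not None:
                stacks[current] = []
            continue
        if current is None:
            continue
        m = FRAME.match(line.strip())
        if m:  # skips the "PC:", "stack pointer" and "[ ... ]" lines
            stacks[current].append(m.group(1))
    return stacks


def waiting_caller(stack, blocker="zio_wait"):
    """Return the function that called `blocker`, or None if it is absent."""
    if blocker in stack:
        i = stack.index(blocker)
        if i + 1 < len(stack):
            return stack[i + 1]
    return None


if __name__ == "__main__":
    wanted = {"ffffff003d6acc60", "ffffff003e2f2c60"}
    for addr, stack in stacks_for(SAMPLE_THREADS, wanted).items():
        print(addr, "blocked in zio_wait called from", waiting_caller(stack))
```

Fed the waiter addresses from ::zio_state, this prints which function each waiter called zio_wait from, which makes it easier to compare captures taken at different times.
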
Not sure if any of that sheds any light on the problem.  I also have a live 
dump from the period when the problem was happening, plus a bunch of iostat 
and mpstat output, and ::arc, ::spa, ::zio_state, and ::threadlist -v output 
from mdb -k taken at several points during the issue.

If you have any advice on how to proceed from here in debugging this issue, I'd 
greatly appreciate it.  So you know, I'm generally very comfortable with Unix, 
but DTrace and the Solaris kernel are unfamiliar territory.

In any event, thanks again for all the help thus far.

-Charles
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
