[zfs-code] Recent deadlock.

2007-11-05 Thread Pawel Jakub Dawidek
Hi.

I'm observing the following deadlock.

One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:

Tracing pid 1188 tid 100114 td 0xc41b1660
sched_switch(c41b1660,0,1,180,9bbb0333,...) at sched_switch+0x1b6
mi_switch(1,0,c0666ae9,1bf,c41b1660,...) at mi_switch+0x1ee
sleepq_switch(c41b1660,0,c0666ae9,21d,c41b1660,...) at sleepq_switch+0xf0
sleepq_wait(de465f94,de465f7c,c43fc60a,1,0,...) at sleepq_wait+0x60
_cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at 
zfs_dirent_lock+0x619
zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
zfs_freebsd_lookup(f899aa70,c0685c24,cbef4414,cbef4414,f899ac34,...) at 
zfs_freebsd_lookup+0xdd
VOP_CACHEDLOOKUP_APV(c4405800,f899aa70,f899ac34,f899ac20,c3f65200,...) at 
VOP_CACHEDLOOKUP_APV+0xc5
vfs_cache_lookup(f899aaf4,c066e847,cbef4414,1,cbef4414,...) at 
vfs_cache_lookup+0xd3
VOP_LOOKUP_APV(c4405800,f899aaf4,c41b1660,c066d7b8,19e,...) at 
VOP_LOOKUP_APV+0xe5
lookup(f899ac0c,c066d7b8,c9,bf,ca5b032c,...) at lookup+0x53e
namei(f899ac0c,0,c0679402,8d6,c065f266,...) at namei+0x2fe
kern_rename(c41b1660,2820d040,2820c040,0,f899ad2c,...) at kern_rename+0x4a
rename(c41b1660,f899acfc,8,c06758bf,c06924e0,...) at rename+0x29
syscall(f899ad38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (128, FreeBSD ELF32, rename), eip = 0x280d1cdb, esp = 0xbfbfe7fc, 
ebp = 0xbfbfea48 ---

Another thread tries to finish I/O, but can't, because it is trying to
acquire zfsvfs->z_hold_mtx[i] lock:

Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
sched_switch(c3b38440,0,1,180,9d2b0e9f,...) at sched_switch+0x1b6
mi_switch(1,0,c0666ae9,1bf,c3b38440,...) at mi_switch+0x1ee
sleepq_switch(c3b38440,0,c0666ae9,21d,c3f4f6a4,...) at sleepq_switch+0xf0
sleepq_wait(c3f4f6a4,0,c43fba6f,3,0,...) at sleepq_wait+0x60
_sx_xlock_hard(c3f4f6a4,c3b38440,0,c43fb80c,31e,...) at _sx_xlock_hard+0x286
_sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at zfs_inactive+0x307
zfs_freebsd_inactive(f8aa28a0,c06856cc,c7af0230,c7af0230,c7af015c,...) at 
zfs_freebsd_inactive+0x32
VOP_INACTIVE_APV(c4405800,f8aa28a0,c066e0c7,8e9,c06b5120,...) at 
VOP_INACTIVE_APV+0xc5
vinactive(c7af0230,0,c066e0c7,859,858,...) at vinactive+0xb1
vrele(c7af015c,c71317d0,f8aa2924,c04d16b0,c71317d0,...) at vrele+0x18f
zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at 
arc_write_done+0x44a
zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at zio_assess+0x843
zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at zio_next_stage+0x236
zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at 
zio_wait_for_children+0x99
zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at 
zio_wait_children_done+0x25
zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at 
zio_vdev_io_assess+0x2a7
zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at 
zio_next_stage+0x236
vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at 
vdev_mirror_io_done+0x113
zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at zio_vdev_io_done+0x21
taskq_thread(c442ebf4,f8aa2d38,c065fcb9,31e,c3b37aa0,...) at taskq_thread+0x29e
fork_exit(c42c9240,c442ebf4,f8aa2d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xf8aa2d70, ebp = 0 ---

As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
calls arc_write_done(). Down that path we eventually call vn_rele(), which
may end up calling zfs_zinactive() if this was the last vnode reference.
zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
because it is already held by the thread waiting for the I/O. Neither can
the I/O be finished, nor can the lock be released.
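
For reference, the lock both threads contend on comes from the hashed
z_hold_mtx[] array. A minimal sketch of how it is indexed and taken, based
on the ZFS_OBJ_HOLD_ENTER()/ZFS_OBJ_MUTEX() macros in the OpenSolaris
zfs_znode.h/zfs_vfsops.h (the FreeBSD port uses sx locks here, as the trace
shows, but the hashing is the same), so treat the exact definitions as
approximate:

/*
 * Object numbers are hashed into a small array of locks, so two
 * unrelated objects can end up contending on the same z_hold_mtx[i].
 */
#define	ZFS_OBJ_MTX_SZ		64
#define	ZFS_OBJ_HASH(obj_num)	((obj_num) & (ZFS_OBJ_MTX_SZ - 1))
#define	ZFS_OBJ_MUTEX(zfsvfs, obj_num) \
	(&(zfsvfs)->z_hold_mtx[ZFS_OBJ_HASH(obj_num)])
#define	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num) \
	mutex_enter(ZFS_OBJ_MUTEX((zfsvfs), (obj_num)))
#define	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num) \
	mutex_exit(ZFS_OBJ_MUTEX((zfsvfs), (obj_num)))

Thread #1 (zfs_zget()) does ZFS_OBJ_HOLD_ENTER() for the object it is
fetching and then sleeps in zio_wait() for the dnode read; thread #2
(zfs_zinactive(), reached from the I/O completion path) does
ZFS_OBJ_HOLD_ENTER() for its znode's object number, which hashes to the
same slot, so it blocks behind thread #1 and the I/O thread #1 is waiting
for never completes.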

I started to see this deadlock after the last integration, so it was
introduced more or less during the last month. Any ideas how to solve it?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
pjd at FreeBSD.org 

[zfs-code] Recent deadlock.

2007-11-07 Thread Pawel Jakub Dawidek
On Mon, Nov 05, 2007 at 02:22:12PM +0100, Pawel Jakub Dawidek wrote:
> Hi.
> 
> I'm observing the following deadlock.
> 
> One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:

[...]

> Another thread tries to finish I/O, but can't, because it is trying to
> acquire zfsvfs->z_hold_mtx[i] lock:

[...]

> As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
> calls arc_write_done(). Down that path we eventually call vn_rele(), which
> may end up calling zfs_zinactive() if this was the last vnode reference.
> zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
> because it is already held by the thread waiting for the I/O. Neither can
> the I/O be finished, nor can the lock be released.
> 
> I started to see this deadlock after the last integration, so it was
> introduced more or less during the last month. Any ideas how to solve it?

I was able to trigger this deadlock with older code as well.

It seems I fixed it by changing the zfs_get_done() function from:

void
zfs_get_done(dmu_buf_t *db, void *vzgd)
{
	zgd_t *zgd = (zgd_t *)vzgd;
	rl_t *rl = zgd->zgd_rl;
	vnode_t *vp = ZTOV(rl->r_zp);

	dmu_buf_rele(db, vzgd);
	zfs_range_unlock(rl);
	VN_RELE(vp);
	zil_add_vdev(zgd->zgd_zilog, DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));
	kmem_free(zgd, sizeof (zgd_t));
}

To something like this:

void
zfs_get_done(dmu_buf_t *db, void *vzgd)
{
	zgd_t *zgd = (zgd_t *)vzgd;
	rl_t *rl = zgd->zgd_rl;
	vnode_t *vp = ZTOV(rl->r_zp);

	dmu_buf_rele(db, vzgd);
	zfs_range_unlock(rl);
	zgd->zgd_vnode = vp;
	zil_add_vdev(zgd->zgd_zilog, DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));
	/*
	 * Put the vnode onto the list. Call VN_RELE() on this vnode from
	 * within another kernel thread later (also free zgd there).
	 */
	mtx_lock(&zgd_done_mtx);
	TAILQ_INSERT_TAIL(&zgd_done, zgd, zgd_next);
	mtx_unlock(&zgd_done_mtx);
	wakeup(&zgd_done);
}

Which basically means that VN_RELE() will be called later from another
thread. This seems to work, but I changed the order of the VN_RELE() and
zil_add_vdev() calls. Is that safe? Doing zil_add_vdev() from that
additional thread as well didn't work for me.
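
For completeness, the kernel thread that drains the list could look roughly
like the sketch below (the thread body is not part of the diff above, so
this is only an illustration; zgd_done, zgd_done_mtx, zgd_next and
zgd_vnode are the names used in the patch, zgd_done_thread is just a
placeholder name):

static void
zgd_done_thread(void *arg __unused)
{
	zgd_t *zgd;

	for (;;) {
		mtx_lock(&zgd_done_mtx);
		while ((zgd = TAILQ_FIRST(&zgd_done)) == NULL) {
			/* Sleep until zfs_get_done() queues an entry. */
			msleep(&zgd_done, &zgd_done_mtx, PRIBIO,
			    "zgddone", 0);
		}
		TAILQ_REMOVE(&zgd_done, zgd, zgd_next);
		mtx_unlock(&zgd_done_mtx);

		/*
		 * Dropping the last vnode reference here means
		 * zfs_zinactive() runs in this thread instead of in the
		 * spa_zio_intr taskq, so it cannot deadlock against an
		 * I/O completion.
		 */
		VN_RELE(zgd->zgd_vnode);
		kmem_free(zgd, sizeof (zgd_t));
	}
}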

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
pjd at FreeBSD.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
Pawel,

I'm not quite sure I understand why thread #1 below is stalled.  Is
there only a single thread available for IO completion?

-Mark

Pawel Jakub Dawidek wrote:
> Hi.
> 
> I'm observing the following deadlock.
> 
> One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:
> 
> Tracing pid 1188 tid 100114 td 0xc41b1660
> sched_switch(c41b1660,0,1,180,9bbb0333,...) at sched_switch+0x1b6
> mi_switch(1,0,c0666ae9,1bf,c41b1660,...) at mi_switch+0x1ee
> sleepq_switch(c41b1660,0,c0666ae9,21d,c41b1660,...) at sleepq_switch+0xf0
> sleepq_wait(de465f94,de465f7c,c43fc60a,1,0,...) at sleepq_wait+0x60
> _cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
> zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
> dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
> dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
> dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
> dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
> zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
> zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at 
> zfs_dirent_lock+0x619
> zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
> zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
> zfs_freebsd_lookup(f899aa70,c0685c24,cbef4414,cbef4414,f899ac34,...) at 
> zfs_freebsd_lookup+0xdd
> VOP_CACHEDLOOKUP_APV(c4405800,f899aa70,f899ac34,f899ac20,c3f65200,...) at 
> VOP_CACHEDLOOKUP_APV+0xc5
> vfs_cache_lookup(f899aaf4,c066e847,cbef4414,1,cbef4414,...) at 
> vfs_cache_lookup+0xd3
> VOP_LOOKUP_APV(c4405800,f899aaf4,c41b1660,c066d7b8,19e,...) at 
> VOP_LOOKUP_APV+0xe5
> lookup(f899ac0c,c066d7b8,c9,bf,ca5b032c,...) at lookup+0x53e
> namei(f899ac0c,0,c0679402,8d6,c065f266,...) at namei+0x2fe
> kern_rename(c41b1660,2820d040,2820c040,0,f899ad2c,...) at kern_rename+0x4a
> rename(c41b1660,f899acfc,8,c06758bf,c06924e0,...) at rename+0x29
> syscall(f899ad38) at syscall+0x283
> Xint0x80_syscall() at Xint0x80_syscall+0x20
> --- syscall (128, FreeBSD ELF32, rename), eip = 0x280d1cdb, esp = 0xbfbfe7fc, 
> ebp = 0xbfbfea48 ---
> 
> Another thread tries to finish I/O, but can't, because it is trying to
> acquire zfsvfs->z_hold_mtx[i] lock:
> 
> Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
> sched_switch(c3b38440,0,1,180,9d2b0e9f,...) at sched_switch+0x1b6
> mi_switch(1,0,c0666ae9,1bf,c3b38440,...) at mi_switch+0x1ee
> sleepq_switch(c3b38440,0,c0666ae9,21d,c3f4f6a4,...) at sleepq_switch+0xf0
> sleepq_wait(c3f4f6a4,0,c43fba6f,3,0,...) at sleepq_wait+0x60
> _sx_xlock_hard(c3f4f6a4,c3b38440,0,c43fb80c,31e,...) at _sx_xlock_hard+0x286
> _sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
> zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
> zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at zfs_inactive+0x307
> zfs_freebsd_inactive(f8aa28a0,c06856cc,c7af0230,c7af0230,c7af015c,...) at 
> zfs_freebsd_inactive+0x32
> VOP_INACTIVE_APV(c4405800,f8aa28a0,c066e0c7,8e9,c06b5120,...) at 
> VOP_INACTIVE_APV+0xc5
> vinactive(c7af0230,0,c066e0c7,859,858,...) at vinactive+0xb1
> vrele(c7af015c,c71317d0,f8aa2924,c04d16b0,c71317d0,...) at vrele+0x18f
> zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
> dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
> arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at 
> arc_write_done+0x44a
> zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
> zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
> zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at 
> zio_assess+0x843
> zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at zio_next_stage+0x236
> zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at 
> zio_wait_for_children+0x99
> zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at 
> zio_wait_children_done+0x25
> zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
> zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at 
> zio_vdev_io_assess+0x2a7
> zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at 
> zio_next_stage+0x236
> vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at 
> vdev_mirror_io_done+0x113
> zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at 
> zio_vdev_io_done+0x21
> taskq_thread(c442ebf4,f8aa2d38,c065fcb9,31e,c3b37aa0,...) at 
> taskq_thread+0x29e
> fork_exit(c42c9240,c442ebf4,f8aa2d38) at fork_exit+0xb8
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0, eip = 0, esp = 0xf8aa2d70, ebp = 0 ---
> 
> As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
> calls arc_write_done(). Down that path we eventually call vn_rele(), which
> may end up calling zfs_zinactive() if this was the last vnode reference.
> zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
> because it is already held by the thread waiting for the I/O. Neither can
> the I/O be finished, nor can the lock be released.

[zfs-code] Recent deadlock.

2007-11-07 Thread Pawel Jakub Dawidek
On Wed, Nov 07, 2007 at 07:01:52AM -0700, Mark Maybee wrote:
> Pawel,
> 
> I'm not quite sure I understand why thread #1 below is stalled.  Is
> there only a single thread available for IO completion?

There are a few, but I believe thread #2 is trying to complete the very
I/O request on which thread #1 is waiting.
Thread #2 can't complete this I/O request, because the lock it needs to
acquire is held by thread #1.

> > One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:
> > 
> > Tracing pid 1188 tid 100114 td 0xc41b1660
[...]
> > _cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
> > zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
> > dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
> > dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
> > dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
> > dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
> > zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
> > zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at 
> > zfs_dirent_lock+0x619
> > zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
> > zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
[...]

> > Another thread tries to finish I/O, but can't, because it is trying to
> > acquire zfsvfs->z_hold_mtx[i] lock:
> > 
> > Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
[...]
> > _sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
> > zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
> > zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at 
> > zfs_inactive+0x307
[...]
> > zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
> > dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
> > arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at 
> > arc_write_done+0x44a
> > zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
> > zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
> > zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at 
> > zio_assess+0x843
> > zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at 
> > zio_next_stage+0x236
> > zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at 
> > zio_wait_for_children+0x99
> > zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at 
> > zio_wait_children_done+0x25
> > zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
> > zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at 
> > zio_vdev_io_assess+0x2a7
> > zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at 
> > zio_next_stage+0x236
> > vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at 
> > vdev_mirror_io_done+0x113
> > zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at 
> > zio_vdev_io_done+0x21
[...]

BTW, why do I always see vdev_mirror_io_done() even if I don't use a mirror?
Here I had a RAIDZ vdev.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
pjd at FreeBSD.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
Hmm, seems rather unlikely that these two IOs are related.  Thread 1
is trying to read a dnode in order to extract the znode data from its
bonus buffer.  Thread 2 is completing a dmu_sync() write (so this is
the result of a zil operation).  While it's possible that the dmu_sync()
write may involve reading some of the blocks in the dnode from Thread 1,
this should not result in Thread 1 waiting for anything.

I think vdev_mirror_io_done() often shows up in the stack because the
ditto-ing code leverages that code path.
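
For what it's worth, the reason is that vdev_mirror_map_alloc() builds its
child list from the block pointer's DVAs whenever the zio has no explicit
vdev, so ditto copies and single copies sitting on a raidz top-level vdev
alike complete through vdev_mirror_io_done(). A rough sketch of that
branch, written from memory as a hypothetical helper, so treat the details
as approximate:

static void
vdev_mirror_map_from_bp(zio_t *zio, mirror_map_t *mm)
{
	/*
	 * Hypothetical helper illustrating the zio->io_vd == NULL branch
	 * of vdev_mirror_map_alloc(): one mirror child per DVA in the
	 * block pointer, each pointing at a top-level vdev (which may
	 * itself be a raidz vdev).
	 */
	dva_t *dva = zio->io_bp->blk_dva;
	spa_t *spa = zio->io_spa;
	int c, i;

	c = BP_GET_NDVAS(zio->io_bp);
	for (i = 0; i < c; i++) {
		mirror_child_t *mc = &mm->mm_child[i];

		mc->mc_vd = vdev_lookup_top(spa, DVA_GET_VDEV(&dva[i]));
		mc->mc_offset = DVA_GET_OFFSET(&dva[i]);
	}
}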

-Mark

Pawel Jakub Dawidek wrote:
> On Wed, Nov 07, 2007 at 07:01:52AM -0700, Mark Maybee wrote:
>> Pawel,
>>
>> I'm not quite sure I understand why thread #1 below is stalled.  Is
>> there only a single thread available for IO completion?
> 
> There are a few, but I believe thread #2 is trying to complete the very
> I/O request on which thread #1 is waiting.
> Thread #2 can't complete this I/O request, because the lock it needs to
> acquire is held by thread #1.
> 
>>> One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:
>>>
>>> Tracing pid 1188 tid 100114 td 0xc41b1660
> [...]
>>> _cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
>>> zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
>>> dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
>>> dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
>>> dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
>>> dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
>>> zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
>>> zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at 
>>> zfs_dirent_lock+0x619
>>> zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
>>> zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
> [...]
> 
>>> Another thread tries to finish I/O, but can't, because it is trying to
>>> acquire zfsvfs->z_hold_mtx[i] lock:
>>>
>>> Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
> [...]
>>> _sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
>>> zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
>>> zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at 
>>> zfs_inactive+0x307
> [...]
>>> zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
>>> dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
>>> arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at 
>>> arc_write_done+0x44a
>>> zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
>>> zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
>>> zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at 
>>> zio_assess+0x843
>>> zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at 
>>> zio_next_stage+0x236
>>> zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at 
>>> zio_wait_for_children+0x99
>>> zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at 
>>> zio_wait_children_done+0x25
>>> zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
>>> zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at 
>>> zio_vdev_io_assess+0x2a7
>>> zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at 
>>> zio_next_stage+0x236
>>> vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at 
>>> vdev_mirror_io_done+0x113
>>> zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at 
>>> zio_vdev_io_done+0x21
> [...]
> 
> BTW, why do I always see vdev_mirror_io_done() even if I don't use a mirror?
> Here I had a RAIDZ vdev.
> 



[zfs-code] Recent deadlock.

2007-11-07 Thread Pawel Jakub Dawidek
On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
> Hmm, seems rather unlikely that these two IOs are related.  Thread 1
> is trying to read a dnode in order to extract the znode data from its
> bonus buffer.  Thread 2 is completing a dmu_sync() write (so this is
> the result of a zil operation).  While it's possible that the dmu_sync()
> write may involve reading some of the blocks in the dnode from Thread 1,
> this should not result in Thread 1 waiting for anything.

Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
kernel thread to be called later fixes the deadlock for me.

> I think vdev_mirror_io_done() often shows up in the stack because the
> ditto-ing code leverages that code path.

Ahh, ok.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
pjd at FreeBSD.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
Pawel Jakub Dawidek wrote:
> On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
>> Hmm, seems rather unlikely that these two IOs are related.  Thread 1
>> is trying to read a dnode in order to extract the znode data from its
>> bonus buffer.  Thread 2 is completing a dmu_sync() write (so this is
>> the result of a zil operation).  While it's possible that the dmu_sync()
>> write may involve reading some of the blocks in the dnode from Thread 1,
>> this should not result in Thread 1 waiting for anything.
> 
> Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
> kernel thread to be called later fixes the deadlock for me.
> 
Right, but I think the problem here is that you are running out of IO
threads.  Do you maintain separate thread sets for each "type" of IO
(i.e. reads, writes, etc.)?  We found, early on, that if we didn't
reserve threads for specific types of IOs, we could run into this
type of deadlock.

-Mark



[zfs-code] Recent deadlock.

2007-11-07 Thread Pawel Jakub Dawidek
On Wed, Nov 07, 2007 at 08:57:53AM -0700, Mark Maybee wrote:
> Pawel Jakub Dawidek wrote:
> > On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
> >> Hmm, seems rather unlikely that these two IOs are related.  Thread 1
> >> is trying to read a dnode in order to extract the znode data from its
> >> bonus buffer.  Thread 2 is completing a dmu_sync() write (so this is
> >> the result of a zil operation).  While it's possible that the dmu_sync()
> >> write may involve reading some of the blocks in the dnode from Thread 1,
> >> this should not result in Thread 1 waiting for anything.
> > 
> > Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
> > kernel thread to be called later fixes the deadlock for me.
> > 
> Right, but I think the problem here is that you are running out of IO
> threads.  Do you maintain separate thread sets for each "type" of IO
> (i.e. reads, writes, etc.)?  We found, early on, that if we didn't
> reserve threads for specific types of IOs, we could run into this
> type of deadlock.

Yes, I have (A * B * C) I/O threads, where A is the number of I/O types (6),
B is the number of CPUs, and C is 2 (issue and intr threads). This is the
same as in Solaris, I believe.
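
For reference, that matches the taskq setup in spa_activate(); a rough
sketch of the layout from the OpenSolaris code of that era (the per-taskq
thread count is where the FreeBSD port differs, sizing it by the number of
CPUs as described above, so take the exact arguments as approximate):

	/*
	 * One issue taskq and one intr taskq per zio type (ZIO_TYPES == 6).
	 * zio_taskq_threads is a fixed per-taskq tunable in spa.c (8 at the
	 * time, if memory serves).
	 */
	for (t = 0; t < ZIO_TYPES; t++) {
		spa->spa_zio_issue_taskq[t] = taskq_create("spa_zio_issue",
		    zio_taskq_threads, maxclsyspri, 50, INT_MAX,
		    TASKQ_PREPOPULATE);
		spa->spa_zio_intr_taskq[t] = taskq_create("spa_zio_intr",
		    zio_taskq_threads, maxclsyspri, 50, INT_MAX,
		    TASKQ_PREPOPULATE);
	}

which is presumably where thread names like spa_zio_intr_2 in the trace
above come from.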

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
pjd at FreeBSD.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!