On Mon, Nov 05, 2007 at 02:22:12PM +0100, Pawel Jakub Dawidek wrote:
> Hi.
> 
> I'm observing the following deadlock.
> 
> One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:

[...]

> Another thread tries to finish I/O, but can't, because it is trying to
> acquire zfsvfs->z_hold_mtx[i] lock:

[...]

> As you can see, the SPA ZIO INTR thread wants to finish I/O requests, calls
> arc_write_done(). All in all we call vn_rele(), which may end up in calling
> zfs_zinactive() if this was the last vnode reference. zfs_zinactive() tries
> to acquire zfsvfs->z_hold_mtx[i] lock, but can't, because it is already held
> by the thread waiting for I/O. Neither I/O can be finished, nor lock can be
> unlocked.
> 
> I started to see this deadlock after last integration, so it was introduced
> in more or less during last month. Any ideas how to solve it?

I was able to trigger this deadlock with older code as well.

It seems I fixed it be changing the zfs_get_done() function from:

void
zfs_get_done(dmu_buf_t *db, void *vzgd)
{
        zgd_t *zgd = (zgd_t *)vzgd;
        rl_t *rl = zgd->zgd_rl;
        vnode_t *vp = ZTOV(rl->r_zp);

        dmu_buf_rele(db, vzgd);
        zfs_range_unlock(rl);
        VN_RELE(vp);
        zil_add_vdev(zgd->zgd_zilog, DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));
        kmem_free(zgd, sizeof (zgd_t));
}

To something like this:

void                                                                            
                                                                                
             
zfs_get_done(dmu_buf_t *db, void *vzgd)                                         
                                                                                
             
{                                                                               
                                                                                
             
        zgd_t *zgd = (zgd_t *)vzgd;                                             
                                                                                
          
        rl_t *rl = zgd->zgd_rl;                                                 
                                                                                
      
        vnode_t *vp = ZTOV(rl->r_zp);                                           
                                                                                
        
                                                                                
                                                                                
             
        dmu_buf_rele(db, vzgd);                                                 
                                                                                
      
        zfs_range_unlock(rl);                                                   
                                                                                
        
        zgd->zgd_vnode = vp;                                                    
                                                                                
         
        zil_add_vdev(zgd->zgd_zilog, DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));   
                                                                                
        
        /*
         * Put the vnode onto the list. Call VN_RELE() on this vnode from
         * within another kernel thread later (also free zgd there).
         */
        mtx_lock(&zgd_done_mtx);                                                
                                                                                
             
        TAILQ_INSERT_TAIL(&zgd_done, zgd, zgd_next);                            
                                                                                
         
        mtx_unlock(&zgd_done_mtx);                                              
                                                                                
           
        wakeup(&zgd_done);                                                      
                                                                                
           
}        

Which bascially means, that VN_RELE() will be called from another thread
later. This seems to work, but I changed the order of VN_RELE() and
zil_add_vdev() calls. Is that safe? Doing zil_add_vdev() also from this
additional thread didn't worked for me.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
URL: 
<http://mail.opensolaris.org/pipermail/zfs-code/attachments/20071107/e27a4697/attachment.bin>

Reply via email to