On 23/11/22(Wed) 16:34, Mark Kettenis wrote:
> > Date: Wed, 23 Nov 2022 10:52:32 +0100
> > From: Martin Pieuchot <m...@openbsd.org>
> > 
> > On 22/11/22(Tue) 23:40, Mark Kettenis wrote:
> > > > Date: Tue, 22 Nov 2022 17:47:44 +0000
> > > > From: Miod Vallat <m...@online.fr>
> > > > 
> > > > > Here is a diff.  Maybe bluhm@ can try this on the macppc machine that
> > > > > triggered the original "vref used where vget required" problem?
> > > > 
> > > > On a similar machine it panics after a few hours with:
> > > > 
> > > > panic: uvn_flush: PGO_SYNCIO return 'try again' error (impossible)
> > > > 
> > > > The trace (transcribed by hand) is
> > > > uvn_flush+0x820
> > > > uvm_vnp_terminate+0x79
> > > > vclean+0xdc
> > > > vgonel+0x70
> > > > getnewvnode+0x240
> > > > ffs_vget+0xcc
> > > > ffs_inode_alloc+0x13c
> > > > ufs_makeinode+0x94
> > > > ufs_create+0x58
> > > > VOP_CREATE+0x48
> > > > vn_open+0x188
> > > > doopenat+0x1b4
> > > 
> > > Ah right, there is another path where we end up with a refcount of
> > > zero.  Should be fixable, but I need to think about this for a bit.
> > 
> > Not sure to understand what you mean with refcount of 0.  Could you
> > elaborate?
> 
> Sorry, I was thinking ahead a bit.  I'm pretty much convinced that the
> issue we're dealing with is a race between a vnode being
> recycled/cleaned and the pagedaemon paging out pages associated with
> that same vnode.
> 
> The crashes we've seen before were all in the pagedaemon path where we
> end up calling into the VFS layer with a vnode that has v_usecount ==
> 0.  My "fix" avoids that, but hits the issue that when we are in the
> codepath that is recycling/cleaning the vnode, we can't use vget() to
> get a reference to the vnode since it checks that the vnode isn't in
> the process of being cleaned.
> 
> But if we avoid that issue (by for example) skipping the vget() call
> if the UVM_VNODE_DYING flag is set, we run into the same scenario
> where we call into the VFS layer with v_usecount == 0.  Now that may
> not actually be a problem, but I need to investigate this a bit more.

When the UVM_VNODE_DYING flag is set the caller always own a valid
reference to the vnode.  Either because it is in the process of cleaning
it via  uvm_vnp_terminate() or because it uvn_detach() has been called
which means the reference to the vnode hasn't been dropped yet.  So I
believe `v_usecount' for such vnode is positive.

> Or maybe calling into the VFS layer with a vnode that has v_usecount
> == 0 is perfectly fine and we should do the vget() dance I propose in
> uvm_vnp_unache() instead of in uvn_put().

I'm not following.  uvm_vnp_uncache() is always called with a valid
vnode, no?

Reply via email to