On 23/11/22(Wed) 16:34, Mark Kettenis wrote: > > Date: Wed, 23 Nov 2022 10:52:32 +0100 > > From: Martin Pieuchot <m...@openbsd.org> > > > > On 22/11/22(Tue) 23:40, Mark Kettenis wrote: > > > > Date: Tue, 22 Nov 2022 17:47:44 +0000 > > > > From: Miod Vallat <m...@online.fr> > > > > > > > > > Here is a diff. Maybe bluhm@ can try this on the macppc machine that > > > > > triggered the original "vref used where vget required" problem? > > > > > > > > On a similar machine it panics after a few hours with: > > > > > > > > panic: uvn_flush: PGO_SYNCIO return 'try again' error (impossible) > > > > > > > > The trace (transcribed by hand) is > > > > uvn_flush+0x820 > > > > uvm_vnp_terminate+0x79 > > > > vclean+0xdc > > > > vgonel+0x70 > > > > getnewvnode+0x240 > > > > ffs_vget+0xcc > > > > ffs_inode_alloc+0x13c > > > > ufs_makeinode+0x94 > > > > ufs_create+0x58 > > > > VOP_CREATE+0x48 > > > > vn_open+0x188 > > > > doopenat+0x1b4 > > > > > > Ah right, there is another path where we end up with a refcount of > > > zero. Should be fixable, but I need to think about this for a bit. > > > > Not sure to understand what you mean with refcount of 0. Could you > > elaborate? > > Sorry, I was thinking ahead a bit. I'm pretty much convinced that the > issue we're dealing with is a race between a vnode being > recycled/cleaned and the pagedaemon paging out pages associated with > that same vnode. > > The crashes we've seen before were all in the pagedaemon path where we > end up calling into the VFS layer with a vnode that has v_usecount == > 0. My "fix" avoids that, but hits the issue that when we are in the > codepath that is recycling/cleaning the vnode, we can't use vget() to > get a reference to the vnode since it checks that the vnode isn't in > the process of being cleaned. > > But if we avoid that issue (by for example) skipping the vget() call > if the UVM_VNODE_DYING flag is set, we run into the same scenario > where we call into the VFS layer with v_usecount == 0. Now that may > not actually be a problem, but I need to investigate this a bit more.
When the UVM_VNODE_DYING flag is set the caller always own a valid reference to the vnode. Either because it is in the process of cleaning it via uvm_vnp_terminate() or because it uvn_detach() has been called which means the reference to the vnode hasn't been dropped yet. So I believe `v_usecount' for such vnode is positive. > Or maybe calling into the VFS layer with a vnode that has v_usecount > == 0 is perfectly fine and we should do the vget() dance I propose in > uvm_vnp_unache() instead of in uvn_put(). I'm not following. uvm_vnp_uncache() is always called with a valid vnode, no?