> From: Jeremie Courreges-Anglas <j...@wxcvbn.org>
> Date: Thu, 17 Nov 2022 18:00:21 +0100
> 
> On Tue, Nov 15 2022, Martin Pieuchot <m...@openbsd.org> wrote:
> > UVM vnode objects include a reference count to keep track of the number
> > of processes that have the corresponding pages mapped in their VM space.
> >
> > When the last process referencing a given library or executable dies,
> > the reaper will munmap this object on its behalf.  When this happens it
> > doesn't free the associated pages to speed-up possible re-use of the
> > file.  Instead the pages are placed on the inactive list but stay ready
> > to be pmap_enter()'d without requiring I/O as soon as a new process
> > needs to access them.
> >
> > The mechanism to keep pages populated, known as UVM_VNODE_CANPERSIST,
> > doesn't work well with swapping [0].  For some reason, when the page
> > daemon wants to free pages on the inactive list it tries to flush
> > them to disk and hits a panic(9) because it needs a valid reference
> > to the vnode to do so.
> >
> > This indicates that the mechanism described above, which seems to work
> > fine for RO mappings, is currently buggy in more complex situations.
> > Flushing the pages when the last reference of the UVM object is dropped
> > also doesn't seem to be enough as bluhm@ reported [1].
> >
> > The diff below, which has already been committed and reverted, gets rid
> > of the UVM_VNODE_CANPERSIST logic.  I'd like to commit it again now that
> > the arm64 caching bug has been found and fixed.
> >
> > Getting rid of this logic means more I/O will be generated and pages
> > might have a faster reuse cycle.  I'm aware this might introduce a small
> > slowdown,
> 
> Numbers for my usual make -j4 in libc,
> on an Unmatched riscv64 box, before:
>    16m32.65s real    21m36.79s user    30m53.45s system
>    16m32.37s real    21m33.40s user    31m17.98s system
>    16m32.63s real    21m35.74s user    31m12.01s system
>    16m32.13s real    21m36.12s user    31m06.92s system
> After:
>    19m14.15s real    21m09.39s user    36m51.33s system
>    19m19.11s real    21m02.61s user    36m58.46s system
>    19m21.77s real    21m09.23s user    37m03.85s system
>    19m09.39s real    21m08.96s user    36m36.00s system
> 
> 4-core amd64 VM, before (-current plus another diff):
>    1m54.31s real     2m47.36s user     4m24.70s system
>    1m52.64s real     2m45.68s user     4m23.46s system
>    1m53.47s real     2m43.59s user     4m27.60s system
> After:
>    2m34.12s real     2m51.15s user     6m20.91s system
>    2m34.30s real     2m48.48s user     6m23.34s system
>    2m37.07s real     2m49.60s user     6m31.53s system
> 
> > however I believe we should work towards loading files from the
> > buffer cache to save I/O cycles instead of having another layer of cache.
> > Such work isn't trivial and making sure the vnode <-> UVM relation is
> > simple and well understood is the first step in this direction.
> >
> > I'd appreciate it if the diff below could be tested on many
> > architectures, including the offending rpi4.
> 
> Mike has already tested a make build on a riscv64 Unmatched.  I have
> also run regress in sys, lib/libc and lib/libpthread on that arch.  As
> far as I can see this looks stable on my machine, but what I really care
> about is the riscv64 bulk build cluster (I'm going to start another
> bulk build soon).
> 
> > Comments?  Oks?
> 
> The performance drop in my microbenchmark kinda worries me but it's only
> a microbenchmark...

I wouldn't call this a microbenchmark.  I fear this is typical for
builds of anything on clang architectures.  And I expect it to be
worse on single-processor machines, where *every* time we execute
clang or lld all the pages are thrown away on exit and have to be
read back from disk for the next bit of C code we compile.

Having mmap'ed reads go through the buffer cache should indeed help
here, but that smells like UBC (a unified buffer cache), and even if
it isn't, it is something we don't have and will be tricky to get
right.

The discussions we had during h2k22 made me understand this thing a
bit better.  Is there any reason why we can't keep a reference to the
vnode and have the pagedaemon drop it when it drops the pages?  Is
there a chance that we run out of vnodes this way?  I vaguely remember
beck@ saying that we can have tons of vnodes now.

> > [0] https://marc.info/?l=openbsd-bugs&m=164846737707559&w=2 
> > [1] https://marc.info/?l=openbsd-bugs&m=166843373415030&w=2
> >
> > Index: uvm/uvm_vnode.c
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> > retrieving revision 1.130
> > diff -u -p -r1.130 uvm_vnode.c
> > --- uvm/uvm_vnode.c 20 Oct 2022 13:31:52 -0000      1.130
> > +++ uvm/uvm_vnode.c 15 Nov 2022 13:28:28 -0000
> > @@ -161,11 +161,8 @@ uvn_attach(struct vnode *vp, vm_prot_t a
> >      * add it to the writeable list, and then return.
> >      */
> >     if (uvn->u_flags & UVM_VNODE_VALID) {   /* already active? */
> > +           KASSERT(uvn->u_obj.uo_refs > 0);
> >  
> > -           /* regain vref if we were persisting */
> > -           if (uvn->u_obj.uo_refs == 0) {
> > -                   vref(vp);
> > -           }
> >             uvn->u_obj.uo_refs++;           /* bump uvn ref! */
> >  
> >             /* check for new writeable uvn */
> > @@ -235,14 +232,14 @@ uvn_attach(struct vnode *vp, vm_prot_t a
> >     KASSERT(uvn->u_obj.uo_refs == 0);
> >     uvn->u_obj.uo_refs++;
> >     oldflags = uvn->u_flags;
> > -   uvn->u_flags = UVM_VNODE_VALID|UVM_VNODE_CANPERSIST;
> > +   uvn->u_flags = UVM_VNODE_VALID;
> >     uvn->u_nio = 0;
> >     uvn->u_size = used_vnode_size;
> >  
> >     /*
> >      * add a reference to the vnode.   this reference will stay as long
> >      * as there is a valid mapping of the vnode.   dropped when the
> > -    * reference count goes to zero [and we either free or persist].
> > +    * reference count goes to zero.
> >      */
> >     vref(vp);
> >  
> > @@ -323,16 +320,6 @@ uvn_detach(struct uvm_object *uobj)
> >      */
> >     vp->v_flag &= ~VTEXT;
> >  
> > -   /*
> > -    * we just dropped the last reference to the uvn.   see if we can
> > -    * let it "stick around".
> > -    */
> > -   if (uvn->u_flags & UVM_VNODE_CANPERSIST) {
> > -           /* won't block */
> > -           uvn_flush(uobj, 0, 0, PGO_DEACTIVATE|PGO_ALLPAGES);
> > -           goto out;
> > -   }
> > -
> >     /* its a goner! */
> >     uvn->u_flags |= UVM_VNODE_DYING;
> >  
> > @@ -382,7 +369,6 @@ uvn_detach(struct uvm_object *uobj)
> >     /* wake up any sleepers */
> >     if (oldflags & UVM_VNODE_WANTED)
> >             wakeup(uvn);
> > -out:
> >     rw_exit(uobj->vmobjlock);
> >  
> >     /* drop our reference to the vnode. */
> > @@ -498,8 +484,8 @@ uvm_vnp_terminate(struct vnode *vp)
> >     }
> >  
> >     /*
> > -    * done.   now we free the uvn if its reference count is zero
> > -    * (true if we are zapping a persisting uvn).   however, if we are
> > +    * done.   now we free the uvn if its reference count is zero.
> > +    * however, if we are
> >      * terminating a uvn with active mappings we let it live ... future
> >      * calls down to the vnode layer will fail.
> >      */
> > @@ -507,14 +493,14 @@ uvm_vnp_terminate(struct vnode *vp)
> >     if (uvn->u_obj.uo_refs) {
> >             /*
> >              * uvn must live on it is dead-vnode state until all references
> > -            * are gone.   restore flags.    clear CANPERSIST state.
> > +            * are gone.   restore flags.
> >              */
> >             uvn->u_flags &= ~(UVM_VNODE_DYING|UVM_VNODE_VNISLOCKED|
> > -                 UVM_VNODE_WANTED|UVM_VNODE_CANPERSIST);
> > +                 UVM_VNODE_WANTED);
> >     } else {
> >             /*
> >              * free the uvn now.   note that the vref reference is already
> > -            * gone [it is dropped when we enter the persist state].
> > +            * gone.
> >              */
> >             if (uvn->u_flags & UVM_VNODE_IOSYNCWANTED)
> >                     panic("uvm_vnp_terminate: io sync wanted bit set");
> > @@ -1343,6 +1329,7 @@ uvm_vnp_uncache(struct vnode *vp)
> >  {
> >     struct uvm_vnode *uvn = vp->v_uvm;
> >     struct uvm_object *uobj = &uvn->u_obj;
> > +   int refs;
> >  
> >     /* lock uvn part of the vnode and check if we need to do anything */
> >  
> > @@ -1354,47 +1341,12 @@ uvm_vnp_uncache(struct vnode *vp)
> >     }
> >  
> >     /*
> > -    * we have a valid, non-blocked uvn.   clear persist flag.
> > +    * we have a valid, non-blocked uvn.
> >      * if uvn is currently active we can return now.
> >      */
> > -   uvn->u_flags &= ~UVM_VNODE_CANPERSIST;
> > -   if (uvn->u_obj.uo_refs) {
> > -           rw_exit(uobj->vmobjlock);
> > -           return FALSE;
> > -   }
> > -
> > -   /*
> > -    * uvn is currently persisting!   we have to gain a reference to
> > -    * it so that we can call uvn_detach to kill the uvn.
> > -    */
> > -   vref(vp);                       /* seems ok, even with VOP_LOCK */
> > -   uvn->u_obj.uo_refs++;           /* value is now 1 */
> > +   refs = uvn->u_obj.uo_refs;
> >     rw_exit(uobj->vmobjlock);
> > -
> > -#ifdef VFSLCKDEBUG
> > -   /*
> > -    * carry over sanity check from old vnode pager: the vnode should
> > -    * be VOP_LOCK'd, and we confirm it here.
> > -    */
> > -   if ((vp->v_flag & VLOCKSWORK) && !VOP_ISLOCKED(vp))
> > -           panic("uvm_vnp_uncache: vnode not locked!");
> > -#endif
> > -
> > -   /*
> > -    * now drop our reference to the vnode.   if we have the sole
> > -    * reference to the vnode then this will cause it to die [as we
> > -    * just cleared the persist flag].   we have to unlock the vnode
> > -    * while we are doing this as it may trigger I/O.
> > -    *
> > -    * XXX: it might be possible for uvn to get reclaimed while we are
> > -    * unlocked causing us to return TRUE when we should not.   we ignore
> > -    * this as a false-positive return value doesn't hurt us.
> > -    */
> > -   VOP_UNLOCK(vp);
> > -   uvn_detach(&uvn->u_obj);
> > -   vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
> > -
> > -   return TRUE;
> > +   return refs > 0 ? FALSE : TRUE;
> >  }
> >  
> >  /*
> > @@ -1488,11 +1440,9 @@ uvm_vnp_sync(struct mount *mp)
> >                     continue;
> >  
> >             /*
> > -            * gain reference.   watch out for persisting uvns (need to
> > -            * regain vnode REF).
> > +            * gain reference.
> >              */
> > -           if (uvn->u_obj.uo_refs == 0)
> > -                   vref(vp);
> > +           KASSERT(uvn->u_obj.uo_refs > 0);
> >             uvn->u_obj.uo_refs++;
> >  
> >             SIMPLEQ_INSERT_HEAD(&uvn_sync_q, uvn, u_syncq);
> > Index: uvm/uvm_vnode.h
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_vnode.h,v
> > retrieving revision 1.21
> > diff -u -p -r1.21 uvm_vnode.h
> > --- uvm/uvm_vnode.h 20 Oct 2022 13:31:52 -0000      1.21
> > +++ uvm/uvm_vnode.h 15 Nov 2022 12:58:38 -0000
> > @@ -71,7 +71,6 @@ struct uvm_vnode {
> >   * u_flags values
> >   */
> >  #define UVM_VNODE_VALID            0x001   /* we are attached to the vnode */
> > -#define UVM_VNODE_CANPERSIST       0x002   /* we can persist after ref == 0 */
> >  #define UVM_VNODE_ALOCK            0x004   /* uvn_attach is locked out */
> >  #define UVM_VNODE_DYING            0x008   /* final detach/terminate in 
> >                                        progress */
> >
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 
> 
