On 03/11/15 at 01:46P, Philip Guenther wrote:
> On Sun, 25 Oct 2015, Peter Hajdu wrote:
> > I try to give it one more attempt with a bit more description about the 
> > bug.
> > 
> > After calling dlclose in _dl_notify_unload_shlib_ group reference counts 
> > are decreased by following the object's grpref-list.  Unfortunately the 
> > references are removed from the list during the graph traversal.
> > 
> > dlclose will run all destructors of the unused objects and tries to 
> > unload all objects reachable from the closed object's child and 
> > grpref-list.  Since the grpref-list references were removed, the unused 
> > destructed object stays in memory as garbage.  Next time when this 
> > object is loaded it is found in the memory and crashes during load.
> > 
> > This patch unloads all unused objects instead of following the closed 
> > object's child and grpref list.
> 
> Thank you for working on this.  After a long rumination, I'd like to 
> propose a different diff, seen below.
> 
> Your diff changes dlclose() to switch from calling _dl_unload_shlib() to a 
> new function _dl_unload_unused().  They both unload refcnt==0 objects: 
> _dl_unload_shlib() finds them by walking the dependency tree from the 
> selected object, while _dl_unload_unused() just scans the entire list.
> 
> So why is the former not sufficient?  As you describe, the problem occurs 
> when a grpref is removed which is the last reference to an object. grprefs 
> are used guarantee that entire load groups are unloaded all at once and 
> not piecemeal.  If later dlopen() adds a dependency to a child in this 
> load group, the entire group will be kept even if that child is the last 
> real link.  When that dependency is added, the later object takes a grpref 
> to the load_object, which is root of the load group.
> 
> As you note _dl_run_all_dtors() releases that grpref, but we still know 
> where it had pointed: to the load_object of some object being released!  
> So we can retain the behavior of _dl_unload_shlib(), but we need to add a 
> check for whether our load_object is now unreferenced.  If so, it 
> previously had a grpref which has been released, so we need to take down 
> the entire load group.
> 
> Thus the diff below.  It works with your test setup (thanks for writing 
> that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it.  
> Can someone who's familiar with the sdl problem case test it there?
> 

Hi Philip,

I've tested the patch on amd64 with a simple sdl2 test and with my
original tests on both amd64 and i386.  Everything seems to work just
fine.  Thank you very much for your effort.

Peter

> 
> Does that make sense?
> 
> 
> Again, thank you for pushing on this.  If what I show here works, it's 
> because your description triggered an "ahhh, wait a moment..." thought.
> 
> 
> Philip
> 
> 
> Index: library.c
> ===================================================================
> RCS file: /data/src/openbsd/src/libexec/ld.so/library.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 library.c
> --- library.c 16 Jan 2015 16:18:07 -0000      1.71
> +++ library.c 3 Nov 2015 09:09:15 -0000
> @@ -59,9 +59,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>       struct dep_node *n;
> +     elf_object_t *load_object = object->load_object;
> +
> +     /*
> +      * If our load object has become unreferenced then we lost the
> +      * last group reference to it, so the entire group should be taken
> +      * down.  The current object is somewhere below load_object in
> +      * the child_list tree, so it'll get cleaned up by the recursion.
> +      * That means we can just switch here to the load object.
> +      */
> +     if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
> +         (load_object->status & STAT_UNLOADED) == 0) {
> +             DL_DEB(("unload_shlib switched from %s to %s\n",
> +                 object->load_name, load_object->load_name));
> +             object = load_object;
> +             goto unload;
> +     }
> +
>       DL_DEB(("unload_shlib called on %s\n", object->load_name));
>       if (OBJECT_REF_CNT(object) == 0 &&
>           (object->status & STAT_UNLOADED) == 0) {
> +unload:
>               object->status |= STAT_UNLOADED;
>               TAILQ_FOREACH(n, &object->child_list, next_sib)
>                       _dl_unload_shlib(n->data);
> Index: library_mquery.c
> ===================================================================
> RCS file: /data/src/openbsd/src/libexec/ld.so/library_mquery.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 library_mquery.c
> --- library_mquery.c  22 Jan 2015 05:48:17 -0000      1.49
> +++ library_mquery.c  3 Nov 2015 09:10:39 -0000
> @@ -64,10 +64,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>       struct dep_node *n;
> +     elf_object_t *load_object = object->load_object;
> +
> +     /*
> +      * If our load object has become unreferenced then we lost the
> +      * last group reference to it, so the entire group should be taken
> +      * down.  The current object is somewhere below load_object in
> +      * the child_list tree, so it'll get cleaned up by the recursion.
> +      * That means we can just switch here to the load object.
> +      */
> +     if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
> +         (load_object->status & STAT_UNLOADED) == 0) {
> +             DL_DEB(("unload_shlib switched from %s to %s\n",
> +                 object->load_name, load_object->load_name));
> +             object = load_object;
> +             goto unload;
> +     }
>  
>       DL_DEB(("unload_shlib called on %s\n", object->load_name));
>       if (OBJECT_REF_CNT(object) == 0 &&
>           (object->status & STAT_UNLOADED) == 0) {
> +unload:
>               object->status |= STAT_UNLOADED;
>               TAILQ_FOREACH(n, &object->child_list, next_sib)
>                       _dl_unload_shlib(n->data);

-- 
Peter Hajdu

Reply via email to