On 03/11/15 at 01:46P, Philip Guenther wrote: > On Sun, 25 Oct 2015, Peter Hajdu wrote: > > I try to give it one more attempt with a bit more description about the > > bug. > > > > After calling dlclose in _dl_notify_unload_shlib_ group reference counts > > are decreased by following the object's grpref-list. Unfortunately the > > references are removed from the list during the graph traversal. > > > > dlclose will run all destructors of the unused objects and tries to > > unload all objects reachable from the closed object's child and > > grpref-list. Since the grpref-list references were removed, the unused > > destructed object stays in memory as garbage. Next time when this > > object is loaded it is found in the memory and crashes during load. > > > > This patch unloads all unused objects instead of following the closed > > object's child and grpref list. > > Thank you for working on this. After a long rumination, I'd like to > propose a different diff, seen below. > > Your diff changes dlclose() to switch from calling _dl_unload_shlib() to a > new function _dl_unload_unused(). They both unload refcnt==0 objects: > _dl_unload_shlib() finds them by walking the dependency tree from the > selected object, while _dl_unload_unused() just scans the entire list. > > So why is the former not sufficient? As you describe, the problem occurs > when a grpref is removed which is the last reference to an object. grprefs > are used guarantee that entire load groups are unloaded all at once and > not piecemeal. If later dlopen() adds a dependency to a child in this > load group, the entire group will be kept even if that child is the last > real link. When that dependency is added, the later object takes a grpref > to the load_object, which is root of the load group. > > As you note _dl_run_all_dtors() releases that grpref, but we still know > where it had pointed: to the load_object of some object being released! > So we can retain the behavior of _dl_unload_shlib(), but we need to add a > check for whether our load_object is now unreferenced. If so, it > previously had a grpref which has been released, so we need to take down > the entire load group. > > Thus the diff below. It works with your test setup (thanks for writing > that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it. > Can someone who's familiar with the sdl problem case test it there? >
Hi Philip, I've tested the patch on amd64 with a simple sdl2 test and with my original tests on both amd64 and i386. Everything seems to work just fine. Thank you very much for your effort. Peter > > Does that make sense? > > > Again, thank you for pushing on this. If what I show here works, it's > because your description triggered an "ahhh, wait a moment..." thought. > > > Philip > > > Index: library.c > =================================================================== > RCS file: /data/src/openbsd/src/libexec/ld.so/library.c,v > retrieving revision 1.71 > diff -u -p -r1.71 library.c > --- library.c 16 Jan 2015 16:18:07 -0000 1.71 > +++ library.c 3 Nov 2015 09:09:15 -0000 > @@ -59,9 +59,27 @@ void > _dl_unload_shlib(elf_object_t *object) > { > struct dep_node *n; > + elf_object_t *load_object = object->load_object; > + > + /* > + * If our load object has become unreferenced then we lost the > + * last group reference to it, so the entire group should be taken > + * down. The current object is somewhere below load_object in > + * the child_list tree, so it'll get cleaned up by the recursion. > + * That means we can just switch here to the load object. > + */ > + if (load_object != object && OBJECT_REF_CNT(load_object) == 0 && > + (load_object->status & STAT_UNLOADED) == 0) { > + DL_DEB(("unload_shlib switched from %s to %s\n", > + object->load_name, load_object->load_name)); > + object = load_object; > + goto unload; > + } > + > DL_DEB(("unload_shlib called on %s\n", object->load_name)); > if (OBJECT_REF_CNT(object) == 0 && > (object->status & STAT_UNLOADED) == 0) { > +unload: > object->status |= STAT_UNLOADED; > TAILQ_FOREACH(n, &object->child_list, next_sib) > _dl_unload_shlib(n->data); > Index: library_mquery.c > =================================================================== > RCS file: /data/src/openbsd/src/libexec/ld.so/library_mquery.c,v > retrieving revision 1.49 > diff -u -p -r1.49 library_mquery.c > --- library_mquery.c 22 Jan 2015 05:48:17 -0000 1.49 > +++ library_mquery.c 3 Nov 2015 09:10:39 -0000 > @@ -64,10 +64,27 @@ void > _dl_unload_shlib(elf_object_t *object) > { > struct dep_node *n; > + elf_object_t *load_object = object->load_object; > + > + /* > + * If our load object has become unreferenced then we lost the > + * last group reference to it, so the entire group should be taken > + * down. The current object is somewhere below load_object in > + * the child_list tree, so it'll get cleaned up by the recursion. > + * That means we can just switch here to the load object. > + */ > + if (load_object != object && OBJECT_REF_CNT(load_object) == 0 && > + (load_object->status & STAT_UNLOADED) == 0) { > + DL_DEB(("unload_shlib switched from %s to %s\n", > + object->load_name, load_object->load_name)); > + object = load_object; > + goto unload; > + } > > DL_DEB(("unload_shlib called on %s\n", object->load_name)); > if (OBJECT_REF_CNT(object) == 0 && > (object->status & STAT_UNLOADED) == 0) { > +unload: > object->status |= STAT_UNLOADED; > TAILQ_FOREACH(n, &object->child_list, next_sib) > _dl_unload_shlib(n->data); -- Peter Hajdu