On Sun, 5 May 2019 at 06:26, Martin Pieuchot <m...@openbsd.org> wrote: > > On 27/04/19(Sat) 21:55, Nathanael Rensen wrote: > > The diff below speeds up ld.so library intialisation where the dependency > > tree is broad and deep, such as samba's smbd which links over 100 libraries. > > > > See for example https://marc.info/?l=openbsd-misc&m=155007285712913&w=2 > > > > See https://marc.info/?l=openbsd-tech&m=155637285221396&w=2 for part 1 > > that speeds up library loading. > > > > The timings below are for /usr/local/sbin/smbd --version: > > > > Timing without either diff : 6m45.67s real 6m45.65s user 0m00.02s system > > Timing with part 1 diff only: 4m42.88s real 4m42.85s user 0m00.02s system > > Timing with part 2 diff only: 2m02.61s real 2m02.60s user 0m00.01s system > > Timing with both diffs : 0m00.03s real 0m00.03s user 0m00.00s system > > > > Note that these timings are for a build of a recent samba master tree > > (linked with kerberos) which is probably slower than the OpenBSD port. > > Nice numbers. Could you explain in words what your diff is doing? Why > does splitting the flag help? Is it because some ctors/initarray are > being initialized multiple times currently?
No, the STAT_INIT_DONE flag prevents that. > Or is it just to prevent some traversal? Yes. > In that case does that mean the `STAT_VISISTED' flag is removed too > early? Yes, STAT_VISITED is removed too early. The visited flag is set on a node while traversing the child nodes of that node and then removed. It serves to protect against circular dependencies, but does not prevent repeatedly traversing through a node that appears on separate branches. The entire tree must be traversed twice - first to initialise the DF_1_INITFIRST libraries, and secondly to initialise the others. This is presumably why this diff contributes roughly twice as much speedup as the part 1 diff. To be effective in avoiding repeated traversals the visited flag must persist throughout an entire tree traversal but it must either be cleared between first and second traversals or a different flag used for the second traversal. My approach was to add a second visited flag and make them both persistent. My rationale for why I believe the flags may be persisted is as follows. dlopen() calls _dl_call_init() with the newly loaded object and neither the newly loaded object nor any newly loaded children of that object will have either visited flag set. Already loaded children will have those flags set, but they won't have gained any new children as a result of the dlopen(). If this reasoning is wrong then the diff is wrong and could lead to uninitialised libraries (and an ld.so regress test should probably be created to catch that situation). It occurs to me as I'm writing this that perhaps it's possible to avoid a tree traversal entirely by walking the linearised grpsym_list in reverse and relying only on the STAT_INIT_DONE flag. /* * grpsym_list is an ordered list of all child libs of the * _dl_loading_object with no dups. The order is equivalent * to a breadth-first traversal of the child list without dups. */ I don't think it is a true breadth-first traversal, not in the way I understand breadth-first, but it does ensure that parent nodes appear before child nodes. So in reverse, child nodes will appear before parent nodes. While this is not the same as a depth-first traversal it may be OK. There may be some specific requirements of DF_1_INITFIRST that need to be taken into account. Nathanael > > > Index: libexec/ld.so/loader.c > > =================================================================== > > RCS file: /cvs/src/libexec/ld.so/loader.c,v > > retrieving revision 1.177 > > diff -u -p -p -u -r1.177 loader.c > > --- libexec/ld.so/loader.c 3 Dec 2018 05:29:56 -0000 1.177 > > +++ libexec/ld.so/loader.c 27 Apr 2019 13:24:02 -0000 > > @@ -749,15 +749,15 @@ _dl_call_init_recurse(elf_object_t *obje > > { > > struct dep_node *n; > > > > - object->status |= STAT_VISITED; > > + int visited_flag = initfirst ? STAT_VISITED_1 : STAT_VISITED_2; > > + > > + object->status |= visited_flag; > > > > TAILQ_FOREACH(n, &object->child_list, next_sib) { > > - if (n->data->status & STAT_VISITED) > > + if (n->data->status & visited_flag) > > continue; > > _dl_call_init_recurse(n->data, initfirst); > > } > > - > > - object->status &= ~STAT_VISITED; > > > > if (object->status & STAT_INIT_DONE) > > return; > > Index: libexec/ld.so/resolve.h > > =================================================================== > > RCS file: /cvs/src/libexec/ld.so/resolve.h,v > > retrieving revision 1.90 > > diff -u -p -p -u -r1.90 resolve.h > > --- libexec/ld.so/resolve.h 21 Apr 2019 04:11:42 -0000 1.90 > > +++ libexec/ld.so/resolve.h 27 Apr 2019 13:24:02 -0000 > > @@ -125,8 +125,9 @@ struct elf_object { > > #define STAT_FINI_READY 0x10 > > #define STAT_UNLOADED 0x20 > > #define STAT_NODELETE 0x40 > > -#define STAT_VISITED 0x80 > > +#define STAT_VISITED_1 0x80 > > #define STAT_GNU_HASH 0x100 > > +#define STAT_VISITED_2 0x200 > > > > Elf_Phdr *phdrp; > > int phdrc; > >