On Sun, 5 May 2019 at 06:26, Martin Pieuchot <m...@openbsd.org> wrote:
>
> On 27/04/19(Sat) 21:55, Nathanael Rensen wrote:
> > The diff below speeds up ld.so library intialisation where the dependency
> > tree is broad and deep, such as samba's smbd which links over 100 libraries.
> >
> > See for example https://marc.info/?l=openbsd-misc&m=155007285712913&w=2
> >
> > See https://marc.info/?l=openbsd-tech&m=155637285221396&w=2 for part 1
> > that speeds up library loading.
> >
> > The timings below are for /usr/local/sbin/smbd --version:
> >
> > Timing without either diff  : 6m45.67s real  6m45.65s user  0m00.02s system
> > Timing with part 1 diff only: 4m42.88s real  4m42.85s user  0m00.02s system
> > Timing with part 2 diff only: 2m02.61s real  2m02.60s user  0m00.01s system
> > Timing with both diffs      : 0m00.03s real  0m00.03s user  0m00.00s system
> >
> > Note that these timings are for a build of a recent samba master tree
> > (linked with kerberos) which is probably slower than the OpenBSD port.
>
> Nice numbers.  Could you explain in words what your diff is doing?  Why
> does splitting the flag help?  Is it because some ctors/initarray are
> being initialized multiple times currently?

No, the STAT_INIT_DONE flag prevents that.

> Or is it just to prevent some traversal?

Yes.

> In that case does that mean the `STAT_VISISTED' flag is removed too
> early?

Yes, STAT_VISITED is removed too early. The visited flag is set on a node
while traversing the child nodes of that node and then removed. It serves
to protect against circular dependencies, but does not prevent repeatedly
traversing through a node that appears on separate branches.

The entire tree must be traversed twice - first to initialise the
DF_1_INITFIRST libraries, and secondly to initialise the others. This
is presumably why this diff contributes roughly twice as much speedup
as the part 1 diff. To be effective in avoiding repeated traversals
the visited flag must persist throughout an entire tree traversal but
it must either be cleared between first and second traversals or a
different flag used for the second traversal.

My approach was to add a second visited flag and make them both
persistent. My rationale for why I believe the flags may be
persisted is as follows. dlopen() calls _dl_call_init() with the
newly loaded object and neither the newly loaded object nor any
newly loaded children of that object will have either visited flag
set. Already loaded children will have those flags set, but they
won't have gained any new children as a result of the dlopen().
If this reasoning is wrong then the diff is wrong and could lead to
uninitialised libraries (and an ld.so regress test should probably
be created to catch that situation).

It occurs to me as I'm writing this that perhaps it's possible to
avoid a tree traversal entirely by walking the linearised grpsym_list
in reverse and relying only on the STAT_INIT_DONE flag.
        /*
         * grpsym_list is an ordered list of all child libs of the
         * _dl_loading_object with no dups. The order is equivalent
         * to a breadth-first traversal of the child list without dups.
         */
I don't think it is a true breadth-first traversal, not in the way I
understand breadth-first, but it does ensure that parent nodes appear
before child nodes. So in reverse, child nodes will appear before
parent nodes. While this is not the same as a depth-first traversal
it may be OK. There may be some specific requirements of DF_1_INITFIRST
that need to be taken into account.

Nathanael

>
> > Index: libexec/ld.so/loader.c
> > ===================================================================
> > RCS file: /cvs/src/libexec/ld.so/loader.c,v
> > retrieving revision 1.177
> > diff -u -p -p -u -r1.177 loader.c
> > --- libexec/ld.so/loader.c    3 Dec 2018 05:29:56 -0000       1.177
> > +++ libexec/ld.so/loader.c    27 Apr 2019 13:24:02 -0000
> > @@ -749,15 +749,15 @@ _dl_call_init_recurse(elf_object_t *obje
> >  {
> >       struct dep_node *n;
> > 
> > -     object->status |= STAT_VISITED;
> > +     int visited_flag = initfirst ? STAT_VISITED_1 : STAT_VISITED_2;
> > +
> > +     object->status |= visited_flag;
> > 
> >       TAILQ_FOREACH(n, &object->child_list, next_sib) {
> > -             if (n->data->status & STAT_VISITED)
> > +             if (n->data->status & visited_flag)
> >                       continue;
> >               _dl_call_init_recurse(n->data, initfirst);
> >       }
> > -
> > -     object->status &= ~STAT_VISITED;
> > 
> >       if (object->status & STAT_INIT_DONE)
> >               return;
> > Index: libexec/ld.so/resolve.h
> > ===================================================================
> > RCS file: /cvs/src/libexec/ld.so/resolve.h,v
> > retrieving revision 1.90
> > diff -u -p -p -u -r1.90 resolve.h
> > --- libexec/ld.so/resolve.h   21 Apr 2019 04:11:42 -0000      1.90
> > +++ libexec/ld.so/resolve.h   27 Apr 2019 13:24:02 -0000
> > @@ -125,8 +125,9 @@ struct elf_object {
> >  #define      STAT_FINI_READY 0x10
> >  #define      STAT_UNLOADED   0x20
> >  #define      STAT_NODELETE   0x40
> > -#define      STAT_VISITED    0x80
> > +#define      STAT_VISITED_1  0x80
> >  #define      STAT_GNU_HASH   0x100
> > +#define      STAT_VISITED_2  0x200
> > 
> >       Elf_Phdr        *phdrp;
> >       int             phdrc;
> >

Reply via email to