After returning from various vacations, and rummaging through
my mail, I found this from Alan, which doesn't seem to have
found its way to the alias:

Alan Coopersmith wrote:
> This interesting article on work being done by an OpenOffice
> developer to speed up the Linux linker/loader was recently sent
> to me:
> 
>     http://lwn.net/Articles/192082/
> 
> Clearly, Solaris ld already has the -Bdirect they added to
> GNU ld, but would any of their other optimization ideas be
> useful for Solaris to adopt?   (If they can take -Bdirect
> from us, why not take good ideas from them if they would
> work for us?)

There's some good reading in the document, and I thought I'd
comment on some things we do and don't do.

Hash/Symbol/String Table:

We've done a fair bit of messing about in this area, and we've
tried doing the right thing by default rather than needing to
supply any options (as if ld(1) needed any more options :-).

We compute the hash bucket size based on the number of symbols
that have been input to the link-edit.  Our goal is a compromise
between the hash size and the number of symbol chains.  In general
this seems to work quite well.  Use elfdump(1) -h to view the
chains.
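
For anyone who hasn't looked at how buckets and chains hang together,
here's a rough Python sketch (this is not ld(1) code).  The hash
function is the standard System V ELF hash.  The bucket count passed
in is an arbitrary illustration, not the sizing heuristic ld(1)
actually uses, and build_hash() and lookup() are made-up helper names.

```python
def elf_hash(name):
    """Standard System V ELF symbol hash, kept to 32 bits."""
    h = 0
    for c in name.encode("ascii"):
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_hash(names, nbucket):
    """Group symbol names into chains, one chain per bucket.

    nbucket is a caller-supplied assumption; ld(1) derives its own
    value from the number of input symbols.
    """
    chains = {}
    for name in names:
        chains.setdefault(elf_hash(name) % nbucket, []).append(name)
    return chains


def lookup(chains, nbucket, name):
    """Walk only the chain of the bucket the name hashes to."""
    return name in chains.get(elf_hash(name) % nbucket, [])


if __name__ == "__main__":
    syms = ["_tfind", "_rwlock_destroy", "_atomic_cas_ptr"]
    table = build_hash(syms, nbucket=3)
    for bucket in sorted(table):
        print(bucket, table[bucket])
```

Fewer buckets means a smaller table but longer chains to walk, and
more buckets means the reverse, which is the compromise in question.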

We also sort the symbols using their hash value.  Hence symbols are
laid out in chain order, and, because we write the symbol names
out as we write the symbol table entries, the string names follow
the hash order too.  Well, almost.  We also have a string table
optimization that compacts strings, i.e. 'write' uses the same
string (+1) as '_write'.  Because of this, the ordering isn't
perfect:

  hash table:

      bucket  symndx      name
          1  [1]         _tfind
          2  [2]         _rwlock_destroy
          4  [3]         _atomic_cas_ptr
        ...
       1010  [1008]      _pthread_rwlock_destroy

  string table:

  0014308  \0   _   t   f   i   n   d  \0   _   p   t   h   r   e   a   d
  0014318   _   r   w   l   o   c   k   _   d   e   s   t   r   o   y  \0
  0014328   _   a   t   o   m   i   c   _   c   a   s   _   p   t   r  \0
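
The string compaction can be sketched roughly as follows, again in
Python and again not the real ld(1) implementation.  A name that is a
suffix of a longer name is given an offset into the longer string
rather than a copy of its own.  build_strtab() and read_str() are
hypothetical names, and sorting by length is a simplification of mine
that ignores the hash ordering described above.

```python
def build_strtab(names):
    """Return (table_bytes, offsets) with suffix strings shared."""
    # Place longer names first so shorter ones can find a host string.
    unique = sorted(set(names), key=len, reverse=True)
    table = bytearray(b"\0")      # offset 0 is the empty string
    offsets = {}
    placed = []                   # (string, offset) of strings written out
    for name in unique:
        for host, off in placed:
            if host.endswith(name):
                # Reuse the tail of an existing string.
                offsets[name] = off + len(host) - len(name)
                break
        else:
            offsets[name] = len(table)
            placed.append((name, len(table)))
            table += name.encode("ascii") + b"\0"
    return bytes(table), offsets


def read_str(table, off):
    """Read the NUL-terminated string starting at off."""
    end = table.index(b"\0", off)
    return table[off:end].decode("ascii")


if __name__ == "__main__":
    names = ["_tfind", "_pthread_rwlock_destroy",
             "_rwlock_destroy", "_atomic_cas_ptr"]
    table, offsets = build_strtab(names)
    for n in names:
        print(offsets[n], read_str(table, offsets[n]))
```

In the dump above, _rwlock_destroy is found the same way, as the tail
of _pthread_rwlock_destroy.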

Undefined symbols get included in the hash table, as they can
be used for resolving function addresses.  But this is only
needed to resolve to the .plt within an executable, so there are
probably some UNDEF symbol optimizations we could still do.


Direct Bindings:

We've been playing with this for some time, and it's still my hope
to enable this for most of the core OS.  But interposition has been
with us for a long time now, and direct binding can break interposition.
Sometimes that's what you want - there are customers who want to
bind two different objects to two different definitions of foo().
But, there are a *lot* of things that work because of interposition -
and note, I didn't say by design!  Not that you can't get into
multiple symbol issues without direct bindings:

  http://blogs.sun.com/roller/page/rie?entry=c_dynamic_linking_symbol_visibility
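
To make the tension between the two models concrete, here's a toy
Python model.  It is only an illustration: the object names are
invented, the functions are hypothetical, and falling back to the
default search when the recorded object lacks the symbol is an
assumption of this sketch, not a description of ld.so.1 internals.

```python
def default_bind(load_order, symbol):
    """Default model: the first object in load order defining it wins."""
    for obj_name, defined in load_order:
        if symbol in defined:
            return obj_name
    return None


def direct_bind(load_order, symbol, recorded):
    """Direct model: go straight to the object recorded at link-edit time.

    Falls back to the default search if that object lacks the symbol
    (an assumption made for this sketch).
    """
    objects = dict(load_order)
    if recorded in objects and symbol in objects[recorded]:
        return recorded
    return default_bind(load_order, symbol)


if __name__ == "__main__":
    # Invented objects: an interposer loaded ahead of two libraries
    # that each carry their own foo().
    process = [
        ("interposer.so", {"foo"}),
        ("libA.so", {"foo", "bar"}),
        ("libB.so", {"foo"}),
    ]
    print(default_bind(process, "foo"))
    print(direct_bind(process, "foo", recorded="libA.so"))
    print(direct_bind(process, "foo", recorded="libB.so"))
```

With the default search every caller of foo() lands on the interposer.
With direct bindings two callers can reach two different definitions,
and the interposer is silently skipped, which is exactly what breaks
things that rely on interposition.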

We've been working on the corner cases, ways in which to signal that
direct binding isn't applicable, and better ways of observing what
bindings are occurring within a process.  I also have to resurrect
some discussions with the compiler folks, as C++ puts out numerous
symbols that only work because of interposition.  It would be nice if
ld(1) could know what these were up front.  Perhaps just treating
COMDAT/SHT_GROUP symbols as non-direct candidates (vague?) would
be sufficient, I don't know.

Documenting how to effectively use direct bindings is also something
that must be covered, although the link-editors themselves and much
of our GNOME system already use this feature.


Hash Values:

We've had a couple of projects start to look at a better hashing
mechanism, either storing the value, or using an alternative hash
(while continuing to supply the old hash), but alas the developers
either leave or move on to new projects.


Conclusion:

We've obviously got a few places where optimizations could be
applicable.  But I will note that we've spent many hours working
on optimizations only to find out they have little effect in the
real world.  Have you looked at how much ".init" code gets
executed before a process reaches "main"? :-)  Jeez.  There's always
a compromise between complex functionality and the performance
benefit it has.

One reflection is that the biggest wins we've found are around the
use of versioning/scoping an object.  When you define your interfaces,
a number of unnecessary global symbols can be demoted to locals.
This cuts down on hash/symtab/strtab size, and greatly reduces
runtime relocation costs.  The best way to speed up ld.so.1 is to
not ask it to do unnecessary work for you in the first place :-)
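
As a back-of-the-envelope illustration of the effect, here's a small
Python sketch that splits a symbol list the way a mapfile with a few
"global:" names and a "local: *" catch-all would.  The symbol names
and apply_scope() are invented for the example; this models the
outcome only, not how ld(1) processes a mapfile.

```python
from fnmatch import fnmatchcase


def apply_scope(symbols, global_patterns):
    """Split symbols into (global, local) given the exported patterns.

    Anything not matching a global pattern is demoted to local, as a
    'local: *' catch-all would do.
    """
    keep, demoted = [], []
    for sym in symbols:
        if any(fnmatchcase(sym, pat) for pat in global_patterns):
            keep.append(sym)
        else:
            demoted.append(sym)
    return keep, demoted


if __name__ == "__main__":
    # Invented symbols for a hypothetical libfoo.
    symbols = ["foo_open", "foo_close", "_foo_helper", "foo_debug_state"]
    exported, local = apply_scope(symbols, ["foo_open", "foo_close"])
    print("global:", exported)
    print("local: ", local)
```

Only the symbols left global need to be found by name at runtime, so
every name that ends up in the second list is hash, symbol table and
relocation work that ld.so.1 never has to do.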

Although this versioning/scoping has been accomplished for some
time now within C using mapfiles, C++ (which could benefit the
most) has been a little hard to handle.  New compiler options
will hopefully help our customers in this regard:

  http://blogs.sun.com/roller/page/rie?entry=interface_creation_using_the_compilers


-- 
Rod
