After returning from various vacations, and rummaging through my mail, I found this from Alan, which doesn't seem to have found its way to the alias:
Alan Coopersmith wrote:
> This interesting article on work being done by an OpenOffice
> developer to speed up the Linux linker/loader was recently sent
> to me:
>
>   http://lwn.net/Articles/192082/
>
> Clearly, Solaris ld already has the -Bdirect they added to
> GNU ld, but would any of their other optimization ideas be
> useful for Solaris to adopt?  (If they can take -Bdirect
> from us, why not take good ideas from them if they would
> work for us?)

There's some good reading in the document, and I thought I'd comment on some things we do and don't do.

Hash/Symbol/String Table:

We've done a fair bit of messing about in this area, and we've tried doing the right thing by default rather than needing to supply any options (as if ld(1) needed any more options :-).

We compute the hash bucket size based on the number of symbols that have been input to the link-edit. Our goal is a compromise between the hash size and the number of symbol chains. In general this seems to work quite well. Use elfdump(1) -h to view the chains.

We also sort the symbols using their hash value. Hence symbols are laid out in chain order, and, because we write the symbol names out as we write the symbol table entries, the string names follow the hash order too. Well, almost. We also have a string table optimization that compacts strings, i.e. 'write' uses the same string (+1) as '_write'. Because of this, the ordering isn't perfect:

  hash table:
    bucket    symndx    name
         1    [1]       _tfind
         2    [2]       _rwlock_destroy
         4    [3]       _atomic_cas_ptr
       ...
      1010    [1008]    _pthread_rwlock_destroy

  string table:
    0014308  \0  _  t  f  i  n  d \0  _  p  t  h  r  e  a  d
    0014318   _  r  w  l  o  c  k  _  d  e  s  t  r  o  y \0
    0014328   _  a  t  o  m  i  c  _  c  a  s  _  p  t  r \0

Undefined symbols get included in the hash table, as they can be used for resolving function addresses. But this is only used to resolve to the .plt within an executable. So, there are probably some UNDEF symbol optimizations we could still do.
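The chain ordering and string sharing described above can be sketched in a few lines of Python. This is only an illustration, not ld(1)'s implementation: it assumes the standard System V ABI hash function for the .hash section, and the helper names (elf_hash, chain_order, build_strtab) and the nbucket value passed in are made up for the example. How ld(1) actually picks the bucket count is not shown here.

```python
def elf_hash(name: str) -> int:
    """Standard System V ABI ELF symbol hash (assumed for .hash sections)."""
    h = 0
    for byte in name.encode("ascii"):
        h = (h << 4) + byte
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g  # clear the top four bits so h stays within 28 bits
    return h


def chain_order(names, nbucket):
    """Sort names by bucket index so each hash chain is contiguous.

    nbucket is a hypothetical, caller-chosen bucket count.
    """
    return sorted(names, key=lambda n: elf_hash(n) % nbucket)


def build_strtab(names):
    """Build a string table in which a name that is the tail of an
    already-stored name reuses those bytes instead of being stored again.

    Longest names are placed first so shorter tails can find them.
    Returns (table_bytes, {name: offset}).
    """
    table = b"\0"  # offset 0 is the empty string, as in a real .strtab
    offsets = {}
    unique = list(dict.fromkeys(names))  # de-duplicate, keep input order
    for name in sorted(unique, key=len, reverse=True):
        needle = name.encode("ascii") + b"\0"
        pos = table.find(needle)  # a hit always ends at a terminator
        if pos < 0:
            pos = len(table)
            table += needle
        offsets[name] = pos
    return table, offsets


if __name__ == "__main__":
    table, offsets = build_strtab(["write", "_write", "_tfind"])
    print(offsets["_write"], offsets["write"])  # second is first + 1
```

With this sketch, 'write' lands one byte into the storage for '_write', which is exactly why the string table cannot follow the hash order perfectly: a shared tail lives wherever its longer owner was written.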
Direct Bindings:

We've been playing with this for some time, and it's still my hope to enable this for most of the core OS. But interposition has been with us for a long time now, and direct binding can break interposition. Sometimes that's what you want - there are customers who want to bind two different objects to two different definitions of foo(). But there are a *lot* of things that work because of interposition - and note, I didn't say by design! Not that you can't get into multiple symbol issues without direct bindings:

  http://blogs.sun.com/roller/page/rie?entry=c_dynamic_linking_symbol_visibility

We've been working on the corner cases, ways in which to signal that direct binding isn't applicable, and better ways of observing what bindings are occurring within a process. I also have to resurrect some discussions with the compiler folks, as C++ puts out numerous symbols that only work because of interposition. It would be nice if ld(1) could know what these were up front. Perhaps just treating COMDAT/SHT_GROUP symbols as non-direct candidates (vague?) would be sufficient, I don't know.

Documenting how to effectively use direct bindings is also something that must be covered, although the link-editors themselves and much of our GNOME system already use this feature.

Hash Values:

We've had a couple of projects start to look at a better hashing mechanism, either storing the value, or using an alternative hash (while continuing to supply the old hash), but alas the developers either leave or move on to new projects.

Conclusion:

We've obviously got a few places where optimizations could be applicable. But I will note that we've spent many hours working on optimizations only to find out they have little effect in the real world. Have you looked at how much ".init" code gets executed before a process reaches "main" :-). Jez. There's always a compromise between complex functionality and the performance benefit it has.
One reflection is that the biggest wins we've found are around the use of versioning/scoping an object. When you define your interfaces, a number of unnecessary global symbols can be demoted to locals. This cuts down on hash/symtab/strtab size, and greatly reduces runtime relocation costs. The best way to speed up ld.so.1 is to not ask it to do unnecessary work for you in the first place :-)

Although this versioning/scoping has been accomplished for some time now within C using mapfiles, C++ (which could benefit the most) has been a little hard to handle. New compiler options will hopefully help our customers in this regard:

  http://blogs.sun.com/roller/page/rie?entry=interface_creation_using_the_compilers

-- Rod
