Ken Cox wrote:
> I appreciate the offer to look at the objects; how can I get them to you?
Having the objects in house, I did some more poking around. I thought
I'd summarize some other findings as they may be of benefit to others.
Check out the Linker and Libraries manual for some suggestions on reducing
overhead. I usually point folks at our cheat sheet, which gives some
recommendations for building various objects.
http://docs.sun.com/app/docs/doc/817-1984/6mhm7pl2p?a=view
More details related to shared objects can be found in "Performance
Considerations":
http://docs.sun.com/app/docs/doc/817-1984/6mhm7pl1s?a=view
As to some specifics:
-Bsymbolic
----------
The C++ folks really frown on this as it can compromise some of their
implementation details. I can see why you used it. libISV.so
contains 83873 relocations that must be processed at runtime, which
is taxing ld.so.1(1). Without -Bsymbolic most of these would be
symbolic relocations (requiring a symbol lookup) rather than
relative relocations (requiring a simple addition), and this would be
*very* expensive.
But you could achieve the same by defining the interfaces you really
want to export from your library. See:
http://blogs.sun.com/roller/page/rie/?anchor=interface_creation_using_the_compilers
By defining your interface, all other global symbols get demoted to
locals. This results in these locals being bound internally (which is
what -Bsymbolic does). It also greatly reduces the size of the
symbol table (.dynsym), string table (.dynstr) and hash table (.hash),
which will significantly reduce the size of the text segment, reduce
paging, and reduce symbol lookup overhead within the running process.
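As a minimal sketch of what defining an interface in the source can look
like: the names below (ISV_EXPORT, ISV_LOCAL, isv_checksum, isv_report_id)
are hypothetical and are not taken from libISV.so. It assumes the Sun Studio
linker-scoping specifiers __global and __hidden, with the GCC visibility
attributes as the nearest equivalent elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scoping macros (hypothetical names). __global and __hidden are the
 * Sun Studio linker-scoping specifiers; the GCC visibility attributes
 * are used as the closest equivalent with other toolchains.
 */
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#define ISV_EXPORT __global
#define ISV_LOCAL  __hidden
#elif defined(__GNUC__)
#define ISV_EXPORT __attribute__((visibility("default")))
#define ISV_LOCAL  __attribute__((visibility("hidden")))
#else
#define ISV_EXPORT
#define ISV_LOCAL
#endif

/*
 * Internal helper: callable from any object that goes into the library,
 * but demoted to a local symbol in the final shared object, so calls to
 * it are bound internally.
 */
ISV_LOCAL unsigned int
isv_checksum(const char *s)
{
    unsigned int sum = 0;

    assert(s != NULL);
    while (*s != '\0')
        sum += (unsigned char)*s++;
    return (sum);
}

/* Public interface: one of the few symbols that remain in .dynsym. */
ISV_EXPORT unsigned int
isv_report_id(const char *name)
{
    return (isv_checksum(name) & 0xff);
}
```

Rather than tagging every internal function, the default can be flipped at
compile time (-xldscope=hidden with Sun Studio, -fvisibility=hidden with
GCC), so that only the symbols explicitly marked for export stay global.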
Unused materials
----------------
Set the environment variable LD_OPTIONS=-Dunused,details. You'll see that
a number of dependencies are never used:
debug: file=liblber.so.2 unused: does not satisfy any references
debug: file=librpcsvc.so.1 unused: does not satisfy any references
debug: file=libresolv.so.2 unused: does not satisfy any references
debug: file=libintl.so.1 unused: does not satisfy any references
debug: file=libw.so.1 unused: does not satisfy any references
debug: file=libXext.so.0 unused: does not satisfy any references
debug: file=libX11.so.4 unused: does not satisfy any references
and a number of sections are never used:
debug: section=.rodata; size=0x1d; input from file=./lib/libssl.a(s2_meth.o);
unused: does not satisfy any references
debug: section=.rodata; size=0xc7; input from file=./lib/libssl.a(s2_srvr.o);
unused: does not satisfy any references
......
Probably not a huge overhead, but every little bit counts.
I suggest that the final image is "ldd -rU" clean. See:
http://blogs.sun.com/roller/page/rie/?anchor=tt_dependencies_tt_define_what
It all helps reduce runtime overhead.
Strings
-------
I always enjoy looking at the strings in objects:
% strings -10 libISV.so | sort | uniq -c | sort -rn | head
592 Report.C
485 input params:
452 Object.C
387 ObjectExport.C
371 trace.log
355 Relationship.C
321 Node.C
...
Again, it all adds up (are these perhaps part of an ASSERT macro?), and I
wouldn't be surprised if every string accounts for a relocation.
Large data segments can be costly too. You can inspect an individual section
with elfdump:
% elfdump -N.data -w data libISV.so
% strings -10 data | sort | uniq -c | sort -rn | pg
......
1 $Id: xmlTags.C 1.1.1.1 Wed May 25 11:44:27 2005 ken Experimental $
1 $Id: workspace.C 1.1.1.2 Tue Aug 16 12:41:07 2005 ken Experimental $
......
Hmmm, SCCS strings in the data section? #pragma ident declarations would
put these in the .comment section, which isn't part of the memory image.
Comments can even be stripped from the file image using mcs(1). Having these
in the .data section contributes to data fragmentation and possibly additional
paging activity.
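A rough sketch of the difference, reusing one of the ID strings from the
dump above. The commented-out array is only a guess at how the string is
declared today, and workspace_name_length is a hypothetical function that
is there so the file contains some code. #ident is a GCC extension, used
here as a stand-in where #pragma ident is not available.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A declaration like the following (an assumption about the current
 * source) puts the string in .data, which is part of the memory image:
 *
 *    static char rcsid[] = "$Id: workspace.C 1.1.1.2 ... $";
 */

/* Recording the same string as an ident places it in .comment instead. */
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#pragma ident "$Id: workspace.C 1.1.1.2 Tue Aug 16 12:41:07 2005 ken Experimental $"
#elif defined(__GNUC__)
#ident "$Id: workspace.C 1.1.1.2 Tue Aug 16 12:41:07 2005 ken Experimental $"
#endif

/* Hypothetical function; the ident string is not referenced by any code. */
size_t
workspace_name_length(const char *name)
{
    size_t n = 0;

    assert(name != NULL);
    while (name[n] != '\0')
        n++;
    return (n);
}
```

The recorded strings can then be printed with "mcs -p" and deleted with
"mcs -d", neither of which touches the sections that get mapped into memory.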
---------
After forwarding this information to Ken, and asking his permission to post,
he remarked, "It would be exceedingly cool if I could estimate a priori the
performance win of these changes before I went and implemented them all".
That's the hard part. In my experience, it is a lot of little things
that finally add up. But each individual change can be hard to measure.
Out of the many performance tuning efforts we've employed with the Solaris
libraries, interface definitions (and hence symbol scoping of unnecessary
globals to locals) were our biggest win. But even after making these changes,
looking at one app you might not be able to measure any improvement. It was
the total system performance, under large loads or low memory conditions,
that would show the wins.
Bottom line, the less work you give the runtime linker (relocations,
dependencies), and the smaller the image (fewer page faults), the better off
you will be. If you can observe an improvement to either of these by using
static analysis tools such as elfdump(1) or size(1), it will pay off in the
end.
Yeah, it can be kinda hard to sell this sort of effort to management :-)
--
Rod.