Alan Coopersmith wrote:
> One of the new features in the upcoming Xorg 6.9 release is logging
> a stack trace to the Xorg log when the X server coredumps.

If the server is core dumping, and you allow the core to be created,
why are you doing all this extra work?

> But pstack on the core dump generated found many more symbols:

As it should.

You're not the first to ask about dladdr(), see 4934427, but I haven't
been very motivated to fix it.

An ELF image contains two symbol tables.  The .symtab is part of the
disc image, but it isn't mapped as part of the memory image.  It contains
all the symbols (local and global) that are part of the built image.
It can be removed with strip(1).

The .dynsym is a subset of the .symtab, and is maintained in the text
segment so that it gets mapped with the image.  It is this table that
ld.so.1 looks at for all runtime binding requirements.  By default
the .dynsym only contains global symbols, as these are all that external
objects can reference.  Under versioning/scoping, this table can
be further reduced to only provide those globals that define the
object's interface.  This table can't be stripped.
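
As a rough illustration of why a .dynsym-based lookup names fewer
functions than pstack does, here is a toy model in Python.  It is only
a sketch: the symbol names and addresses are hypothetical, and the
"nearest symbol at or below the address" rule is an assumption about
how such a lookup behaves, not the actual ld.so.1 implementation.

```python
from bisect import bisect_right

# Hypothetical symbols for one object: (address, name, binding).
SYMTAB = [
    (0x1000, "main", "GLOBAL"),
    (0x1100, "parse_args", "LOCAL"),
    (0x1200, "log_message", "GLOBAL"),
    (0x1300, "fatal_handler", "LOCAL"),
]

# By default the .dynsym carries only the global subset of the .symtab.
DYNSYM = [s for s in SYMTAB if s[2] == "GLOBAL"]


def nearest_symbol(table, addr):
    """Return (name, offset) for the closest symbol at or below addr, else None."""
    ordered = sorted(table)
    idx = bisect_right([s[0] for s in ordered], addr) - 1
    if idx < 0:
        return None
    sym_addr, name, _ = ordered[idx]
    return name, addr - sym_addr


pc = 0x1310  # a return address inside the local function fatal_handler
print(nearest_symbol(DYNSYM, pc))  # ('log_message', 272)
print(nearest_symbol(SYMTAB, pc))  # ('fatal_handler', 16)
```

With only the global subset available, the address is attributed to
the preceding global at a large offset (or to nothing at all), whereas
the full table names the local function.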

Defining an interface reduces the runtime overhead of binding objects,
we all know this.  But it has some secondary effects.  The .dynsym,
the associated string table (.dynstr), and the associated hash table
(.hash) are all reduced.   Often substantially.  This reduces the size
of the text segment, and can reduce paging, and symbol lookup costs.

Thus keeping the .dynsym to a minimum has been a goal.
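
To put rough numbers on that, the following Python sketch estimates the
bytes taken by the three sections for a given set of exported names.
It assumes a 64-bit object (24-byte Elf64_Sym entries) and the classic
.hash layout of 32-bit words; the symbol names and bucket counts are
made up, and the .dynstr figure counts symbol names only, so treat the
result as an approximation rather than what ld(1) would emit.

```python
ELF64_SYM_SIZE = 24  # bytes per Elf64_Sym entry
HASH_WORD = 4        # .hash is an array of 32-bit words


def dynamic_footprint(names, nbucket):
    """Estimate bytes used by .dynsym, .dynstr and .hash for these names."""
    nsyms = len(names) + 1                        # index 0 is the null symbol
    dynsym = nsyms * ELF64_SYM_SIZE
    dynstr = 1 + sum(len(n) + 1 for n in names)   # leading NUL, then NUL-terminated names
    hash_ = (2 + nbucket + nsyms) * HASH_WORD     # nbucket, nchain, buckets, chains
    return dynsym + dynstr + hash_


# Hypothetical library: 200 internal globals plus a two-function interface.
all_globals = ["internal_helper_%03d" % i for i in range(200)]
interface = ["api_open", "api_close"]

print(dynamic_footprint(all_globals + interface, nbucket=67))  # everything exported
print(dynamic_footprint(interface, nbucket=3))                 # scoped to the interface
```

Longer names, such as mangled C++ symbols, push the .dynstr term up
further, which is why scoping pays off most there.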

To provide additional symbols to dladdr() we'd either have to i)
remap the object to inspect the .symtab (if it hasn't been stripped).
Given that your usage is under a fatal error condition, undergoing
excessive processing in this state might be inadvisable.  Or ii)
add additional symbols to the .dynsym - perhaps all global and
local functions, as these are all dladdr() needs.   But this will
start to enlarge the text segment again.  Think C++ here (not
that they're a great user of scoping - although they would benefit
the most).

I (and the debugging folks) have come across a number of applications
that try to print stack traces on a fatal error, and then prevent any
core from being produced at all.  Historic issues like the core being
too large to leave in the cwd, or the core not being debuggable in
different system environments, have now changed.  coreadm(1) exists, and
core files can contain every segment of the original image.

We want to encourage the creation of core dumps for fatal errors.
This gives the true means of addressing the problem.

Providing technology that allows historic alternatives to live
on, such as printing a stack frame and exiting, isn't felt to be
a wise choice.   Also, by all accounts you can get at all symbols
via libproc (although the interface isn't defined as stable), so
enlarging ELF images, or adding remapping to ld.so.1 isn't
even necessary for the 0.01% of the world that wants to do this.

I'd be willing to entertain arguments for extending dladdr() -
add them to the bug report please - but at present I'm not convinced
that it should be extended.  Plus, I'm not convinced stack traces
from exiting processes have value either.  On a fatal error condition,
tell the user that the bounce has gone out of their bungee and point
them at the core file.


-- 
Rod
