> Yeah.  I think one concern was that when a breakpoint is hit, a utrace
> client other than uprobes might see the SIGTRAP first and take some
> unwarranted action because it thinks that the task is going to die.

The underlying issue is that tracing activities induce signals that would
not ordinarily be raised in the program.  Inserted breakpoints do this; so
does single-step; so do hw breakpoint features; etc.  There are two
components to it.

First is the problem of interaction with signals generally.  This is
already a problem in vanilla ptrace.  Induced signals interact with the
normal functioning of the system, not just with competing tracers.  A
straightforward example is when the debugger suddenly dies.  If it had
been using single-step or suchlike, there may be a pending SIGTRAP that
was already posted before the debugger exited and detached, but had not
yet been processed by the debugger.  This is easy to achieve with a
wedged debugger, for example: it hasn't called wait4 to eat the signal
yet, and then it gets killed and never does.  Now the thread that was
formerly traced resumes running, and delivers the SIGTRAP and dumps
core.  Some kernel behavior tweaks to avoid that particular scenario are
easy to come up with.  But that is just the simplest example of the
issue.  More subtle bad effects range from perturbing the signal queue
resource accounting to interfering with the program's own use of blocked
and pending signals.  (Today it's effectively impossible to do some
normal debugging activities with a program that itself catches and
generates SIGTRAP.)

In summary, the first component is distinguishing tracing events from
real signals.  The second problem is engine interaction per se.

My view is that the crux of the former problem is the whole notion of
overloading the signals mechanism for reporting tracing-induced events.
Signals have one essential characteristic we need: they can be
queued/posted quickly and safely at interrupt level, to take effect only
at the safe, unentangled place just before returning to user mode.  My
thinking is to have a mechanism with this property separate from signals.
The "extension events" item on the TODO list is such a mechanism.  That
feature idea is intended to serve a variety of needs.  Among them is the
idea to use these for machine-level events that are being traced.  So
rather than catching signals, an engine inducing traps would register its
interest in the "single-stepped" event or the "breakpoint insn" event.
In cases like the trap for a breakpoint instruction, there is nothing at
the lowest level to distinguish a tracing-induced breakpoint from a
user's other use of those instructions (intended to generate a SIGTRAP).
So this event would be reported to interested tracing engines, and if no
engine consumes it, then it can become a signal.  I think the interface
can be devised such that a detaching engine always gets a callback for
any last events, so it's harder to write an engine that forgets to
swallow the last event it induced and leaves someone to die by SIGTRAP.
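
A rough user-space model of that dispatch rule might look like the
following.  This is only a sketch of the idea, not a proposed interface:
every name in it (ext_event, ext_engine, ext_dispatch, bp_report) is
invented for illustration and none of it exists in utrace.  Engines
register interest in an event kind, each interested engine is offered the
event, and only if nobody consumes it does it turn into SIGTRAP.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical machine-level event kinds. */
enum ext_event { EXT_EVENT_SINGLE_STEP, EXT_EVENT_BREAKPOINT, EXT_EVENT_MAX };

enum ext_result { EXT_PASS, EXT_CONSUMED };

/* Hypothetical engine: an interest mask, a callback, and private state. */
struct ext_engine {
    unsigned int interest;      /* bitmask of (1u << event kind) */
    enum ext_result (*report)(struct ext_engine *self,
                              enum ext_event ev, unsigned long pc);
    unsigned long bp_addr;      /* address of the breakpoint this engine planted */
};

/*
 * Offer the event to each interested engine in turn.
 * Returns 0 if some engine consumed it, otherwise the signal it becomes.
 */
int ext_dispatch(struct ext_engine **engines, size_t n,
                 enum ext_event ev, unsigned long pc)
{
    assert(ev < EXT_EVENT_MAX);

    for (size_t i = 0; i < n; i++) {
        struct ext_engine *e = engines[i];

        if (!(e->interest & (1u << ev)))
            continue;           /* engine did not register for this event */
        if (e->report(e, ev, pc) == EXT_CONSUMED)
            return 0;           /* tracing-induced: no signal is generated */
    }
    return SIGTRAP;             /* nobody claimed it: becomes a real signal */
}

/* Example engine callback: claims only the breakpoint it inserted itself. */
enum ext_result bp_report(struct ext_engine *self,
                          enum ext_event ev, unsigned long pc)
{
    if (ev == EXT_EVENT_BREAKPOINT && pc == self->bp_addr)
        return EXT_CONSUMED;
    return EXT_PASS;
}
```

In this model a breakpoint at the engine's own address is swallowed, while
a breakpoint instruction the program put there itself falls through and
still produces SIGTRAP, which is the distinction the lowest level cannot
make on its own.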

The mechanism to support that is a little way off.  But this is how I see
that going.  In that context, engine interaction seems a lot simpler.
Engines that induce low-level events have to sort those out among each
other (probably easy from PC values and such).  Engines interested in
fatal signals only ever see "natural" signals.  Those engines have to
work out among themselves who eats the signal last.  The report_resume
interface and simple priorities for callback order seem like they might
suffice.
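
As a sketch of what "simple priorities for callback order" could mean for
natural signals, here is a toy model in which engines are called in
priority order and whoever runs last has the final say.  The types and
names (sig_engine, resolve_signal, and the two example callbacks) are
assumptions made up for this illustration; they are not the report_resume
interface or any existing utrace API.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

enum action { ACT_DELIVER, ACT_SWALLOW };

/* Hypothetical engine interested in natural signals. */
struct sig_engine {
    int priority;   /* lower runs earlier; the highest runs last */
    enum action (*report_signal)(int signo, enum action current);
};

/* qsort comparator: ascending priority. */
static int by_priority(const void *a, const void *b)
{
    const struct sig_engine *x = a, *y = b;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/*
 * Call every engine in priority order.  Each sees the action chosen so
 * far and may replace it, so the last engine called decides the outcome.
 */
enum action resolve_signal(struct sig_engine *engines, size_t n, int signo)
{
    enum action act = ACT_DELIVER;

    if (n == 0)
        return act;
    assert(engines != NULL);

    qsort(engines, n, sizeof(*engines), by_priority);
    for (size_t i = 0; i < n; i++)
        act = engines[i].report_signal(signo, act);
    return act;
}

/* Example: an engine that eats SIGSEGV and leaves everything else alone. */
enum action swallow_segv(int signo, enum action current)
{
    return signo == SIGSEGV ? ACT_SWALLOW : current;
}

/* Example: an engine that insists every signal be delivered. */
enum action always_deliver(int signo, enum action current)
{
    (void)signo;
    (void)current;
    return ACT_DELIVER;
}
```

With swallow_segv at the higher priority the SIGSEGV is swallowed; swap
the priorities and always_deliver overrides it, so the ordering alone
settles who eats the signal last.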


Thanks,
Roland
