Re: Q: utrace_stop() && notification

Roland McGrath Thu, 16 Jul 2009 16:35:25 -0700

> If ->report_xxx() returns UTRACE_STOP the tracee will do utrace_stop
> eventually. But how can the tracer know the tracee is already
> TASK_TRACED/->stopped?


We don't really have a way.  It is something we need to address better.
This is the most essential thing that made the old utrace-ptrace.patch a
temporary kludge rather than a good model.

> Looks like a horrible hack, imho. Utrace should know nothing about
> ptrace.

Agreed!

> Can't we do something like
[...]
>       +       // SPECIAL!!! must not block/etc
>       +       void (*notify_stopped)(...);

Part of the point about utrace is to make it easier not to foul up other
engines when your engine has a bug.  Many of the rules about well-behaved
callbacks have to be met by all engines or they will interfere; we can't
really avoid that reality when we call engines' code.

But part of the plan is not to make those rules too hard to meet.
That's why the "don't block (for long)" rule is such a loose constraint
in real kernel terms.  Having engine-writers commonly need to write code
that can't even call mutex_lock or kmalloc, etc., is IMHO just too much
to ask.  It's far better if we don't have such tight constraints on any
callback that engine-writers will really need to use, even if only in
this special case.  (Because it really is not a rare thing--this is the
essential kind of synchronization that any engine-writer who handles
stopping at all will want.)

The report_* callback is the engine's opportunity to synchronize with
its asynchronous engine code (debugger threads, etc.).  The crux of the
issue is that when your "synchronizing" callback is done, there always
might be another engine's callback running afterwards.  Hence, the
actual effect of UTRACE_STOP causing TASK_TRACED state doesn't happen
until after the callback.  So your synchronization partner then really
needs a second synchronization with the thread itself really stopping.
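
To make the ordering problem concrete, here is a small userspace model
of it.  This is only a sketch and not utrace code: every sim_* name is
made up for illustration, and the "wakeup" is reduced to recording what
the debugger would have seen at the moment it was notified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real utrace types. */
enum sim_state  { SIM_RUNNING, SIM_TRACED };
enum sim_action { SIM_RESUME, SIM_STOP };

struct sim_task {
	enum sim_state state;          /* models target->state */
	enum sim_state seen_at_notify; /* state when the debugger was woken */
	int notified;
};

typedef enum sim_action (*sim_report_fn)(struct sim_task *);

/* The debugger engine's callback: notify the partner, then ask for a stop. */
static enum sim_action debugger_report(struct sim_task *t)
{
	t->notified = 1;
	t->seen_at_notify = t->state;  /* still SIM_RUNNING at this point */
	return SIM_STOP;
}

/* Some other engine whose callback runs after the debugger's. */
static enum sim_action other_report(struct sim_task *t)
{
	(void)t;
	return SIM_RESUME;
}

/* Run every engine's callback; only afterwards does the stop take effect. */
static void sim_report_event(struct sim_task *t,
			     const sim_report_fn *engines, size_t n)
{
	int want_stop = 0;

	assert(engines != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (engines[i](t) == SIM_STOP)
			want_stop = 1;
	if (want_stop)
		t->state = SIM_TRACED;
}
```

With the engine order { debugger_report, other_report }, the debugger is
notified while the task is still in the running state, and the stopped
state appears only after the last callback returns.  That gap is the
second synchronization discussed above.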

Let's talk about the utrace API semantics picture for this purely from
the engine-writer's point of view for a bit before we get into the
implementation details inside utrace.

The pieces we have now are engines doing their own wakeups in callbacks,
utrace_barrier, and utrace_prepare_examine.

Currently utrace_barrier only makes guarantees about the ordering of
callbacks and the various asynchronous utrace_* calls, but not about
true thread stoppedness.  

utrace_prepare_examine verifies that the target is blocked already,
and synchronizes with low-level context switch details for that, but
does not do any synchronization/delay *before*/with target->state
entering its blocked/stopped state.  Note it does not in particular
require the target be stopped, though does require the caller's engine
used UTRACE_STOP.  This is intended for non-perturbing examination of
a thread that might be blocked in a syscall (a la task_current_syscall).
But as it stands, it would also try to accept a thread that is just
blocked in kmalloc/mutex_lock in the next engine's callback.  But
unlike the syscall, that is likely to actually wake up again very soon
and that results in an -EAGAIN hiccup from utrace_prepare_examine or
utrace_finish_examine.

So in today's utrace, the best you can do is:

        report_* does:
                wake_up(you);
                return UTRACE_STOP;
        you wake up and do:
                utrace_barrier
                => your callback is finished, its UTRACE_STOP is logged
                utrace_prepare_examine
                => if ok, it's really stopped now
                |> if -EAGAIN, you have to loop(?)
                user_regset->get, etc. (aka arch_ptrace)
                utrace_finish_examine
                => if ok, it really was stopped when I said it was
                |> if -EAGAIN, you have to loop(?)
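
The debugger side of that sequence, with the "loop(?)" written out, might
look roughly like the sketch below.  It is a userspace model, not kernel
code: the fake_* functions are hypothetical stand-ins for utrace_barrier,
utrace_prepare_examine and utrace_finish_examine and do not have the real
signatures, and the bounded retry count is my own assumption, not
something utrace prescribes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a target that may still wake up a few times. */
struct fake_target {
	int wakeups_left; /* brief wakeups before it is really stopped */
	int regs;         /* pretend register state */
};

/* Stand-in for utrace_barrier: our callback has finished. */
static int fake_barrier(struct fake_target *t)
{
	(void)t;
	return 0;
}

/* Stand-in for utrace_prepare_examine: -EAGAIN while not yet settled. */
static int fake_prepare_examine(struct fake_target *t)
{
	if (t->wakeups_left > 0) {
		t->wakeups_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in for utrace_finish_examine: confirms it stayed stopped. */
static int fake_finish_examine(struct fake_target *t)
{
	(void)t;
	return 0;
}

/* Barrier once, then retry the prepare/read/finish bracket on -EAGAIN. */
static int examine_stopped(struct fake_target *t, int max_tries, int *regs_out)
{
	int ret, tries;

	assert(max_tries > 0);
	ret = fake_barrier(t);
	if (ret)
		return ret;

	for (tries = 0; tries < max_tries; tries++) {
		ret = fake_prepare_examine(t);
		if (ret == -EAGAIN)
			continue;          /* not settled yet, try again */
		if (ret)
			return ret;

		int snapshot = t->regs;    /* models user_regset->get */

		ret = fake_finish_examine(t);
		if (ret == -EAGAIN)
			continue;          /* it moved under us, discard */
		if (ret)
			return ret;

		*regs_out = snapshot;      /* only trust a confirmed read */
		return 0;
	}
	return -EAGAIN;                    /* still not settled, caller decides */
}
```

The point of the bracket is that the snapshot is handed back only when
the finish step confirms the target stayed put, which is why both steps
can send you around the loop again.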

A first notion is to give utrace_barrier a stronger guarantee: when
your engine has UTRACE_STOP in force and then you call utrace_barrier,
it blocks until the target enters a truly stopped state.  But OTOH
that is *too* strong if what you want is to see ASAP what the thread
is doing, be it properly stoppable now or blocked in a syscall you
don't want to interrupt.

I guess maybe if you want to get a thing positively stopped as
distinct from maybe-just-blocked-in-syscall (but with UTRACE_STOP
preventing it from reaching userland), then you should just start out
by using utrace_control(,,UTRACE_INTERRUPT) (perhaps followed by
utrace_control(,,UTRACE_STOP) to get "already stopped" return value?)
and have your callback return UTRACE_STOP.

So then perhaps instead what makes sense is that utrace_prepare_examine
should roll in an actual synchronization waiting for target->state to be
blocked outside of any other utrace engine callbacks (after them all).
If you want to do a "non-blocking" asynchronous sample of a thread, then
it's safe to use this only after your engine callback has notified you
that UTRACE_STOP is in force or utrace_control(,,UTRACE_STOP) returned
zero.  Or perhaps we can make it so utrace_prepare_examine diagnoses the
"waiting for utrace to settle down" and the "real blocks" differently.
The point there is that a "non-blocking sample" doesn't want to have a
window where the debugger saw the target was starting to block in some
real kernel code (had changed ->state) but then the debugger winds up
blocking after the target has woken again and then runs for a long time
in the kernel without a block.  (In that case, the debugger should see
"it's still running a lot, you will get your callback eventually".)

A related issue to keep in mind in all this is the "preemption flutter"
problem that someone cited with ptrace (it was biting UML folks, you may
recall).  The control flow in old ptrace is simple enough that we could
defuse that adequately just by omitting an explicit preemption check or
two so that voluntary preemption would not tend to occur between the
tracer wakeup and the actual schedule() blocking the tracee.  In the
utrace world, there will be far more going on and it would be unlikely
we could manage that approach for long.  Moreover, a core tenet of
utrace is to help tracing engines not clog the system up generally, so
IMHO we very much want extra preemption checks between callbacks and the
like (as we have now in finish_callback).  So we'd really like to
address that sort of problem with really proper synchronization between
what in the tracer really needs to wait for what in the tracee, rather
than more of the juggled-together-but-not-actually-tied dances going on now.

Working in the other direction, we have the (my) idea in utrace that no
particular operation should be any more synchronous than it has to be.
This motivates the approach of low-level moving parts the engine-writer
fits together, the only blocking synchronizations being explicit calls
like utrace_barrier, etc.

So, what does it all mean?  Well, I don't know exactly.  It means that
I feel strongly about retaining the flexibility in utrace to do things
in very asynchronous and opportunistic ways, and I want to keep the
core API focus on a fairly small number of calls.  At the same time we
need to do something that really makes good sense both for ptrace and
for other engines that will have a similarly synchronous sort of
control flow but different (and I hope less idiosyncratic) means to
synchronize with debugger calls.

I have a thought or two about ptrace in particular.  But perhaps first
we can explore the general case of the synchronization picture in the
API some.


Thanks,
Roland
