> > utrace_control(current,,UTRACE_STOP) doesn't really do anything more.
> 
> It does utrace_do_stop() which sets UTRACE_REPORT/TIF_NOTIFY_RESUME.
> Harmless, but makes sense to avoid.

Right.  I was talking about the API perspective.  Those internal details
are more or less invisible to a given engine.

> OTOH, I am wondering if we can give more power to utrace_control(STOP).
> Currently debugger can't stop the tracee unless it cooperates. However, if
> the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP)
> works. This is strange.

Why is it strange?  If you don't have any callback, then all your engine
does is get stopped.  If you have callbacks, then you always get callbacks
before the thread does anything (including stop).  Since your callback's
return value always replaces any other resume action for your engine, you
have to "cooperate" with yourself in not changing your mind the next time
your code is in charge of the tracee.

> Stupid question: what if we just remove the clear_engine_wants_stop()
> code in finish_callback_report() ?

That would change the API so that a callback's resume action is always
overridden by a utrace_control(,,UTRACE_STOP).  Today the practice from a
third-party debugger thread is to do:

        ret = utrace_control(task, engine, UTRACE_STOP);
        if (ret == 0) {
          // Already stopped, e.g. was in jctl or another engine's stop.
          // This means we can do our asynchronous examination here.
        } else if (ret == -EINPROGRESS) {
          // it will get to report_quiesce soonish (modulo blocked in kernel)
        }

If the examination you wanted to do was some predetermined sequence of
fetching and fiddling before resuming, then the report_* callback can just
do all that itself.  For example, fetch the registers for a sample and keep
going; insert some breakpoints and keep going; etc.

Today, the callback that goes with that asynchronous-side code has the
choice to do something and stop or to do something and not stop.  With this
change, it would have to explicitly request not stopping by calling
utrace_control(current, engine, UTRACE_RESUME) inside the callback.  It
would need to use its own bookkeeping above the utrace layer to know that
there was a stop it was supposed to clear (i.e. something the code sample
above would set before calling utrace_control).

This changes the set of races with asynchronous stop/resume scenarios.
But it doesn't remove the asynchronous-resume vs callback-stop race.
In concert with wake_up_tracer-after-stop perhaps it gets better.
We'd have to really think through the races with the new API.

> > So this would fix that race.
> > ...
> > (But note if there is
> > another engine that decides to stop anyway, then we'll have used a
> > second CPU and/or ping-pong scheduled to run the tracer thread in
> > parallel with a very short amount of work in the tracee before it just
> > blocks anyway.)
> 
> Hmm. Not sure I understand... Yes, the tracee still can stop if another
> tracer demands, but this is fine?

Correct.  When that is what's going to happen (e.g. there is another engine
attached that is a slow stopper like ptrace), then it's suboptimal to have
the scheduler decide to yield the CPU immediately on waking your first
tracer.

> Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME)
> remove ENGINE_STOP from ->utrace_flags like DETACH does?

Perhaps so.  My memory is dim on the use of that bit in utrace_flags.  It's
to close certain races, but we'll have to remind each other of the details.

> Heh. it turns out that I misunderstood you in the past, I thought
> you dislike the whole idea of wake_up from utrace_stop().

No, just the over-general hook or the over-specific ptrace hack.
I talked about about a utrace_wake_up() that would work this way.

> Hmm. Not sure... at least I don't immediately see something simple.
> 
> Except, just add wait_queue_head into struct utrace. This way utrace_barrier()
> can wait longer than needed if it notices utrace->reporting == engine. Or we
> can add it into utrace_engine, in this case utrace_stop() needs
> list_for_each(engine, utrace->attached).

You're talking about two different things here.  Whatever implementation
details change in utrace_barrier() is not directly apropos.  That would not
change the API-visible details at all, just make utrace_barrier() perform
better.  Let's not get into that before we hash out the new API ideas.

For the new API idea, I was talking about something like a utrace_wake_up()
call.  That might be implemented by a wait_queue_head_t * in utrace_engine,
for example.  The implementation internals could involve a simple
iteration, or involve more hidden bookkeeping.  I'm only talking about the
API idea for the moment.  I only mentioned implementation details at all to
indicate that a restricted API (one wake-up per callback per engine) could
be done with constant storage overhead and thus avoid a whole potential can
of worms--which is the only real reason to restrict the API like that.
So all that's just to say that an API idea along those lines is feasible.

What I'm most interested in right now is thinking through the utility of
particular new API details for engine writers.  Between an imagined clean
ptrace implementation, an imagined clean abstract replacement for that
usage model, the gdbstub model (which for these purposes is probably
largely equivalent to the putative abstract replacement), and torokalpar's
inverse example, we may have good coverage of the kinds of needs that the
API should address.


Thanks,
Roland

Reply via email to