> I think we should start with changing utrace_control(DETACH) anyway,
> then try to improve. I'll resend the one-liner I already showed.

Ok.

> Hmm. I'll try to think more. Right now I don't really understand
> how to do this correctly.

I wasn't immediately sure either.

> OK, finish_callback_report() and utrace_control(DETACH) can set
> TIF_NOTIFY_RESUME. 

Right.  Though utrace_resume has the report.action==UTRACE_RESUME bail-out
case.  So either that would change or detach would also do UTRACE_REPORT.

> But what if there are no more attached engines?
> Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
> case. And utrace_reset() shouldn't clear task->utrace_flags, otherwise
> utrace_resume/utrace_get_signal won't be called.

Right.  Or else tracehook_notify_resume could call utrace_resume
unconditionally, but I'm not at all sure that is not worse.  The
original theory was that it should always be OK to have some
utrace_flags bits stay set when they are "stale", because any kind of
reporting pass that got enabled would hit the report->spurious case
and clean the state up synchronously when it's safe.
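
To make the "stale bits" theory concrete, here is a toy user-space model
in C.  It is only a sketch: struct toy_task, toy_detach_last() and
toy_report() are hypothetical names invented for this illustration and
are not the kernel's utrace code.  The assumption is that a detach
leaves the cached flag word alone, and the next reporting pass that
finds no interested engine recomputes it from the engines that remain.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENGINES 4

/* Hypothetical stand-in for the per-task state; not the kernel's layout. */
struct toy_task {
	unsigned long cached_flags;                  /* may hold stale bits */
	unsigned long engine_flags[TOY_MAX_ENGINES]; /* what each engine wants */
	size_t nr_engines;
};

/* Asynchronous detach: drop the last engine but leave cached_flags
 * untouched, since recomputing it here is assumed not to be safe. */
static void toy_detach_last(struct toy_task *t)
{
	assert(t->nr_engines > 0);
	t->nr_engines--;
}

/* Reporting pass for one event bit, run by the tracee itself.
 * Returns how many engines would get a callback.  When the bit was set
 * in cached_flags but no engine wants it, the pass is spurious and the
 * cached flags are recomputed from the remaining engines. */
static size_t toy_report(struct toy_task *t, unsigned long event)
{
	unsigned long live = 0;
	size_t i, called = 0;

	if (!(t->cached_flags & event))
		return 0;	/* fast path: no pass is started at all */

	for (i = 0; i < t->nr_engines; i++) {
		live |= t->engine_flags[i];
		if (t->engine_flags[i] & event)
			called++;
	}
	if (called == 0)
		t->cached_flags = live;	/* spurious pass: clean up now */
	return called;
}
```

In this model a stale bit costs one wasted pass and is then gone, which
is the property the theory relies on.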

> So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset()
> should do user_disable_single_step() too if no more engines. Confused.

If there are no more engines but the tracee is still running, we still
shouldn't do it there because it still might not be entirely safe.
If the tracee is not stopped, it's only safe to call in current.

> And in fact I don't understand why this is important. When it comes
> to multitracing, any engine can hit the unwanted/unexpected trap
> because another engine can ask for UTRACE_*STEP. 

Right.  An engine earlier in the list could swallow the signal so the
next engines' callbacks didn't see it.  But it doesn't know that some
later engines didn't also ask for stepping.  So there would have to be
some understood convention between engines.  For example, a later
engine could see info->si_signo==SIGTRAP et al and act on that even
when the incoming utrace_signal_action(action)==UTRACE_SIGNAL_IGN.
Of course, that doesn't help a non-stepping engine that is earlier
in the list to know that a later engine will be swallowing the signal.
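
A toy user-space model in C of such a convention is sketched below.  All
of the toy_* names are hypothetical and exist only for this illustration;
they are not the utrace API.  The assumption is that every stepping
engine checks the signal number itself instead of trusting the incoming
action.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Toy stand-ins for the signal dispositions; NOT the kernel's values. */
enum toy_action { TOY_SIGNAL_DELIVER, TOY_SIGNAL_IGN };

struct toy_engine {
	int wants_step;	/* this engine asked for single-stepping */
	int saw_trap;	/* set once the engine recognizes a step trap */
};

/* One engine's report_signal-style callback.  A stepping engine looks
 * at the signal number even if an earlier engine already turned the
 * incoming action into "ignore". */
static enum toy_action toy_report_signal(struct toy_engine *e,
					 enum toy_action incoming,
					 int si_signo)
{
	if (e->wants_step && si_signo == SIGTRAP) {
		e->saw_trap = 1;
		return TOY_SIGNAL_IGN;	/* swallow the induced trap */
	}
	return incoming;		/* pass the earlier decision along */
}

/* One reporting pass over the engines in list order.
 * Returns the action left after the last callback. */
static enum toy_action toy_report_pass(struct toy_engine *engines, size_t n,
				       int si_signo)
{
	enum toy_action action = TOY_SIGNAL_DELIVER;
	size_t i;

	assert(engines != NULL || n == 0);
	for (i = 0; i < n; i++)
		action = toy_report_signal(&engines[i], action, si_signo);
	return action;
}
```

With engines { non-stepping, stepping, stepping } both stepping engines
see the trap, but the first engine's callback still receives the
"deliver" action, which is the remaining gap described above.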

The original theory on this was that we'd one day stop overloading
user signals for debugger-induced traps.  In some past TODO lists and
postings I referred to "extension events".  The idea (in part) was
that things like hardware stepping would generate a special new flavor
of utrace event rather than a real signal that has to be intercepted.
Then engines' callbacks would easily see that this was a debugging
event induced by some engine and ignore it (or more likely, just never
get any callback unless your engine registered interest in stepping).
This would also address the case of asynchronous engine detach just
after a trap has actually hit, where today the SIGTRAP is queued and
then later won't be intercepted by a debugger, and instead kills the
user process.

But we don't have any of that now, and don't yet know if we will
really pursue any big improvements at this API level.

> The only really
> important (I think) case is when the last engine detaches.

That's the most important case, sure.  But in any case that is not
actually racy, we should avoid later spurious traps.

> IOW. Suppose that engine E does utrace_control(STEP) or its callback
> returns UTRACE_*STEP. If we do not detach this engine, other engines
> will see the trap. 

That's only so if the tracee actually gets back to user mode before we
have another reporting pass.

> So why it is so important to clear X86_EFLAGS_TF if we detach E ?

Perhaps I am worrying too much about it.  The worst thing is if it
could really get "stuck".  But that shouldn't be possible if there are
any engines at all, perhaps only any with UTRACE_EVENT(QUIESCE) set.
Worst case, one spurious SIGTRAP will get to a report_signal pass, but
nobody will return UTRACE_*STEP again, so there won't be another.  (Of
course, nobody will swallow that SIGTRAP and so it will terminate the
process first anyway, but that's another problem.)  

What seems important is any "non-racy" scenarios.  That is, where
perhaps it wasn't properly stopped and E detached "asynchronously",
but in practice the tracee was known to be otherwise blocked elsewhere
or something, so the detach should have full effect before it returns
to user mode.  But that is just vague theory offhand.


Thanks,
Roland
