reap checks

Roland McGrath Sun, 15 Mar 2009 19:34:28 -0700

> utrace_set_events:
> 
>       (utrace->death && ((old_flags & ~events) & DEATH_EVENTS))
> 
> "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to
> clear DEATH/QUIESCE. Why this is not allowed? And why this is not
> allowed _only_ when the target runs utrace_report_death()->REPORT()?


This is specifically documented for -EALREADY, and in the DocBook section
"Interlock with final callbacks".  The idea is this:

For most utrace events, you don't know whether you'll get some callbacks.
It could be, the task got SIGKILL first thing after you attached, and it
will never report anything.  That is fine for the most part.  But for the
lifetime events it becomes a real burden on the users of the API.  They
have to manage their data structures, and so they have to know reliably
when they can and can't get what callbacks.

So, the utrace_set_events rules try to ensure that the caller knows for
sure whether it will or won't get a callback when the task dies and/or is
reaped.  You can clear DEATH/QUIESCE, and be sure from the return value
that it is now impossible that there is a report_death/report_quiesce
callback racing with you because the guy just got a SIGKILL.  If you can't
be sure of that, then you do know for sure that your callback is being made
right now or very soon.

> I think this line can be just killed. I guess the intent was to
> prevent utrace_release_task() from doing utrace_reap() in parallel
> with utrace_report_death(), but note that utrace_set_events() can
> never "shrinks" ->utrace_flags, it only sets new bits.

It's not ->utrace_flags that matters here, it's engine->flags.

That is one of the intents, but not the only one.  It's just as important
that the user of the API can rely on the ordering of its callbacks wrt its
utrace_set_events/utrace_control calls as that it can rely on the ordering
of its death and reap callbacks.

> The next line looks strange too, don't we need
> 
>       (utrace->reap && ((events & ~old_flags) & UTRACE_EVENT(REAP)))
> 
> ?

get_utrace_lock() already returned -ESRCH if it was in EXIT_DEAD, so this
is probably moot.

> And I don't understand why do we need utrace->death at all. Apart from
> utrace_set_events (which I think doesn't need it), it is only used by
> utrace_control(UTRACE_DETACH). But I can't see how can we race with
> utrace_report_death(). If it can be called, we have DEATH_EVENTS bits
> set. But in that case utrace_do_stop() can't succeed, so UTRACE_DETACH
> can only do mark_engine_wants_stop() but not utrace_reset().

It is used by utrace_set_events and utrace_control for the same purpose.
Those calls must know for sure that report_death cannot happen, or else
that it will (or it's already happening).

Many tracers only keep track until death.  For them, the simple thing is to
have report_death clean up their data structures and return UTRACE_DETACH.
But then they also want to do asynchronous detach.  So they can do
utrace_set_events or utrace_control as the synchronizing step of
asynchronous tear-down.  If it returns 0, then report_death will not and it
is safe to destroy data structures the callback code would use.  If it
returns -EALREADY, then report_death will shortly be called and we can rely
on our callback code to take care of the data structures before it returns
UTRACE_DETACH.


Thanks,
Roland

Re: utrace_set_events/utrace_control && death/reap checks

Reply via email to