I'll point out that the only symptom that matters to ptrace is ->exit_code
and that issue disappears if you stop overloading it for ptrace purposes,
which is the clean thing to do in the long run anyway.  But there are
subtler issues that haven't yet directly affected ptrace.

In this scenario, the event has already happened and that's externally
visible (the child exists).  (In the equivalent case for exec, you can
tell from /proc/pid/exe, the effects of setuid have happened, etc.)
We don't have the option of saying that the group-stop happened first.
If that were so, this event would not have happened yet, but it's
already been both reported via utrace and become observable by
userland.  So it should appear as if the other thread started its
group-stop after we were already in TASK_TRACED.

There is no need or benefit I see in any substantive change to the
utrace API reflecting this.  It's just that group-stop is an event
that can happen after other events, like syscall-exit is.

In other versions of this race, we'll already be in utrace_stop() or
on the way to it, and hit "we must participate in the bookkeeping."
But the bookkeeping is not really enough.  In the case where we were
the last thread we do set SIGNAL_STOP_STOPPED, but we swallow the
parent's notification.  Of course you don't notice this in ptrace
since it swallows that notification anyway (and since this is a rare
race).  But for the general case of utrace, it's wrong as it is.

So utrace_stop() ought to do the parent notification in this case.
That is, the whole middle part of the do_signal_stop() and
tracehook_notify_jctl() logic.  i.e., utrace_report_jctl() and then
both set ->exit_code and call do_notify_parent_cldstop(), modulo
ptrace interference.  Hmm, perhaps ->exit_code is only ever set now
because of ptrace, at least since your task_stopped_code() changes?
Perhaps we should just nix setting it in do_signal_stop() too (after
new ptrace).

Now, back to your case, a different ordering of that same race.

There is nothing really wrong with having utrace's hook be before the
group_stop_count check in get_signal_to_deliver().  But that would
require changing the tracehook interface there, and the existing
tracehook_get_signal() seems about as clean as we can make it.

It seems fine to me that we get into do_signal_stop().  If we got into
utrace_get_signal()->utrace_stop() first, it would just be the case
above where we are acting like do_signal_stop() ran anyway.  We can
change tracehook_notify_jctl() to call utrace_report_jctl() for any
nonzero utrace_flags, and check UTRACE_EVENT(JCTL) inside.  (Really it
could check (UTRACE_EVENT(JCTL) | ENGINE_STOP), but that's an internal
bit that only utrace.c itself knows about.)  If (after the report_jctl
pass if any) anybody wants UTRACE_STOP, then we handle it with
utrace_stop() right there (for what==CLD_STOPPED only).  That handles
the pending group_stop_count by the (fixed) logic above, so that when
we come back out from TASK_TRACED (assuming no unrelated new stops)
group_stop_count==0 and do_signal_stop() falls through its no-op case.
For that plan, we have to give utrace_report_jctl() a return value
that tracehook_notify_jctl() passes back so it can return 0 when
utrace_stop() ran (and did any notification itself).  But I think that
flies, is perhaps moderately clean, and does not need any changes
outside tracehook/utrace.

What do you think?


Thanks,
Roland

Reply via email to