> Can't comment right now, need to read the code.
Such gentlemanly restraint.
> Afaics, we can't just remove utrace_finish_jctl() and the similar code in
> utrace_stop(). We need
>
> void utrace_finish_jctl(void)
> {
> struct utrace *utrace = task_utrace_struct(current);
> /*
> * While in TASK_STOPPED, we can be considered safely stopped by
> * utrace_do_stop(). Make sure we can do nothing until the
> tracer
> * drops utrace->lock
> */
> if (unlikely(__fatal_signal_pending()))
> spin_unlock_wait(utrace->lock);
> }
>
> and utrace_stop() should do the same.
I see. The comments should be more clear that SIGKILL breaking
TASK_TRACED is the only way this can arise. In the jctl case, that is
in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED.
> Otherwise, the killed tracee can start another reporting loop and
> list_for_each() can race with, say, utrace_reset(DETACH)->utrace_reset().
> More generally, if the tracer sees "it is stopped" under utrace->lock,
> the tracee must be "really" stopped until we drop utrace->lock(), it
> must not "escape" from utrace_stop() or do_signal_stop().
Correct.
So today ->stopped really does still mean one other thing. It means
that it's either still in TASK_TRACED or is just waking up for SIGKILL.
I think we've discussed the related stuff before. A contrary approach
would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never
make any more normal reports at all before utrace_report_death. It
seems like a good API choice on its face: if you used UTRACE_STOP, you
didn't expect to see any SYSCALL_EXIT, SIGNAL, etc. events--the special
case is the instant-death scenario, so straight to report_death makes
some sense. I was attracted by this until I started going through it.
It's all good until you consider report_clone. If a SIGKILL is pending
when you get to utrace_report_clone(), the clone has already happened.
If you skip that callback, the engine doesn't find out about the new
child, and that matters if it's not a CLONE_THREAD.
Thanks,
Roland