> Can't comment right now, need to read the code. Such gentlemanly restraint.
> Afaics, we can't just remove utrace_finish_jctl() and the similar code in > utrace_stop(). We need > > void utrace_finish_jctl(void) > { > struct utrace *utrace = task_utrace_struct(current); > /* > * While in TASK_STOPPED, we can be considered safely stopped by > * utrace_do_stop(). Make sure we can do nothing until the > tracer > * drops utrace->lock > */ > if (unlikely(__fatal_signal_pending())) > spin_unlock_wait(utrace->lock); > } > > and utrace_stop() should do the same. I see. The comments should be more clear that SIGKILL breaking TASK_TRACED is the only way this can arise. In the jctl case, that is in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED. > Otherwise, the killed tracee can start another reporting loop and > list_for_each() can race with, say, utrace_reset(DETACH)->utrace_reset(). > More generally, if the tracer sees "it is stopped" under utrace->lock, > the tracee must be "really" stopped until we drop utrace->lock(), it > must not "escape" from utrace_stop() or do_signal_stop(). Correct. So today ->stopped really does still mean one other thing. It means that it's either still in TASK_TRACED or is just waking up for SIGKILL. I think we've discussed the related stuff before. A contrary approach would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never make any more normal reports at all before utrace_report_death. It seems like a good API choice on its face: if you used UTRACE_STOP, you didn't expect to see any SYSCALL_EXIT, SIGNAL, etc. events--the special case is the instant-death scenario, so straight to report_death makes some sense. I was attracted by this until I started going through it. It's all good until you consider report_clone. If a SIGKILL is pending when you get to utrace_report_clone(), the clone has already happened. If you skip that callback, the engine doesn't find out about the new child, and that matters if it's not a CLONE_THREAD. Thanks, Roland