> I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL. Right, that is key.
> OK, let's look at utrace_do_stop: > > if (task_is_stopped(target) && > !(target->utrace_flags & UTRACE_EVENT(JCTL))) { > utrace->stopped = 1; > return true; > } > > This doesn't look correct. We don't hold ->siglock, the task can be > SIGCONT'ed and return from get_signal_to_deliver(), and then we set > ->stopped. Or I missed something again? I think you're right. The logic there was supposed to be, "TASK_STOPPED means it will get into utrace_get_signal()." That much is true, but nothing inside utrace_get_signal() actually synchronizes with this to make that matter. All this check does is try to optimize the TASK_STOPPED case not to take the siglock. That doesn't seem worth much, so we can just drop it. > Then we re-do this (well, almost) check under ->siglock, > > } else if (task_is_stopped(target)) { > if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) > utrace->stopped = stopped = true; > } > > But this is not nice. Let's suppose the task is already stopped, we do > UTRACE_ATTACH + utrace_set_events(JCTL). This is exactly why utrace_set_events() sets ->stopped preemptively for that case. > Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't > even set ->report. Yes, we can't set ->stopped if JCTL, we can race with > utrace_report_jctl() which does REPORT(). Setting JCTL while in TASK_STOPPED made it set ->stopped, so utrace_control() succeeds without calling utrace_do_stop(). > BTW, afaics utrace_report_jctl() has another bug, > > REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), > report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); > > I think it should do > > REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), > report_jctl, what, notify); > > instead. There is a bug, but your fix changes a key API choice. I've put in this fix: - report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); + report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, + notify ? what : 0); There are two things a tracer might be tracking: state or events. The "state" is whether the thread is in job control stop or is running. The "events" are the SIGCHLD notifications that the thread tries to post to its parent. The @type argument shows the state we will be in after the callback. If the state changes, there will be another callback. That's what a state-tracking tracer needs, e.g. to keep a little light on the screen red while the thread is stopped and green while it's running. The @notify argument shows what SIGCHLD the parent sees (if it were dequeuing all possible SIGCHLD postings as quickly as they come). That's what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs are expected in the parent. Your change to @type would break state-trackers in the case where tracehook_notify_jctl() is called from get_signal_to_deliver() with CLD_STOPPED. > With the first patch, we call utrace_report_jctl() before we actually > stop. do_signal_stop() can fail then, but I think this is OK, we can > pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, > and with this patch we always call ->report_jctl with notify == 0. Just for > discussion. I think I sort of understand the intent of your patch. If we change the calling convention for tracehook_notify_jctl, I think we want to preserve the aspect that the hook decides about sending the notification. That's how the ptrace quirks can be reimplemented differently later without changing the tracehook layer again. Also, we certainly don't want one tracehook call with two different locking conditions. It seems right in principle to do the reporting before we change ->state, given that we have to allow for it changing during the callbacks. And indeed, that avoids the JCTL special case mess entirely. Thanks, Roland