> I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL.

Right, that is key.

> OK, let's look at utrace_do_stop:
> 
>       if (task_is_stopped(target) &&
>           !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
>               utrace->stopped = 1;
>               return true;
>       }
> 
> This doesn't look correct. We don't hold ->siglock, the task can be
> SIGCONT'ed and return from get_signal_to_deliver(), and then we set
> ->stopped. Or I missed something again?

I think you're right.  The logic there was supposed to be, "TASK_STOPPED
means it will get into utrace_get_signal()."  That much is true, but
nothing inside utrace_get_signal() actually synchronizes with this to make
that matter.

All this check does is try to optimize the TASK_STOPPED case not to take
the siglock.  That doesn't seem worth much, so we can just drop it.

> Then we re-do this (well, almost) check under ->siglock,
> 
>       } else if (task_is_stopped(target)) {
>               if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
>                       utrace->stopped = stopped = true;
>       }
> 
> But this is not nice. Let's suppose the task is already stopped, we do
> UTRACE_ATTACH + utrace_set_events(JCTL).

This is exactly why utrace_set_events() sets ->stopped preemptively for
that case.

> Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't
> even set ->report. Yes, we can't set ->stopped if JCTL, we can race with
> utrace_report_jctl() which does REPORT().

Setting JCTL while in TASK_STOPPED made it set ->stopped, so
utrace_control() succeeds without calling utrace_do_stop().

> BTW, afaics utrace_report_jctl() has another bug,
> 
>       REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
>              report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
> 
> I think it should do
> 
>       REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
>              report_jctl, what, notify);
> 
> instead.

There is a bug, but your fix changes a key API choice.
I've put in this fix:

-              report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
+              report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED,
+              notify ? what : 0);

There are two things a tracer might be tracking: state or events.
The "state" is whether the thread is in job control stop or is running.
The "events" are the SIGCHLD notifications that the thread tries to post to
its parent.

The @type argument shows the state we will be in after the callback.
If the state changes, there will be another callback.  That's what a
state-tracking tracer needs, e.g. to keep a little light on the screen red
while the thread is stopped and green while it's running.

The @notify argument shows what SIGCHLD the parent sees (if it were
dequeuing all possible SIGCHLD postings as quickly as they come).  That's
what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs
are expected in the parent.

Your change to @type would break state-trackers in the case where
tracehook_notify_jctl() is called from get_signal_to_deliver() with
CLD_STOPPED.

> With the first patch, we call utrace_report_jctl() before we actually
> stop. do_signal_stop() can fail then, but I think this is OK, we can
> pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
> and with this patch we always call ->report_jctl with notify == 0. Just for
> discussion.

I think I sort of understand the intent of your patch.  If we change the
calling convention for tracehook_notify_jctl, I think we want to preserve
the aspect that the hook decides about sending the notification.  That's
how the ptrace quirks can be reimplemented differently later without
changing the tracehook layer again.  Also, we certainly don't want one
tracehook call with two different locking conditions.

It seems right in principle to do the reporting before we change ->state,
given that we have to allow for it changing during the callbacks.  And
indeed, that avoids the JCTL special case mess entirely.


Thanks,
Roland

Reply via email to