> What I meant, we can rely on the fact that any wakeup (try_to_wakeup()
> actually) must be a barrier wrt schedule(). 

Ok.

> But what if the tracee is woken by CONT/KILL? Should the tracee
> see the last change in ->stopped? The signal can race with
> utrace_wakeup() or utrace_do_stop(), [...]

It can't actually race with utrace_do_stop, because that holds siglock
while touching ->stopped.  Any signal case should already exclude
utrace_do_stop that way.  So it's only utrace_wakeup, which clears
->stopped before taking siglock--and that's the harmless direction to race.

> But I think all we need is to be sure that:
> 
>       - if ->stopped == T, we have a pending SIGKILL
> 
>         If we were woken by SIGCONT which comes right after
>         utrace_wakeup() does s/TRACED/STOPPED/, I think we must
>         see the result of "->stopped = 0" in utrace_wakeup().
>         Both utrace_wakeup() and signal_wake_up() take ->siglock,
>         the memory change in ->stopped must be visible to
>         utrace_wakeup() and the wakee.

Ok.

>       - If we were killed, we must see ->stopped == T _unless_
>         utrace_wakeup() is in progress or we are already woken
>         from utrace pov.

Right.  In those cases it doesn't matter whether we think we were killed or
not.  It can be "before" the wakeup or "after" the wakeup.

>         Now we should worry about SIGKILL racing with utrace_do_stop().
>         Again, both take ->siglock, etc.
> 
> But in short, I can't understand how utrace->lock can really help,
> given that a signal changes task's state without this lock. It can
> only help to avoid the races with utrace playing with ->stopped in
> parallel, but I see nothing dangerous here.

Right, I just meant about keeping ->stopped "correct" in the signal case.
i.e., in case of signal wakeup, it could never see an old ->stopped value.

> > Shouldn't one always hold siglock to use task->pending at all?
> 
> Why? It only checks the bit. Note that fatal_signal_pending() is
> used lockless.

Ok.

> But this discussion makes me wonder, what report->killed and
> utrace_stop's returned value buy us? Can't we kill report->killed
> and make utrace_stop() return void?

I think at some point earlier I was not confident that a local pending
SIGKILL would always be there.  This was before we had fatal_signal_pending
et al.  It might not have been a valid worry even then.

> With or without this patch. Suppose that (say) utrace_report_syscall_entry()
> does utrace_stop() and sleeps in TASK_TRACED. Another thread starts the
> group stop and sets SIGNAL_STOP_STOPPED. utrace_wakeup() clears ->stopped,
> notices SIGNAL_STOP_STOPPED and doesn't wake up.
> 
> Now, we can sleep a long time. Then SIGKILL kills us, but finish_utrace_stop()
> returns false.

Yes, that's a good point.

> Afaics, only utrace_get_signal() uses report.kill, and only
> utrace_report_syscall_entry() uses the result of utrace_stop(). Both
> can just check fatal_signal_pending().

Ok.

> > You mean the tracehook_notify_jctl(why, CLD_CONTINUED) case, right?
> 
> Not sure. utrace_get_signal() can call ->report_signal() before it
> takes utrace->lock.

Right.  It couldn't before af3eb44.

> Heh. Actually, I was wrong. utrace_report_jctl() is not possible if the
> tracee is killed. complete_signal(SIGKILL) sets ->flags = SIGNAL_GROUP_EXIT.

Ah, good.

> > OTOH we can just keep utrace_finish_jctl() and have it and
> > finish_utrace_stop() just always take the utrace lock purely
> > to synchronize (and then drop it without changing anything).
> 
> or we can just do spin_unlock_wait().

I guess that has the same effect as lock+unlock, sure.

> Anyway, I believe that this change, even if good, is low priority. I mean,
> it doesn't immediately makes possible other improvements, afaics. But this
> all is up to you.

No, but it is up to you! ;-)  The point of the exercise is to get utrace
merged upstream.  We will of course keep refining it in the future.  My
attention to the idea of dropping ->stopped now was purely on the thought
that it might make the code easier to understand/review/believe correct 
so it could help with upstream reviewers liking it for merge.  So do it if
it helps or do the opposite if that helps, or play it by ear as feels right.


Thanks,
Roland

Reply via email to