> Yes, except this doesn't really work. We have a lot of races, afaics,
> even without ptrace. The problem is, once we drop ->siglock, we can't
> trust SIGNAL_STOP_DEQUEUED. And siglock is dropped by dequeue_signal(),
> it is dropped by get_signal_to_deliver(), by ptrace_signal/utrace too.

If you think there are real problems unrelated to ptrace, we should be
discussing that separately on LKML.

Firstly, an overriding caveat.  This logic was originally added to address
a race between stop signals and SIGKILL.  In that scenario, a SIGKILL or
other group-exit came along while we were in the is_orphaned_pgrp() window
after dequeuing SIGTTIN or suchlike (I think it was a single-thread case).
So after retaking the siglock, we blocked in TASK_STOPPED while SIGKILL was
pending.  A repeated SIGKILL would find it was already pending and not do
any signal_wake_up(), so it became unkillable.

Myself, I do care that we get all the cases right by the anal POSIX rules.
But I'm pretty sure that upstream at large will only ever actually care
about the SIGKILL case.  So that's the caveat about how much to worry.

Encountering the aforementioned unkillable-stopped-process case and working
out the SIGNAL_STOP_DEQUEUED solution all predates numerous cleanups you
have done.  Now do_signal_stop() checks for signal_group_exit() too--and
you've cleaned up all sending paths to use complete_signal() so
SIGNAL_GROUP_EXIT is reliably set in all situations--so now that would
cover the original case even without any SIGNAL_STOP_DEQUEUED logic.

> Example. get_signal_to_deliver() dequeues, say, SIGTTIN, drops ->siglock
> for is_current_pgrp_orphaned().
> 
> SIGCONT comes, clears SIGNAL_STOP_DEQUEUED - we shouldn't stop.

Right.  SIGCONT also clears all pending stop signals, in the same atomic
section with clearing SIGNAL_STOP_DEQUEUED and queuing SIGCONT.

> Another thread deques SIGTTOU but this signal has a handler, so we should
> not stop. However, SIGTTOU restores SIGNAL_STOP_DEQUEUED. This means the
> first thread will initiate the group-stop later.

So this is after SIGCONT's atomic section.  This clears SIGCONT if it's
still pending, atomic with posting SIGTTOU.  So either SIGCONT"s handler
(i.e. userland knowing for sure there was any SIGCONT) has begun to run
already, or it won't run at all.

I've been trying to think of a way to define it away is OK, but I can't
quite get there.  So I think we do indeed have a problem there.  I wonder
if we can hash that out independent of ptrace stuff.

> Say, ptrace(DETACH/CONT, SIGSTOP). This should work, this means
> SIGNAL_STOP_DEQUEUED should be set even before the tracee calls
> do_signal_stop(). But otoh it doesn't look right to set this flag
> each time the tracee sees a stop signal from the debugger (especially
> on detach), ->real_parent should not see multiple notifications.

I don't think an extra SIGCHLD/wakeup here is going to be considered a
problem, in the grand scheme of things.  I can't really see any plausible
way we'd (want to) preserve whether the real_parent already thought it was
stopped or not.

> The current utrace-ptrace behaviour is close enough to upstream behaviour.
> I hope. I am going to restore the "detach always wakeup" logic to be
> compatible. Hopefully in this case utrace-ptrace will be equally buggy.

A laudable goal.

> I just can't concentrate on these problems until I finish utrace-ptrace.
> Because every week I should explain 1-2 times why it is still not ready
> for submission and when I am going to send it ;)

If we can finish and submit utrace-ptrace without fixing them, that is our
first priority.  We can make Jan happy with the detach semantics later.


Thanks,
Roland

Reply via email to