On 10/11, Roland McGrath wrote:
>
> > Yes. In particular, ptrace(PTRACE_DETACH, SIGKILL) should cancel
> > SIGNAL_STOP_STOPPED, yes?
>
> Yes.

OK.

> SIGNAL_STOP_DEQUEUED exists for one purpose.  It's to ensure that SIGCONT
> and SIGKILL can clear it to make complete their required effect of clearing
> all pending stop signals.  (It fills the hole when another thread has
> dequeued a stop signal and then dropped the siglock to make its call to
> is_current_pgrp_orphaned()--so that half-delivered signal is still
> considered "pending" and thus must be cancelled by SIGCONT or SIGKILL.)

Yes, except this doesn't really work. We have a lot of races, afaics,
even without ptrace. The problem is, once we drop ->siglock, we can't
trust SIGNAL_STOP_DEQUEUED. And siglock is dropped by dequeue_signal(),
it is dropped by get_signal_to_deliver(), by ptrace_signal/utrace too.

Example. get_signal_to_deliver() dequeues, say, SIGTTIN, drops ->siglock
for is_current_pgrp_orphaned().

SIGCONT comes, clears SIGNAL_STOP_DEQUEUED - we shouldn't stop.

Another thread deques SIGTTOU but this signal has a handler, so we should
not stop. However, SIGTTOU restores SIGNAL_STOP_DEQUEUED. This means the
first thread will initiate the group-stop later.

> Another way to put it is to say that the "exists for one purpose"
> statement above implies that only an actual SIGCONT or SIGKILL should
> ever clear SIGNAL_STOP_DEQUEUED.

Yes, agreed. But I really need to re-read this code.

> The remaining two places are the ->flags = SIGNAL_STOP_STOPPED cases in
> do_signal_stop and exit_signals.  Since SIGNAL_STOP_DEQUEUED must always
> have been set before if you can get to those situations, it is harmless
> to use "->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED" instead of:
>
>       sig->flags &= SIGNAL_STOP_DEQUEUED;
>       sig->flags |= SIGNAL_STOP_STOPPED;
>
> or anything like that.  It's probably cleanest to consolidate those two
> cases to call a single subroutine that does the tracehook_notify_jctl
> logic, unlock and do_notify_parent_cldstop.  It can take a caller flag
> or just check PF_EXITING to omit the ->exit_code + ->state change of the
> do_signal_stop version of the code.  That one subroutine can have a
> clear comment about the nonobvious flag usage.

Hmm... not sure, but I need to think more.

> > But please remember, the patch above is not complete of course and currently
> > I do not see the good solution.
>
> What's incomplete aside from handling the exit_signals case the same way?

Say, ptrace(DETACH/CONT, SIGSTOP). This should work, this means
SIGNAL_STOP_DEQUEUED should be set even before the tracee calls
do_signal_stop(). But otoh it doesn't look right to set this flag
each time the tracee sees a stop signal from the debugger (especially
on detach), ->real_parent should not see multiple notifications.

> > I am starting to think we should forget
> > about these bugs, merge utrace-ptrace, and then try to fix them.
>
> If we can have utrace-ptrace code whose corner behavior matches the old
> code and is itself clean, then I don't care about the order of the
> changes going in.  But it's not really clear to me that we can even
> describe the old behavior in terms clean enough to make an exact
> work-alike implementation that could possibly be clean.

Yep.

The current utrace-ptrace behaviour is close enough to upstream behaviour.
I hope. I am going to restore the "detach always wakeup" logic to be
compatible. Hopefully in this case utrace-ptrace will be equally buggy.

I just can't concentrate on these problems until I finish utrace-ptrace.
Because every week I should explain 1-2 times why it is still not ready
for submission and when I am going to send it ;)

Oleg.

Reply via email to