On 10/11, Roland McGrath wrote: > > > Yes. In particular, ptrace(PTRACE_DETACH, SIGKILL) should cancel > > SIGNAL_STOP_STOPPED, yes? > > Yes.
OK. > SIGNAL_STOP_DEQUEUED exists for one purpose. It's to ensure that SIGCONT > and SIGKILL can clear it to make complete their required effect of clearing > all pending stop signals. (It fills the hole when another thread has > dequeued a stop signal and then dropped the siglock to make its call to > is_current_pgrp_orphaned()--so that half-delivered signal is still > considered "pending" and thus must be cancelled by SIGCONT or SIGKILL.) Yes, except this doesn't really work. We have a lot of races, afaics, even without ptrace. The problem is, once we drop ->siglock, we can't trust SIGNAL_STOP_DEQUEUED. And siglock is dropped by dequeue_signal(), it is dropped by get_signal_to_deliver(), by ptrace_signal/utrace too. Example. get_signal_to_deliver() dequeues, say, SIGTTIN, drops ->siglock for is_current_pgrp_orphaned(). SIGCONT comes, clears SIGNAL_STOP_DEQUEUED - we shouldn't stop. Another thread deques SIGTTOU but this signal has a handler, so we should not stop. However, SIGTTOU restores SIGNAL_STOP_DEQUEUED. This means the first thread will initiate the group-stop later. > Another way to put it is to say that the "exists for one purpose" > statement above implies that only an actual SIGCONT or SIGKILL should > ever clear SIGNAL_STOP_DEQUEUED. Yes, agreed. But I really need to re-read this code. > The remaining two places are the ->flags = SIGNAL_STOP_STOPPED cases in > do_signal_stop and exit_signals. Since SIGNAL_STOP_DEQUEUED must always > have been set before if you can get to those situations, it is harmless > to use "->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED" instead of: > > sig->flags &= SIGNAL_STOP_DEQUEUED; > sig->flags |= SIGNAL_STOP_STOPPED; > > or anything like that. It's probably cleanest to consolidate those two > cases to call a single subroutine that does the tracehook_notify_jctl > logic, unlock and do_notify_parent_cldstop. It can take a caller flag > or just check PF_EXITING to omit the ->exit_code + ->state change of the > do_signal_stop version of the code. That one subroutine can have a > clear comment about the nonobvious flag usage. Hmm... not sure, but I need to think more. > > But please remember, the patch above is not complete of course and currently > > I do not see the good solution. > > What's incomplete aside from handling the exit_signals case the same way? Say, ptrace(DETACH/CONT, SIGSTOP). This should work, this means SIGNAL_STOP_DEQUEUED should be set even before the tracee calls do_signal_stop(). But otoh it doesn't look right to set this flag each time the tracee sees a stop signal from the debugger (especially on detach), ->real_parent should not see multiple notifications. > > I am starting to think we should forget > > about these bugs, merge utrace-ptrace, and then try to fix them. > > If we can have utrace-ptrace code whose corner behavior matches the old > code and is itself clean, then I don't care about the order of the > changes going in. But it's not really clear to me that we can even > describe the old behavior in terms clean enough to make an exact > work-alike implementation that could possibly be clean. Yep. The current utrace-ptrace behaviour is close enough to upstream behaviour. I hope. I am going to restore the "detach always wakeup" logic to be compatible. Hopefully in this case utrace-ptrace will be equally buggy. I just can't concentrate on these problems until I finish utrace-ptrace. Because every week I should explain 1-2 times why it is still not ready for submission and when I am going to send it ;) Oleg.