Re: attach-wait-on-stopped vs detach-stopped

Jan Kratochvil Fri, 08 Aug 2008 15:10:40 -0700

Hi Roland,

thanks for your detailed explanation, which makes the complex problem look easy.

On Fri, 08 Aug 2008 07:46:35 +0200, Roland McGrath wrote:
> In the latest upstream kernels, detach-stopped is the only ptrace-tests
> case failing.  A fix I tried for that worked, but made attach-wait-on-stopped
> start failing instead.
> 
> Can you tell me if you think the expectation in attach-wait-on-stopped 
> really seems correct?  It seems to be contrary to what detach-stopped wants.
> 
> In attach-wait-on-stopped, this happens:
> 
>       untraced child stops with normal SIGSTOP
>       parent does not wait, stopped state still "to be waited for"
>       parent does PTRACE_ATTACH
>       -> child still in job stop,
>          now has pending SIGSTOP
>       parent does wait, sees it stopped with SIGSTOP (the first one)
>       parent does PEEKUSR, GETREGS (should make no difference)
>       parent does PTRACE_DETACH
> *     -> child has never left job stop,
>          is still in job stop,
>          stays in job stop after detach, does not wake up
>       parent does PTRACE_ATTACH
>       -> child still in job stop, but has been waited for
>          still pending SIGSTOP (third one came but second one still waiting)
>       parent does wait, blocks since child is waited-for but still stopped
> 
> What happened before my fix was that PTRACE_DETACH unconditionally woke the
> thread up from whatever state it was in.  So here,

Just to note: I take your *here* to mean the *-marked line.

> it woke up, saw the old pending SIGSTOP, and stopped again (ptrace
> stop)--now with a fresh "still to be waited for" stopped status.

My explanation:
This SIGSTOP you describe was generated by PTRACE_ATTACH.  As we are now past
PTRACE_DETACH (with no TracerPid), when this SIGSTOP is delivered we get into
the `T (stopped)' state (and not `T (tracing stop)').
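
The two states can be told apart from the `State:' and `TracerPid:' lines of
/proc/PID/status.  A minimal sketch of such a check follows; the names
`struct proc_state' and `parse_proc_status' are hypothetical, made up for
illustration, and the function only parses text that the caller has already
read from the file.  It says nothing about whether a wait notification is
still pending.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two fields of interest. */
struct proc_state {
    char state;       /* state letter, e.g. 'T' */
    char desc[32];    /* text in parentheses, e.g. "stopped" or "tracing stop" */
    long tracer_pid;  /* 0 when no tracer is attached */
};

/* Parse the text of /proc/PID/status (already read by the caller).
   Returns 0 if both State: and TracerPid: were found, -1 otherwise. */
int parse_proc_status(const char *text, struct proc_state *out)
{
    int have_state = 0, have_tracer = 0;
    const char *line = text;

    assert(text != NULL && out != NULL);
    while (*line != '\0') {
        /* The space in the format also matches the tab used by the kernel. */
        if (sscanf(line, "State: %c (%31[^)])", &out->state, out->desc) == 2)
            have_state = 1;
        else if (sscanf(line, "TracerPid: %ld", &out->tracer_pid) == 1)
            have_tracer = 1;

        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line == NULL)
            break;
        line++;
    }
    return (have_state && have_tracer) ? 0 : -1;
}

/* Job control stop as described above: stopped and not traced. */
int is_job_stopped(const struct proc_state *s)
{
    return strcmp(s->desc, "stopped") == 0 && s->tracer_pid == 0;
}
```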

> But this wakeup on PTRACE_DETACH was exactly what detach-stopped does not
> want to see.
> 
> attach-wait-on-stopped uses PTRACE_DETACH,0 while detach-stopped uses
> PTRACE_DETACH,SIGSTOP.

With the `attach-wait-on-stopped uses PTRACE_DETACH,0' part of the testcase
I just tried to pinpoint the utrace<->ptrace difference that could be
considered a regression.  Upstream GDB did not support attaching to stopped
processes before, and its detach-as-stopped behavior is still undefined.
=> I am not aware that a FAIL of the second-attach case of
attach-wait-on-stopped would cause any real-world problems.

> So both tests can be satisfied if what it means
> is that PTRACE_DETACH always wakes up a thread (even one that has never
> left job control stop), but it should stop again for the new SIGSTOP.
> (The reason it doesn't stop again now is an esoteric internal one.)
> 
> Is that what you think the rule ought to be?

Yes.  OTOH I do not see why your way would cause any real-world trouble, if
you find it more systematic.

> The "if in job stop, stay in job stop" rule seems more sensible to me.  That
> would make detach-stopped pass and attach-wait-on-stopped fail.
> 
> As you're aware, the subtle difference between staying stopped and waking
> up followed by an immediate stop is the "freshening" of the wait status and
> wakeup of a parent/tracer's blocked wait calls.

The goal of the GDB attach-detach behavior is to be fully transparent.
Running /usr/bin/gcore (GDB attach+gcore+detach commands) should leave the
process in a perfectly unchanged state.

We have to eat the pending SIGSTOP notification during `attach'.  With the
`PTRACE_ATTACH, tkill(SIGSTOP), PTRACE_CONT(0), waitpid()' trick (recent
upstream or mid-term RH/Fedora) GDB copes even with stopped processes whose
pending SIGSTOP notification has already been eaten.
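
The four steps of that trick can be sketched as below.  This is only an
illustration under assumptions, not GDB's actual code: the function name
`attach_to_stopped_child_demo' and the helper `reap' are hypothetical, the
tracee is a freshly forked child that stops itself, and the parent eats the
job-stop notification first to reproduce the "already eaten" case.  tkill()
is called through syscall() since glibc offers no wrapper for it.  Kernels
differ in how they treat PTRACE_CONT right after PTRACE_ATTACH, so any
failing step simply makes the function return -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cleanup helper: kill the child and wait until it is gone. */
static void reap(pid_t child)
{
    int status;

    kill(child, SIGKILL);
    while (waitpid(child, &status, 0) == child
           && !WIFEXITED(status) && !WIFSIGNALED(status))
        ;
}

/* Returns the stop signal seen by the final waitpid(), or -1 on error. */
int attach_to_stopped_child_demo(void)
{
    int status, sig;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        raise(SIGSTOP);          /* untraced child enters job control stop */
        for (;;)
            pause();
    }

    /* Eat the job-stop notification before attaching. */
    if (waitpid(child, &status, WUNTRACED) != child || !WIFSTOPPED(status)) {
        reap(child);
        return -1;
    }

    /* PTRACE_ATTACH, tkill(SIGSTOP), PTRACE_CONT(0), waitpid() */
    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) != 0
        || syscall(SYS_tkill, child, SIGSTOP) != 0
        || ptrace(PTRACE_CONT, child, NULL, NULL) != 0
        || waitpid(child, &status, 0) != child
        || !WIFSTOPPED(status)) {
        reap(child);
        return -1;
    }

    sig = WSTOPSIG(status);      /* the stop the tracer can now rely on */
    reap(child);
    return sig;
}
```

The extra tkill(SIGSTOP) is what guarantees that the final waitpid() has
something to report even when the original notification is gone.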

I find it friendlier for GCORE to possibly generate one excess SIGSTOP
notification than to possibly eat the only remaining one.  At least there are
applications which run an external GCORE on their SIGSTOP-ped sub-processes
and which may (not confirmed) expect waitpid() to give them SIGSTOP
afterwards, as it worked before (to be specific: RHEL-4, 2.6.9 non-utrace).

I do not know of a race-free way to find out whether the SIGSTOP
notification was already pending before PTRACE_ATTACH (BTW the
`/proc/PID/status' content does not differ between a pending and an eaten
notification).  Therefore a wish for the possibility to PTRACE_DETACH in two
ways (leaving/not leaving a pending notification) is out of the question.

Thanks,
Jan
