> IOW, 2 threads T1 and T2. T2 forks the child C. T1 ptraces C. C dies
> and becomes EXIT_ZOMBIE. It sends the notification to thread-group.
> 
> Then, any thread does do_wait(). But since ptrace_reparented() = T
> we don't release C but send the notification again. This doesn't
> look right.

Technically, I think this really is "right".  It just seems screwy because,
well, the whole ptrace+wait interface is indeed screwy.  

T1 is the ptracer, and is not the natural parent.  Consider that T1 runs a
piece of code (library, isolated chunk in a giant complex program) that
just got asked to trace C.  It doesn't know anything about C, it just knows
that PTRACE_ATTACH worked on it.  So, it expects the usual behavior when it
does waitpid(C) and gets !WIFSTOPPED: automatic detach, notification of the
real parent, and the real parent's waits work.  

Imagine T2 runs another piece of code that forks and waits for that child,
and doesn't know anything else, e.g. it called system().  That code is
isolated in the function, and all it expects of the rest of the (unknown)
code in the process is that any wait calls are waitpid() selecting only a
known child (or are in other threads using __WNOTHREAD, etc.), so nobody
will steal its child.
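
To make T2's side concrete, here is a minimal sketch of that kind of
isolated code.  The name run_child_and_wait() is made up for illustration
and the child just exits with a given code as a stand-in for real work; it
is not taken from any actual program.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (name invented for illustration): fork a child that
 * exits with `code`, then wait for exactly that child, system()-style.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: stand-in for the real work */

    int status;
    pid_t r;
    /* Select only our own known child, so we never steal anyone else's. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only thing this code assumes about the rest of the process is that
nobody else reaps its child first, which is exactly the assumption above.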

These two isolated chunks of code have limiting (and perhaps short-sighted)
assumptions.  But things work out just right for them.  (Naturally they
have problems if both calls are in the same thread leaving the child alive
in between, but imagine some current application that never does it that way.)

Now C dies and the sequence is:

        C dies -> wake_up_parent
        T1 wakes up, enters wait loop
        T2 wakes up, enters wait loop
        T1 sees C in wait_task_zombie() -> will report, about to untrace it
        T2 sees C in wait_task_zombie() -> task_ptrace(C) still true, skip it
        T1 untraces C
        T2 blocks again til 2nd wake_up_parent

If we were to omit the second do_notify_parent() as you suggest, then T2
stays blocked forever instead of reaping C.
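
The sequence above can be mimicked with a toy model.  This is not kernel
code: the struct, the function names, and the single "second_notify" switch
are all invented here, and the model only captures the ordering of events
listed above, nothing about the real locking or data structures.

```c
#include <stdbool.h>

/* Toy model, not kernel code: every name here is invented for illustration. */
struct toy_child {
    bool zombie;   /* C has exited */
    bool traced;   /* C is still attached to its ptracer, T1 */
    bool reaped;   /* the natural parent, T2, has collected C */
};

/* One pass of T2's wait loop: a zombie that is still traced gets skipped. */
static bool t2_wait_pass(struct toy_child *c)
{
    if (c->zombie && !c->traced) {
        c->reaped = true;
        return true;
    }
    return false;              /* nothing to report, T2 blocks again */
}

/* Returns true if T2 ends up reaping C. */
bool t2_reaps_child(bool second_notify)
{
    struct toy_child c = { .zombie = true, .traced = true, .reaped = false };

    /* First wake_up_parent: both threads run their wait loops. */
    t2_wait_pass(&c);          /* T2 skips C: it is still traced */
    c.traced = false;          /* T1 reports C and untraces it */

    /* T2 only looks again if somebody wakes it up a second time. */
    if (second_notify)
        t2_wait_pass(&c);

    return c.reaped;
}
```

In this model t2_reaps_child(true) reaps C and t2_reaps_child(false) leaves
it unreaped, which is the "blocked forever" case.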

If we were to change ptrace_reparented() as you contemplate, then even
after some other wakeup, T2 would get -ECHILD.  

Either way, the system call ABI compatibility is broken.
It's just not an option, merits of interface choices aside.

Note for this case it now works right when both use just __WNOTHREAD, which
a caller "trying to be smart about it" might reasonably do.  T1 is seeing C
on its ->ptraced, and T2 is seeing (skipping) C on its ->children list.
When everybody uses __WNOTHREAD, I bet they'd think that ptrace_reparented()
losing that distinction is pretty counterintuitive.
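
For the __WNOTHREAD half of this, here is a rough user-space sketch.  It
leaves ptrace out entirely and only shows that a thread using __WNOTHREAD
sees just the children it forked itself.  The names (struct demo,
wnothread_demo, and so on) are invented for illustration, and it assumes
the calling thread has no children of its own.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __WNOTHREAD
#define __WNOTHREAD 0x20000000  /* Linux value, in case the libc hides it */
#endif

struct demo {
    sem_t forked;      /* posted by T2 once C exists */
    sem_t t1_done;     /* posted by T1 after its wait attempt */
    int t2_status;     /* C's exit status as seen by T2, or -1 */
};

/* sem_wait() that retries if a signal interrupts it. */
static void sem_wait_retry(sem_t *s)
{
    while (sem_wait(s) < 0 && errno == EINTR)
        ;
}

/* T2: forks C, lets T1 try its wait first, then reaps C itself. */
static void *t2_thread(void *arg)
{
    struct demo *d = arg;
    int status;
    pid_t pid = fork();

    if (pid == 0)
        _exit(3);               /* C: exit right away with a known status */

    sem_post(&d->forked);
    sem_wait_retry(&d->t1_done);

    d->t2_status = -1;
    if (pid > 0 && waitpid(pid, &status, __WNOTHREAD) == pid &&
        WIFEXITED(status))
        d->t2_status = WEXITSTATUS(status);
    return NULL;
}

/* The caller plays T1.  Returns the errno from T1's wait attempt (0 if it
 * did not fail, -1 if the thread could not be created) and stores C's exit
 * status as seen by T2 in *t2_status. */
int wnothread_demo(int *t2_status)
{
    struct demo d;
    pthread_t t2;
    int status, t1_errno = 0;

    sem_init(&d.forked, 0, 0);
    sem_init(&d.t1_done, 0, 0);
    if (pthread_create(&t2, NULL, t2_thread, &d) != 0)
        return -1;

    sem_wait_retry(&d.forked);
    /* T1 has no children of its own, so C is invisible to it here. */
    if (waitpid(-1, &status, __WNOTHREAD | WNOHANG) < 0)
        t1_errno = errno;
    sem_post(&d.t1_done);

    pthread_join(t2, NULL);
    sem_destroy(&d.forked);
    sem_destroy(&d.t1_done);
    *t2_status = d.t2_status;
    return t1_errno;
}
```

Build with -pthread.  On Linux I'd expect T1's attempt to fail with ECHILD
while T2 still collects C normally.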


Thanks,
Roland
