Thanks very much for the report, Alexey.  I've trimmed the CC since I don't
think the wider kernel community is interested in the internal bugs of a
patch that for now is not even submitted for inclusion (and I dislike
cross-posting as a rule).

> utrace patch against 2.6.24-rc8 kernel reasonably quickly oopses in the
> following way:

These crashes seem to vary widely among configs and machines.  The last
time I tried, I couldn't get any of them to happen often enough to
investigate much.

> Call Trace:
>  [<c1049290>] get_utrace_lock_attached+0x0/0xb0
>  [<c104a41b>] utrace_detach+0x1b/0xc0
>  [<c104d7c2>] ptrace_exit+0xb2/0x1b0
>  [<c104d762>] ptrace_exit+0x52/0x1b0
>  [<c102107a>] do_exit+0x8a/0x760
[...]
> What happens is dangling tsk passed into get_utrace_lock_attached() --
> "8b 9e a4 04 00 00" is "mov 0x4a4(%esi),%ebx" which corresponds to
> ->utrace offset inside task_struct here. Sorry, haven't looked further.

Here's the theory on what's supposed to prevent this one (maybe you can
see the holes).  ptrace_exit runs inside rcu_read_lock(), doing:

                list_for_each_safe_rcu(pos, n, &tsk->ptracees) {
                        state = list_entry(pos, struct ptrace_state, entry);
                        error = utrace_detach(state->task, state->engine);

RCU should keep alive every task_struct that was still alive when we
started looking at the list.
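
(Spelled out, that's just the standard pattern below -- an illustration,
not the utrace code verbatim, though the loop body is the same and the
call_rcu is what mainline release_task already does:)

        /* Reader (ptrace_exit), simplified: */
        rcu_read_lock();
        list_for_each_safe_rcu(pos, n, &tsk->ptracees) {
                state = list_entry(pos, struct ptrace_state, entry);
                /*
                 * state->task's memory must stay valid across this call,
                 * even if the task is already dead and unhashed.
                 */
                utrace_detach(state->task, state->engine);
        }
        rcu_read_unlock();

        /* Freer (mainline release_task): the final free of the
         * task_struct is deferred past a grace period.  */
        call_rcu(&p->rcu, delayed_put_task_struct);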

The other side of the race is release_task, which, depending on the test
case, is called either by the dying task itself or by another task calling
wait.  release_task should lead to ptrace_report_reap.  If that is the
path taken, both state and state->task should be kept valid by RCU.
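
(That is, the teardown on that path should have the usual
unlink-then-call_rcu shape -- something like the sketch below, which is
the pattern rather than the exact call chain from ptrace_report_reap into
ptrace_state_unlink/ptrace_done:)

        list_del_rcu(&state->entry);              /* readers may still walk onto it */
        call_rcu(&state->rcu, ptrace_state_free); /* real free waits for a grace period */

        /*
         * Meanwhile release_task() defers freeing the task_struct the same
         * way, so a reader that was already inside rcu_read_lock() when
         * this ran can still safely dereference both state and state->task.
         */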

Actually, it might instead get to ptrace_report_death and detach itself,
or get ptrace_detach'd via detach_zombie, in which case 'state' is already
stale.  If that is what happened, then perhaps state->task had already
been passed to release_task before the rcu_read_lock in ptrace_exit, and
RCU no longer guarantees that the task_struct is still around when we
dereference it.

Do you think that logic is sound?  If this latter scenario is what you are
hitting, then perhaps it would be fixed by the patch below.  But I'm not
very confident I really know what's going on in your crash.  
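
Reduced to its essentials, the patch is just the usual
rcu_assign_pointer/rcu_dereference pairing:

        /* Writer (ptrace_done), while the task_struct is certainly live: */
        rcu_assign_pointer(state->task, NULL);
        call_rcu(&state->rcu, ptrace_state_free);

        /* Readers (ptrace_exit, ptrace_do_wait), inside rcu_read_lock(): */
        p = rcu_dereference(state->task);
        if (p == NULL)
                continue;       /* stale entry on its way to being freed */
        /* otherwise p is a task_struct that RCU is keeping alive for us */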


Thanks,
Roland


diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b22c1d0..0000000 100644  
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -157,6 +157,14 @@ ptrace_done(struct ptrace_state *state)
        CHECK_DEAD(state);
        BUG_ON(state->rcu.func);
        BUG_ON(state->rcu.next);
+       /*
+        * We clear @task here, while we are sure that the task_struct is
+        * still live, because our caller has to permit its release.
+        * By RCU rules, this means that inside rcu_read_lock(),
+        * rcu_dereference(state->task) will always produce either
+        * a pointer that is being kept alive by RCU, or NULL.
+        */
+       rcu_assign_pointer(state->task, NULL);
        call_rcu(&state->rcu, ptrace_state_free);
 }
 
@@ -503,6 +511,7 @@ ptrace_exit(struct task_struct *tsk)
 
        do {
                struct ptrace_state *state;
+               struct task_struct *p;
                int error;
 
                START_CHECK;
@@ -512,7 +521,17 @@ ptrace_exit(struct task_struct *tsk)
                restart = 0;
                list_for_each_safe_rcu(pos, n, &tsk->ptracees) {
                        state = list_entry(pos, struct ptrace_state, entry);
-                       error = utrace_detach(state->task, state->engine);
+                       /*
+                        * Here rcu_read_lock() keeps live any task_struct
+                        * that state->task still points to.  If state->task
+                        * was cleared already, then state itself is on the
+                        * way to be freed by RCU and we are just seeing a
+                        * stale list element here.
+                        */
+                       p = rcu_dereference(state->task);
+                       if (unlikely(p == NULL))
+                               continue;
+                       error = utrace_detach(p, state->engine);
                        BUG_ON(state->parent != tsk);
                        if (likely(error == 0)) {
                                ptrace_state_unlink(state);
@@ -525,7 +544,6 @@ ptrace_exit(struct task_struct *tsk)
                                 * Since wait_task_inactive might yield,
                                 * we must go out of rcu_read_lock and restart.
                                 */
-                               struct task_struct *p = state->task;
                                get_task_struct(p);
                                rcu_read_unlock();
                                wait_task_inactive(p);
@@ -1237,7 +1255,9 @@ ptrace_do_wait(struct task_struct *tsk,
        rcu_read_lock();
        ns = current->nsproxy->pid_ns;
        list_for_each_entry_rcu(state, &tsk->ptracees, entry) {
-               p = state->task;
+               p = rcu_dereference(state->task);
+               if (unlikely(p == NULL))
+                       continue;
 
                if (pid > 0) {
                        if (task_pid_nr_ns(p, ns) != pid)
