Thanks very much for the report, Alexey. I've trimmed the CC since I don't think the wider kernel community is interested in the internal bugs of a patch that for now is not even submitted for inclusion (and I dislike cross-posting as a rule).
> utrace patch against 2.6.24-rc8 kernel reasonably quickly oopses in the > following way: These cases seem to vary widely among configs and machines. The last time I tried, I did not have much luck getting any of the crashes to happen often enough to be able to do much investigation. > Call Trace: > [<c1049290>] get_utrace_lock_attached+0x0/0xb0 > [<c104a41b>] utrace_detach+0x1b/0xc0 > [<c104d7c2>] ptrace_exit+0xb2/0x1b0 > [<c104d762>] ptrace_exit+0x52/0x1b0 > [<c102107a>] do_exit+0x8a/0x760 [...] > What happens is dangling tsk passed into get_utrace_lock_attached() -- > "8b 9e a4 04 00 00" is "mov 0x4a4(%esi),%ebx" which corresponds to > ->utrace offset inside task_struct here. Sorry, haven't looked further. Here's the theory on what's supposed to prevent this one. (Maybe you can see the holes.) ptrace_exit is in rcu_read_lock, doing: list_for_each_safe_rcu(pos, n, &tsk->ptracees) { state = list_entry(pos, struct ptrace_state, entry); error = utrace_detach(state->task, state->engine); RCU should be keeping all task_struct's alive that were alive when we started looking at the list. The other side of the race is release_task, which depending on the test case is called by the task itself, or by another one calling wait. release_task should lead to ptrace_report_reap. If this is true, both state and state->task should be kept valid by RCU. Actually it might instead get to ptrace_report_death and detach itself, or get ptrace_detach'd via detach_zombie, in which case 'state' is already stale. If this is true, then perhaps state->task was already passed to release_task before the rcu_read_lock in ptrace_exit. Do you think that logic is sound? If this latter scenario is what you are hitting, then perhaps it would be fixed by the patch below. But I'm not very confident I really know what's going on in your crash. Thanks, Roland diff --git a/kernel/ptrace.c b/kernel/ptrace.c index b22c1d0..0000000 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -157,6 +157,14 @@ ptrace_done(struct ptrace_state *state) CHECK_DEAD(state); BUG_ON(state->rcu.func); BUG_ON(state->rcu.next); + /* + * We clear @task here, while we are sure that the task_struct is + * still live, because our caller has to permit its release. + * By RCU rules, this means that inside rcu_read_lock(), + * rcu_dereference(state->task) will always produce either + * a pointer that is being kept alive by RCU, or NULL. + */ + rcu_assign_pointer(state->task, NULL); call_rcu(&state->rcu, ptrace_state_free); } @@ -503,6 +511,7 @@ ptrace_exit(struct task_struct *tsk) do { struct ptrace_state *state; + struct task_struct *p; int error; START_CHECK; @@ -512,7 +521,17 @@ ptrace_exit(struct task_struct *tsk) restart = 0; list_for_each_safe_rcu(pos, n, &tsk->ptracees) { state = list_entry(pos, struct ptrace_state, entry); - error = utrace_detach(state->task, state->engine); + /* + * Here rcu_read_lock() keeps live any task_struct + * that state->task still points to. If state->task + * was cleared already, then state itself is on the + * way to be freed by RCU and we are just seeing a + * stale list element here. + */ + p = rcu_dereference(state->task); + if (unlikely(p == NULL)) + continue; + error = utrace_detach(p, state->engine); BUG_ON(state->parent != tsk); if (likely(error == 0)) { ptrace_state_unlink(state); @@ -525,7 +544,6 @@ ptrace_exit(struct task_struct *tsk) * Since wait_task_inactive might yield, * we must go out of rcu_read_lock and restart. */ - struct task_struct *p = state->task; get_task_struct(p); rcu_read_unlock(); wait_task_inactive(p); @@ -1237,7 +1255,9 @@ ptrace_do_wait(struct task_struct *tsk, rcu_read_lock(); ns = current->nsproxy->pid_ns; list_for_each_entry_rcu(state, &tsk->ptracees, entry) { - p = state->task; + p = rcu_dereference(state->task); + if (unlikely(p == NULL)) + continue; if (pid > 0) { if (task_pid_nr_ns(p, ns) != pid)