The utrace_*_examine calls have been neither used nor examined much at all (no pun intended). So both the API and the implementation for these can use a fresh look and reconsideration. Also, this is somewhat intimately tied up with the broader synchronization picture. So it might make most sense not to spend much effort hashing out microdetails of *_examine until we are doing it in the context of having hashed more out about the general subject of synchronization.
> I fail to understand rcu_read_lock() + get_task_struct() in > utrace_prepare_examine(). This looks as if the caller does not need to > make sure task_struct can't go away. That's kind of the idea. (See below.) > But, unless the caller does get_task_struct() itself (like ptrace does), > utrace_prepare_examine() can race with utrace_reap() (if the tracee is > killed) which detaches the engine but doesn't clear engine->flags ? Yeah, it might just need this: diff --git a/kernel/utrace.c b/kernel/utrace.c index 74b5fc5..60af805 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -549,6 +549,7 @@ restart: list_for_each_entry_safe(engine, next, &utrace->attached, entry) { ops = engine->ops; engine->ops = NULL; + engine->flags = 0; list_move(&engine->entry, &detached); /* @@ -580,6 +581,12 @@ restart: spin_unlock(&utrace->lock); put_detached_list(&detached); + + /* + * Make sure all engine->flags and engine->ops updates are done + * before the current task_struct might really die. + */ + smp_wmb(); } /* @@ -2267,7 +2274,7 @@ int utrace_prepare_examine(struct task_struct *target, rcu_read_lock(); if (unlikely(!engine_wants_stop(engine))) ret = -EINVAL; - else if (unlikely(target->exit_state)) + else if (unlikely(smp_rmb(), target->exit_state)) ret = -ESRCH; else { exam->state = target->state; @@ -2320,7 +2327,7 @@ int utrace_finish_examine(struct task_struct *target, rcu_read_lock(); if (unlikely(!engine_wants_stop(engine))) ret = -EINVAL; - else if (unlikely(target->state != exam->state)) + else if (unlikely(smp_rmb(), target->state != exam->state)) ret = -EAGAIN; else get_task_struct(target); (but with a bit better commenting), or maybe that's not quite it either. Let me explain the general notion. It's always been a little iffy, and it definitely deserves an airing now in your fresh review of all things utrace. The root of the thing is that modules can't use get_task_struct(). More precisely, __put_task_struct is not exported (though it was once). But, the main use of utrace is from modules (probably everything but ptrace). One could argue upstream to export __put_task_struct again, but I wouldn't expect that whole discussion to be much fun. Even if not for the export issue, one could possibly argue that the utrace API does a service if it makes it difficult or impossible for a buggy utrace-using module to cause task_struct leaks. (That argument is a bit thin, I grant.) So, I've tried to address this in two different ways in the utrace API. One is the utrace_*_pid interfaces. put_pid() is exported, so modules can use get_pid/put_pid and only utrace_*_pid calls and never worry about task death races. (That is, all utrace_*_pid calls will safely return -ESRCH if the task disappears out from under the struct pid.) That approach was actually a later afterthought when I noticed that put_pid was exported. The other approach taken is still there too. That is, magic guarantees about utrace_* calls with valid engine pointers. The utrace calls take both a task_struct pointer and a utrace_engine pointer. If your engine pointer is valid, your task_struct pointer doesn't necessarily have to be. That is, they will safely return -ESRCH (or maybe some other particular error cases) if the task_struct disappears out from under the engine (including your code just asynchronously detaching the engine). So from the API perspective, modules have to maintain utrace_engine ref counts, and they can leak utrace_engine's if buggy. But if they get that right, just storing a task_struct pointer solely to pass it to utrace_* calls along with your engine pointer is safe without any other bookkeeping. The other calls use get_utrace_lock to implement this guarantee. It checks that your engine is attached before it looks at the task_struct, and this is (supposedly) synchronized with detach and reap safely. (You should be checking on that supposition, Oleg. ;-) Now we come to utrace_*_examine. These don't use get_utrace_lock, mostly just because I didn't think they needed that lock for any other reason. But the intent for the API was the same, and perhaps the patch above (or something along those lines) makes it an adequate implementation of that. But, this API is a bit dubious even if implemented correctly. (Here I mean just utrace_*_examine, though I'd see the point if someone wanted to characterize this whole task_struct pointer handling plan that way.) These API guarantees make each utrace_* call safe and atomic by itself. This seems pretty solidly useful for the others, which each complete a self-contained action on the task/engine state. Conversely, utrace_prepare_examine and utrace_finish_examine act as a pair with the point being to use the task_struct pointer for things in between. For a direct poke at task->thread.something you can make it work. rcu_read_lock(); ret = utrace_prepare_examine(task, engine, &exam); if (ret) goto hell; fiddle = task->faddle; rcu_read_unlock(); ret = utrace_finish_examine(task, engine, &exam); if (ret) goto hell; Even though utrace_prepare_examine (wait_task_inactive) will block and that lets RCU stuff happen, when you've come back again you will be in rcu_read_lock across the put_task_struct it does. So task is live until rcu_read_unlock(). (I have no idea if intentionally allowing a block inside rcu_read_lock freaks out lockdep or something.) But the case expected most is user_regset calls. You can't expect to call them safely without breaking rcu_read_lock. They are allowed to block, some call access_process_vm, etc. So how do you do that? This suggests perhaps utrace_prepare_examine ought to keep a task ref to be released by utrace_finish_examine. But then it's a back-door way for modules to use get/put_task_struct and leak task_structs. Is that OK? Otherwise, what else do we do instead? Do we have to provide utrace wrappers specifically for user_regset calls? It's not so bad that e.g. task_current_syscall() rolls it all in, but with user_regset there are three flavors of call, and you might really want to do several calls on one or more user_regsets, with some quick logic in between, etc. It could suck to do the whole wait_task_inactive cycle 2n times for n micro-level regset operations instead of just 2 times for the whole shebang. That's why I had the idea of the whole prepare/finish API model to begin with. So I think that's more or less the whole brain dump on utrace_*_examine and related subjects, except for any ideas about tying in differently to other kinds of synchronization. Thanks, Roland