The utrace_*_examine calls have been neither used nor examined much at all
(no pun intended).  So both the API and the implementation for these can
use a fresh look and reconsideration.  Also, this is somewhat intimately
tied up with the broader synchronization picture.  So it might make most
sense not to spend much effort hashing out microdetails of *_examine until
we are doing it in the context of having hashed more out about the general
subject of synchronization.

> I fail to understand rcu_read_lock() + get_task_struct() in
> utrace_prepare_examine(). This looks as if the caller does not need to
> make sure task_struct can't go away.

That's kind of the idea.  (See below.)

> But, unless the caller does get_task_struct() itself (like ptrace does),
> utrace_prepare_examine() can race with utrace_reap() (if the tracee is
> killed) which detaches the engine but doesn't clear engine->flags ?

Yeah, it might just need this:

diff --git a/kernel/utrace.c b/kernel/utrace.c
index 74b5fc5..60af805 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -549,6 +549,7 @@ restart:
        list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
                ops = engine->ops;
                engine->ops = NULL;
+               engine->flags = 0;
                list_move(&engine->entry, &detached);
 
                /*
@@ -580,6 +581,12 @@ restart:
        spin_unlock(&utrace->lock);
 
        put_detached_list(&detached);
+
+       /*
+        * Make sure all engine->flags and engine->ops updates are done
+        * before the current task_struct might really die.
+        */
+       smp_wmb();
 }
 
 /*
@@ -2267,7 +2274,7 @@ int utrace_prepare_examine(struct task_struct *target,
        rcu_read_lock();
        if (unlikely(!engine_wants_stop(engine)))
                ret = -EINVAL;
-       else if (unlikely(target->exit_state))
+       else if (unlikely(smp_rmb(), target->exit_state))
                ret = -ESRCH;
        else {
                exam->state = target->state;
@@ -2320,7 +2327,7 @@ int utrace_finish_examine(struct task_struct *target,
        rcu_read_lock();
        if (unlikely(!engine_wants_stop(engine)))
                ret = -EINVAL;
-       else if (unlikely(target->state != exam->state))
+       else if (unlikely(smp_rmb(), target->state != exam->state))
                ret = -EAGAIN;
        else
                get_task_struct(target);

(but with a bit better commenting), or maybe that's not quite it either.

Let me explain the general notion.  It's always been a little iffy, and it
definitely deserves an airing now in your fresh review of all things utrace.

The root of the thing is that modules can't use get_task_struct().  More
precisely, __put_task_struct is not exported (though it was once).  But,
the main use of utrace is from modules (probably everything but ptrace).
One could argue upstream to export __put_task_struct again, but I wouldn't
expect that whole discussion to be much fun.  Even if not for the export
issue, one could possibly argue that the utrace API does a service if it
makes it difficult or impossible for a buggy utrace-using module to cause
task_struct leaks.  (That argument is a bit thin, I grant.)

So, I've tried to address this in two different ways in the utrace API.
One is the utrace_*_pid interfaces.  put_pid() is exported, so modules can
use get_pid/put_pid and only utrace_*_pid calls and never worry about task
death races.  (That is, all utrace_*_pid calls will safely return -ESRCH if
the task disappears out from under the struct pid.)  

That approach was actually a later afterthought when I noticed that put_pid
was exported.  The other approach taken is still there too.  That is, magic
guarantees about utrace_* calls with valid engine pointers.  The utrace
calls take both a task_struct pointer and a utrace_engine pointer.  If your
engine pointer is valid, your task_struct pointer doesn't necessarily have
to be.  That is, they will safely return -ESRCH (or maybe some other
particular error cases) if the task_struct disappears out from under the
engine (including your code just asynchronously detaching the engine).

So from the API perspective, modules have to maintain utrace_engine ref
counts, and they can leak utrace_engine's if buggy.  But if they get that
right, just storing a task_struct pointer solely to pass it to utrace_*
calls along with your engine pointer is safe without any other bookkeeping.

The other calls use get_utrace_lock to implement this guarantee.  It checks
that your engine is attached before it looks at the task_struct, and this
is (supposedly) synchronized with detach and reap safely.  (You should be
checking on that supposition, Oleg. ;-)

Now we come to utrace_*_examine.  These don't use get_utrace_lock, mostly
just because I didn't think they needed that lock for any other reason.
But the intent for the API was the same, and perhaps the patch above (or
something along those lines) makes it an adequate implementation of that.

But, this API is a bit dubious even if implemented correctly.  (Here I
mean just utrace_*_examine, though I'd see the point if someone wanted
to characterize this whole task_struct pointer handling plan that way.)
These API guarantees make each utrace_* call safe and atomic by itself.
This seems pretty solidly useful for the others, which each complete a
self-contained action on the task/engine state.  Conversely,
utrace_prepare_examine and utrace_finish_examine act as a pair with the
point being to use the task_struct pointer for things in between.

For a direct poke at task->thread.something you can make it work.

        rcu_read_lock();
        ret = utrace_prepare_examine(task, engine, &exam);
        if (ret) goto hell;
        fiddle = task->faddle;
        rcu_read_unlock();
        ret = utrace_finish_examine(task, engine, &exam);
        if (ret) goto hell;

Even though utrace_prepare_examine (wait_task_inactive) will block and
that lets RCU stuff happen, when you've come back again you will be in
rcu_read_lock across the put_task_struct it does.  So task is live until
rcu_read_unlock().  (I have no idea if intentionally allowing a block
inside rcu_read_lock freaks out lockdep or something.)

But the case expected most is user_regset calls.  You can't expect to
call them safely without breaking rcu_read_lock.  They are allowed to
block, some call access_process_vm, etc.  So how do you do that?

This suggests perhaps utrace_prepare_examine ought to keep a task ref to
be released by utrace_finish_examine.  But then it's a back-door way for
modules to use get/put_task_struct and leak task_structs.  Is that OK?

Otherwise, what else do we do instead?  Do we have to provide utrace
wrappers specifically for user_regset calls?  It's not so bad that
e.g. task_current_syscall() rolls it all in, but with user_regset there
are three flavors of call, and you might really want to do several calls
on one or more user_regsets, with some quick logic in between, etc.  It
could suck to do the whole wait_task_inactive cycle 2n times for n
micro-level regset operations instead of just 2 times for the whole shebang.
That's why I had the idea of the whole prepare/finish API model to begin with.

So I think that's more or less the whole brain dump on utrace_*_examine
and related subjects, except for any ideas about tying in differently to
other kinds of synchronization.


Thanks,
Roland

Reply via email to