> Yes!
[...]
> What is the use case for a utrace client to do a utrace_engine_get/put()?
> Wouldn't it be more robust if utrace implicitly handles refcounts as
> you've detailed below?

If the only operations that affect this "count" are implicit, then I assume
you must mean attach and detach.  What today's utrace already has is an
"implicit count": it's 1 when attached, and 0 when not.  It's perfectly
robust as described.  Since it's implicit, you have to observe all those
squirrelly rules about synchronizing with UTRACE_DETACH that I've been
talking about.

If there is no utrace_engine_put, then what change are you saying an
enthusiastic "Yes!" to?  Explain what you have in mind that would somehow
differ from what we already had, yet has no utrace_engine_put.

> Will this also take care of the utrace engine vs. task_struct lifetime
> issues that Alexey reported in [...] ?

The messages you cited are not apropos at all.  Those were about (old)
bugs in the utrace implementation, pure and simple.  We're talking here
about what the specified API is, with the intent of providing one that
helps callers of utrace avoid writing bugs in their own code.  Whatever
the specified API is, of course we're only talking about what it means
to use a utrace implementation that works correctly as specified.

For a moment I thought you were referring to the actual subject about
task_struct lifetime that we need to address here.  This is a problem I
had more or less factored into the original interface with its RCU
requirements.  But I'd forgotten about it in doing the engine ref counts.

The issue is keeping your task_struct pointers valid.  The kernel (for
quite a while now) doesn't let modules use put_task_struct, or
tasklist_lock (they are not exported).  So the only way your code in a
module can keep a task_struct alive (except current, of course) is with
rcu_read_lock.  The RCU requirements for keeping engine pointers alive
in the utrace interface were the same RCU things you'd have to do to
keep task_struct pointers alive, so passing both pointers to each call
just needed the same rcu_read_lock treatment.

The risk is holding a dangling task_struct pointer in your kernel
module's code when the task dies and gets reaped.  AFAICT the answer on
this in general upstream now is to use struct pid.  It is a separately
reference-counted object, and put_pid can be used from kernel modules
(whereas put_task_struct can't be).  It doesn't keep a task_struct
pointer alive, but provides a safe handle on one that you can keep a ref
to in a module.  To actually do something with that task_struct, you
still have to use RCU locally:

        rcu_read_lock();
        t = pid_task(pid, PIDTYPE_PID);
        if (t) ... use t->fields ...
        rcu_read_unlock();

but at least you can keep the struct pid * as long as you like
with safe reference counting you're allowed to use in a module.
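For concreteness, here is a sketch of how a module might hold such a handle
in its own state.  The my_* names are hypothetical; find_get_pid, put_pid,
pid_task, and the RCU calls are the exported kernel interfaces just
described:

```c
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/pid.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Hypothetical module state wrapping the struct pid handle. */
struct my_watch {
	struct pid *pid;	/* counted ref; safe to hold indefinitely */
};

static int my_watch_init(struct my_watch *w, pid_t nr)
{
	w->pid = find_get_pid(nr);	/* takes a ref, or returns NULL */
	return w->pid ? 0 : -ESRCH;
}

static void my_watch_poke(struct my_watch *w)
{
	struct task_struct *t;

	rcu_read_lock();
	t = pid_task(w->pid, PIDTYPE_PID);
	if (t)
		printk(KERN_INFO "pid %d is still alive\n", task_pid_nr(t));
	rcu_read_unlock();	/* t must not be used after this */
}

static void my_watch_fini(struct my_watch *w)
{
	put_pid(w->pid);	/* drop our counted ref */
}
```

The struct pid ref outlives any particular task_struct; only the short
pid_task window needs the RCU read lock.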

Using struct pid * in your own data structures also has some arcane
effects.  In particular, in a multithreaded exec, the old struct pid for
the initial thread will have its task_struct pointer replaced with the
pointer for the thread that did the exec (while the old initial thread
that previously had that pid is reaped).  But this seems to be the plan
in the upstream kernel, since they don't want put_task_struct exported.

So, utrace can help with this a little.  Perhaps this will be enough to
keep it from becoming really difficult to write correct modules.

When an engine is attached, the task_struct is by definition still
valid.  You'll recall that all engines are detached from a task when
it's reaped (after their report_reap callback, if any), so this follows.

So what I can do is this: in all the calls that take task, engine
(set_events, control, barrier, and *_examine) it will always be safe to
pass a stale task_struct pointer along with a valid engine pointer
(i.e. you hold an engine ref).  They'll just check whether the engine is
already detached before looking at the task pointer, so you'll get an
-ESRCH error.  This makes it safe to use:

        ret = utrace_control(pid_task(pid, PIDTYPE_PID), engine, UTRACE_FOO);

and the like without RCU.  It's also true that utrace_set_events and
utrace_control are purely nonblocking calls, so you can call them inside
rcu_read_lock if you are using that to access the task_struct safely
yourself anyway.
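So, under that scheme, the RCU-side variant would look something like the
following sketch.  The event constant is just an illustrative choice from
the utrace event macros, and pid and engine are refs the module already
holds:

```c
/* Sketch: calling the nonblocking entry points inside your own
 * RCU read-side section, instead of relying on the engine check. */
int ret;
struct task_struct *t;

rcu_read_lock();
t = pid_task(pid, PIDTYPE_PID);
if (t)
	ret = utrace_set_events(t, engine, UTRACE_EVENT(EXIT));
else
	ret = -ESRCH;		/* task already gone */
rcu_read_unlock();
```

Either style works because utrace_set_events and utrace_control never
block; the choice is just whether you lean on RCU yourself or on the
engine's detached check.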

utrace_attach itself can block (calling kmalloc), but I can still make
it such that:

        rcu_read_lock();
        engine = utrace_attach(pid_task(pid, PIDTYPE_PID), ...);
        
is safe with respect to the task_struct pointer, though it could block
in between, so you could not be relying on RCU otherwise there.

I think the thing to do is export both utrace_attach_task (today's
utrace_attach), and a utrace_attach_pid that is the same but taking a
struct pid * argument instead of task_struct.  The utrace_attach_task
call is safe (and simplest) to use when the task is current, is the new
child in report_clone, or is your own child that you're somehow sure won't
be reaped.
The utrace_attach_pid call would always be safe given the caller holds a
ref on the struct pid it passes.  e.g.:

        pid = find_get_pid(nr);
        engine = utrace_attach_pid(pid, ...);
        ...
        put_pid(pid);

You'd presumably hold the pid ref in your data structures to use e.g. in
the utrace_control example above.  The put_pid call goes in your cleanup
code.
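Putting the pieces together, a module using the proposed interface might
look roughly like this.  struct my_state, my_utrace_ops, and the exact
attach-flag and engine-type names are hypothetical (the API here is still
being proposed), and the error handling is only sketched:

```c
/* Hypothetical module state tying the pid ref and engine ref together. */
struct my_state {
	struct pid *pid;
	struct utrace_attached_engine *engine;
};

static int my_state_start(struct my_state *s, pid_t nr)
{
	s->pid = find_get_pid(nr);		/* counted pid ref */
	if (!s->pid)
		return -ESRCH;
	s->engine = utrace_attach_pid(s->pid, UTRACE_ATTACH_CREATE,
				      &my_utrace_ops, s);
	if (IS_ERR(s->engine)) {
		put_pid(s->pid);
		return PTR_ERR(s->engine);
	}
	return 0;
}

static void my_state_stop(struct my_state *s)
{
	/* A stale (or NULL) task pointer is fine here: the call checks
	 * the engine first and returns -ESRCH if already detached. */
	utrace_control(pid_task(s->pid, PIDTYPE_PID), s->engine,
		       UTRACE_DETACH);
	put_pid(s->pid);	/* the put_pid from your cleanup code */
}
```

The invariant is simple: hold the pid ref for as long as you hold the
engine ref, and drop both together in cleanup.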


Thanks,
Roland
