> Thinking aloud, utrace_control(UTRACE_STOP) returns -EINPROGRESS for
> threads not yet stopped
> a. possibly still in userspace, yet to pass through a quiesce safe point
> b. blocked in the kernel on a syscall or an exception.

Correct.

> Would task_current_syscall() help here? On a -EINPROGRESS return, one
> can check for a 0 return from task_current_syscall() which'd mean the
> thread is in the kernel. Wouldn't that mean that the thread will
> eventually do a report_quiesce?

You already know it will *eventually* do a report.  UTRACE_STOP having
returned -EINPROGRESS is roughly equivalent to you having used
UTRACE_REPORT instead.  (The difference is that if you have not enabled
report_quiesce or whatever callback that next arises, it will stop and
stay stopped anyway.)

The two issues are that, if it's in the kernel, this might not be very
soon, or it might be right now.  

If it's blocked in a syscall like poll or read, then it could be a very
long time (weeks) before there is a report.  

On the other hand, if it's blocked in the kernel it could be just about to
wake up and hit a callback event.  In fact, it could even be blocked in a
kmalloc call inside your callback.

When you say task_current_syscall, think utrace_prepare_examine.  They use
the same synchronizing parts.  If task_current_syscall were done in utrace
terms, it would be:

        utrace_prepare_examine || abort
        syscall_get_nr
        syscall_get_arguments
        utrace_finish_examine || abort

The only part you're interested in here is what utrace_prepare_examine is
doing.  It checks that the target is blocked (excluding simple preemption).

> I think this can be a good model to use for non-perturbing quiesce for
> cases as breakpoint insertion and removal or any applicaton text
> modification, where one needs to ensure no thread is executing in the
> vicinity of the said modification. So, even on an -EINPROGRESS on a
> utrace_control, if the thread is in kernel, manipulating application
> text is still safe, right?

For this particular case, maybe "safe enough".  I wouldn't like to
encourage thinking of it as safe in general.  Whatever the kernel is
doing might be touching user memory, or unmapping it or whatever.  For
breakpoint insertion/removal, you are pretty much assuming nothing else
would ever touch the text anyway.  You still need an interlock against
munmap text + mmap reusing addrs in quick succession.  I suppose you
could hold mmap_sem or something.

> Not sure if a similar model can be used to address the teardown races
> problem.

Hmm.  I mentioned before having a non-synchronizing record of the
callback in progress.  This lets UTRACE_DETACH return -EINPROGRESS,
which is after the horse has left the barn, as it were.  There is a
related extension of this idea I had vaguely in mind but didn't mention.
A utrace_set_events call that disables event bits could also return
-EINPROGRESS when some callback might be in progress.  As with detach,
that means a zero return tells you all is well.  Unlike detach, if you
get -EINPROGRESS from utrace_set_events, you still have the engine.

So you know that a callback might be in progress.  That means the thread
is somewhere past where it extracted your engine's ops pointer and event
mask.  Now, utrace_prepare_examine returns zero only if the thread has
explicitly changed its ->state since then.  The utrace code path does
not do that.  So if your callbacks can't block, you know a zero return
here means that your callbacks must have finished.  

Callbacks aren't supposed to block very much anyway, but they can
technicallly block.  This is e.g. for kmalloc calls you make, which is
kosher in a callback.  But a callback of yours will never block before
you've called some might_sleep function.  So your callback can do
things like looking at engine->data before any potential blocking calls.
It can be sure the memory access done first will always have completed
before any synchronizing call to utrace_prepare_examine will return 0.

After utrace_set_events returned -EINPROGRESS, if utrace_prepare_examine
returns -EAGAIN, then you know it's really running in the kernel.  It's
in your callback or near after it.  Eventually it will block in the
kernel, or else it will get to a safe point and report.  So you could do:

        ret = utrace_set_events(task, engine, 0);
        if (ret == -EINPROGRESS) {
                ret = utrace_prepare_examine(task, engine, &exam);
                while (ret == -EAGAIN) {
                        yield();
                        ret = utrace_prepare_examine(task, engine, &exam);
                }
        }

If your callback never blocks, that is enough to know that it's safe to
free your data, unload your module, etc.  (Or if your callback can block
but what you want to synchronize with is just what it does before it
blocks.  That's when you don't care about being able to unload the
module, just juggling your own data structures.)

For the general case of some unknown callback code that might include
blocking, you could instead do:

        ret = utrace_set_events(task, engine, 0);
        while (ret == -EINPROGRESS) {
                ret = utrace_prepare_examine(task, engine, &exam);
                if (ret == -EAGAIN)
                        yield();
                ret = utrace_set_events(task, engine, 0);
                if (ret == -EINPROGRESS)
                        yield();
        }

That is: when it is blocked, check again if a callback still might be in
progress, and loop.  Assuming your callback is well-behaved and only
ever blocks for a short time (like kmalloc), this will return soon.

yield(); should actually be schedule_timeout_interruptible(1); probably,
and it should be able to return -ERESTARTSYS.

If this is a good plan, then we should encapsulate it in a new call
utrace_set_events_sync() or something like that.  So then the safe
detach procedure would be just:

        ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP));
        if (!ret)
                ret = utrace_control(task, engine, UTRACE_DETACH);

This can only return zero or -ESRCH.  Zero says that all callbacks are
definitely finished.  -ESRCH says that your report_reap callback is
definitely happening (right now, modulo preemption).

We could encapsulate that into utrace_detach_sync() too.

All this thinking needs to be verified.  But this is looking OK now.
None of this helps the global tracing case though.


Thanks,
Roland

Reply via email to