Re: [PATCH 74-78] implement the partly detached engines

Oleg Nesterov Mon, 12 Oct 2009 03:27:59 -0700

On 10/11, Roland McGrath wrote:
>
> I took it from the lack of Signed-off-by: lines that you probably didn't
> want me to merge these.


Ah, no. I never added mine sob's in these long series, because I thought
it doesn't matter.

> Instead I applied them all to a new branch
> utrace-ptrace-reuse-engine:
>       magilla 135 % git l utrace-ptrace..utrace-ptrace-reuse-engine
>       85f8b3a detach should reset the context of self-detaching engine
>       a27233a attach: try to re-use the self-detaching engine
>       8667615 ptrace_notify_stop: fix engine leak
>       3d5d053 ptrace_detach_task: don't use engine ptr before IS_ERR(engine)
>       01875c7 fold detach_signal() into ptrace_detach_task()
>       464c2b7 don't detach the engine with the parting signal
>       97b345c implement the basic detach-with-signal logic

OK, great.

> I found it far easier to read utrace-ptrace..utrace-ptrace-reuse-engine as
> a single diff.  Some miscellaneous comments:

Yes, as usual the changes were chaotic. I tend to send the changes even if
I know they should be redone/cleanuped later, to get your early review and
to discuss asap.

> +                     if (valid_signal(sig))
> +                             context->signr = sig;
> +                     context->options = 0;
> +                     context->eventmsg = 0;
> +                     context->stop_code = 0;
> +                     /* make sure do_wait() can't succeed */
> +                     tracee->exit_code = 0;
>
> Since you allocate struct ptrace_context with kzalloc, it would seem
> cleanest here to use memset to clear it all first.  The only thing you
> don't clear explicitly here is siginfo.  You need it preserved for the
> resume_signal logic to use.  But IMHO it's cleaner to make that explicit in
> some comments and extract and restore it around the memset.  That should be
> the one difference in the data structures between fully detaching and then
> doing a separate full attach, right?

Mostly yes.

But I think it would be much better to move this re-initialization into
ptrace_attach_task(), in this case we can't do memset(0) + restore siginfo.

> Also it feels like there should be a way to consolidate that switch somehow
> with the first one in do_ptrace_resume, so e.g. the syscall->send_sig logic
> is all in one place.

I thought about this too. Currently I don't see how to do this cleanly.
This all needs some obvious cleanups, then we will see if we can cleanup
the code more and do some consolidation.

> +             if (context->resume == UTRACE_DETACH) {
> +                     /* we rely on ->cred_guard_mutex */
> +                     context->resume = UTRACE_RESUME;
>
> Ok, so this means that when ->resume == UTRACE_DETACH then the only other
> thing that could change ->resume is another thread in the same code path.
> That path olds cred_guard_mutex, so it excludes other threads.  Right?

Yes. Other callers of ptrace_attach_task() are ptrace_traceme() and
ptrace_clone_attach(), they can not hit this case.

> +                     /*
> +                      * Make sure we don't race with ptrace_report_signal()
> +                      */
> +                     utrace_barrier(tracee, engine);
> +                     if (engine->ops == &ptrace_utrace_ops)
> +                             goto finish;
>
> I really don't understand this part.  Firstly, what's the race?  Secondly,
> what is that ->ops check about??  That is clearly not something that the
> ptrace layer ought to be fiddling with.  If what you mean to detect here is
> that the tracee died or was detached while we waited for things to calm
> down, then you should just check the return value from utrace_barrier.

No, the return calue from utrace_barrier() does not matter. And, if the
tracee is killed we don't care. The race is different.

So. The tracee reports the signal and stops.

The tracer detaches, sets ->resume = UTRACE_DETACH and wakes up the
tracee.

A new tracer attaches, and changes ->resume back to UTRACE_RESUME. But,
since the tracee is not stopped, this can be too later.

It is possible that the tracee has already returned UTRACE_DETACH | ...
from ptrace_report_signal(), in this case finish_callback() will mark
this engine as detached soon or it has already detached it. That is
why we do:

        context->resume = UTRACE_RESUME;
        utrace_barrier();
        if (engine->ops == &ptrace_utrace_ops)
                SUCCESS!

If ptrace_report_signal() has already seen resume = UTRACE_DETACH,
this engine should be detached after utrace_barrier().

If ptrace_report_signal() was not called yet, it must see UTRACE_RESUME
when it will be called.

> (That will handily avoid the new __must_check warning you should be seeing
> for calling utrace_barrier without checking its return value, too!)

Argh. I didn't even know it is __must_check! OK, will fix... Not that
I personally like these __must_check's though ;)

> I know we went around on this idea before, and we don't need to rehash the
> idea if you still don't like it.  But the hair like clearing out the data
> structure for this approach makes me think the "second special engine"
> approach really might look better.  That is, detaching with a signal
> attaches a new engine with a different ops vector and then detaches the
> real ptrace engine.

Agreed, this all is very subjective. I hate the idea to attach another
engine here, but can't prove it is really bad. Firstly, I dislike it
because it is very unlikely another tracer will attach before the tracee
handles this signal, a bit ugly we should arrange the new engine "just in
case". And in theory kmem_cache_alloc() can fail, not good.

But mostly I dislike the fact we must teach old and new engines to cooperate.

> Then on successful attach, you do the look-up for any
> ptrace_detached_ops engine, transfer over its ctx->signr and ->siginfo to
> your fresh ptrace_context, detach it and free the old context.  It has more
> data structures live momentarily than you really need, but it seems cleaner.

Again, this is subjective, but I'd like to avoid these complications.

And. probably I missed something, but it is not trivial to implement.
A lot of problems to solve. Just for example, when we attach the new
engine, its ->flag has no UTRACE_EVENT(SIGNAL) yet. If we steal ->siginfo
we can "lose" this signal. If we set UTRACE_EVENT(SIGNAL) first, the new
and old engine can race.

Yes I thought about this, but afaics we have to fight with multiple annoying
races. Again, again, I do not pretend I am sure this is really true.

Oleg.

Re: [PATCH 74-78] implement the partly detached engines

Reply via email to