On 10/15, Roland McGrath wrote: > > > Tracee, finish_callback() path: > > > > if (action == UTRACE_DETACH) > > engine->ops = utrace_detached_ops; > > > > utrace->reporting = NULL; > > > > no barries, no utrace->lock() in between. > > > > Tracer, utrace_barrier() under utrace->lock: > > > > if (engine->ops == utrace_detached_ops) > > return -EXXX; > > > > if (utrace->reporting != engine) > > return 0; > > > > Afaics, this is racy and should be fixed in utrace level. > > Let me see if I'm following you. The scenario you mean is that > utrace_barrier() returns zero when there had been a callback in > progress but it returned UTRACE_DETACH, and so now the engine is > detached. The only guarantee utrace_barrier() claims is ordering > with respect to the effects of prior utrace_* calls, and a callback > not being in progress right now. This scenario meets that interface > contract as stated. It's as if utrace_barrier() had returned zero > before that callback ran. > > But you can tell by external means that your callback has started > before you called utrace_barrier(), so you want it to return -ESRCH > instead of 0. Is that right?
I think this is right, but I'd like to be absolutely sure we understand each other, please see below. As I said, this is very similar to the problems we discussed in another thread. But this particular race can be solved (I think) if we change finish_callback_report() to set ->ops = utrace_detached_ops under utrace->lock. In fact I feel this makes sense anyway, but I need to reread this code with a fresh head. So. Let's suppose again we have something like bool please_detach; my_report_callback(...) { if (please_detach) return UTRACE_DETACH; ... return UTRACE_WHATEVER_ELSE; } The tracer does: please_detach = false; err = utrace_barrier(); if (err) // too late! return -ESRCH; // Great! we cancelled the possible detach. // the next invocation of my_report_callback() // must see please_detach == false. // if this callback was already called and returned // UTRACE_DETACH, utrace_barrier() shouldn't succeed. // Seriously, if utrace_barrier() was called when the // tracee has already started to handle UTRACE_DETACH // request inside the reporting loop, a naive user like // me would like to expect utrace_barrier() should catch // this detach-in-progress somehow. If it is not supposed utrace_barrier() can be used this way, then I don't understand any longer what did you mean in https://www.redhat.com/archives/utrace-devel/2009-October/msg00135.html > No, the return calue from utrace_barrier() does not matter. And, if the > tracee is killed we don't care. The race is different. The utrace_barrier return value should cover exactly the case you describe, ... A success return tells you positively that it was not detached. That is why initially I did please_detach = false; utrace_barrier(); // we can't trust the returned value if (engine->ops != ptrace_utrace_ops) return -EXXX; // Great! we cancelled the possible detach. // the next invocation of my_report_callback() // must see please_detach == false. But. even the "must see" above needs some barriers, afaics. Either in utrace, or the user should take care. So far I think it is better to modify utrace. Oleg.