On 10/12, Roland McGrath wrote: > > > If engine is detached (has utrace_detached_ops), utrace_barrier(engine) > > spins until engine->ops becomes NULL. This is just wrong. > > Yes, this happens because get_utrace_lock returns -ERESTARTSYS and > utrace_barrier checks for this and loops. I agree these long-spin > scenarios would be wrong. > > The reason it tries to wait for "fully detached" state is that after > utrace_control(task,engine,UTRACE_DETACH), @task could still be in the > middle of a callback for @engine.
Yes. But, with this patch, in this case utrace_barrier()->get_utrace_lock(attached => false) returns success and then we check utrace->reporting == engine. This patch only makes a difference if ops == utrace_detached_ops, Before this patch spin until ->ops goes away After this patch spin until ->ops goes away OR ->reporting != engine (Hmm. Probably utrace->reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace->reporting). > > Also, it is not clear why utrace_barrier() needs utrace->lock, > > except to ensure it is safe to dereference target/utrace. > > Well, wouldn't that be reason enough? The comment in utrace_barrier > talks about needing the lock. This corresponds to the comment in the > UTRACE_DETACH case of finish_callback_report. Yes agreed. Honestly, I can't even recall what I meant. > > Note: we should also reconsider() utrace_barrier()->signal_pending() check. > > IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait > (even moreso since it's really a spin). If a buggy callback gets stuck > blocking or spinning and fails to return promptly, then you wedge any > debugger thread trying to synchronize with it via utrace_barrier. If > you can't even interrupt that debugger thread, then there will really be > no chance to recover from the deadlock. Completely agreed. But, unfortunately, this signal_pending() check assumes the caller can always handle this ERESTARTSYS. But this is not true, one example is the exiting debugger doing exit_ptrace(). Oleg.