Please feel free to ignore this if it's no longer relevant to your work. I'm trying to catch up on the backlog of replies I owe you. I chose this one as the arbitrary cut-off before which I think I have already neglected things too long to still matter.

> If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
> spins until engine->ops becomes NULL. This is just wrong.

Yes, this happens because get_utrace_lock returns -ERESTARTSYS and utrace_barrier checks for this and loops. I agree these long-spin scenarios would be wrong. The reason it tries to wait for "fully detached" state is that after utrace_control(task,engine,UTRACE_DETACH), @task could still be in the middle of a callback for @engine.

> Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should
> call utrace_barrier(). However, it is possible that -EINPROGRESS means
> we raced with sys_sleep(A_LOT) doing report_syscall_entry().

Right. Perhaps utrace_barrier could do some different variant of the (utrace->reporting != engine) check.

> Change get_utrace_lock() to succeed if the caller is utrace_barrier()
> and ops == &utrace_detached_ops. I do not see any reason why this case
> should be special from utrace_barrier's pov. It can just check
> ->reporting and return 0 or do another iteration.
[...]
> Also, it is not clear why utrace_barrier() needs utrace->lock,
> except to ensure it is safe to dereference target/utrace.

Well, wouldn't that be reason enough? The comment in utrace_barrier talks about needing the lock. This corresponds to the comment in the UTRACE_DETACH case of finish_callback_report. Do you think those comments are inaccurate about what's required?

> Note: we should also reconsider() utrace_barrier()->signal_pending() check.

IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait (even moreso since it's really a spin). If a buggy callback gets stuck blocking or spinning and fails to return promptly, then you wedge any debugger thread trying to synchronize with it via utrace_barrier. If you can't even interrupt that debugger thread, then there will really be no chance to recover from the deadlock.

Thanks,
Roland