Please feel free to ignore this if it's no longer relevant to your work.
I'm trying to catch up on the backlog of replies I owe you.  I picked
this message as an arbitrary cut-off: anything older than it I have
probably neglected too long for a reply to still matter.

> If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
> spins until engine->ops becomes NULL. This is just wrong.

Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
utrace_barrier checks for this and loops.  I agree these long-spin
scenarios would be wrong.

The reason it tries to wait for "fully detached" state is that after
utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
middle of a callback for @engine.

> Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should
> call utrace_barrier(). However, it is possible that -EINPROGRESS means
> we raced with sys_sleep(A_LOT) doing report_syscall_entry(). 

Right.  Perhaps utrace_barrier could do some different variant of the
(utrace->reporting != engine) check.

> Change get_utrace_lock() to succeed if the caller is utrace_barrier()
> and ops == &utrace_detached_ops. I do not see any reason why this case
> should be special from utrace_barrier's pov. It can just check
> ->reporting and return 0 or do another iteration.
[...]
> Also, it is not clear why utrace_barrier() needs utrace->lock,
> except to ensure it is safe to dereference target/utrace.

Well, wouldn't that be reason enough?  The comment in utrace_barrier
talks about needing the lock.  This corresponds to the comment in the
UTRACE_DETACH case of finish_callback_report.  Do you think those
comments are inaccurate about what's required?

> Note: we should also reconsider() utrace_barrier()->signal_pending() check.

IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
(even more so since it's really a spin).  If a buggy callback gets stuck
blocking or spinning and fails to return promptly, then you wedge any
debugger thread trying to synchronize with it via utrace_barrier.  If
you can't even interrupt that debugger thread, then there will really be
no chance to recover from the deadlock.
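
To make that concrete, here is a small userspace model of the loop
shape I have in mind.  It is only a sketch and not the utrace code:
the model_* names, the tick counter, and the "signal arrives at tick
N" field are all invented for illustration, and there is no locking.
It assumes the barrier reports an interrupted wait as -ERESTARTSYS
(512 in the kernel's internal errno list).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; the real utrace structures differ. */
#define MODEL_ERESTARTSYS 512	/* kernel-internal ERESTARTSYS value */

struct model_engine {
	int id;
};

struct model_task {
	/* Engine whose callback is currently running, or NULL. */
	const struct model_engine *reporting;
	/* Ticks until that callback returns; <= 0 models a stuck callback. */
	int callback_ticks_left;
};

struct model_caller {
	/* Tick at which a signal becomes pending; < 0 means never. */
	int signal_at_tick;
};

/* Advance the modeled target by one tick of its callback. */
static void model_tick(struct model_task *t)
{
	if (t->callback_ticks_left > 0 && --t->callback_ticks_left == 0)
		t->reporting = NULL;	/* callback finished */
}

/*
 * Wait until @t is no longer inside a callback for @e.
 * Returns 0 once the callback is done, or -MODEL_ERESTARTSYS if the
 * caller is interrupted first.  With a stuck callback and no signal
 * this never returns, which is exactly the case to avoid.
 */
static int model_barrier(struct model_task *t, const struct model_engine *e,
			 const struct model_caller *c)
{
	assert(t != NULL && e != NULL && c != NULL);

	for (int tick = 0; t->reporting == e; tick++) {
		if (c->signal_at_tick >= 0 && tick >= c->signal_at_tick)
			return -MODEL_ERESTARTSYS;
		model_tick(t);
	}
	return 0;
}
```

With a callback that returns after a few ticks the model returns 0.
With a callback that never returns, the only way out is the signal
check, which is why I don't want to drop it.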


Thanks,
Roland
