Thanks, David. That is exactly the right example of using kernel synchronization primitives with callbacks to implement the blocking behavior you want. The wrinkle is that you use UTRACE_INTERRUPT, which (potentially) perturbs the behavior of every traced thread. Doing this gives you a simple way to do synchronous detach and avoid those races. It's a prime example of why asynchronous detach is harder and why we need to hash it out. What you've done is the only thing that's straightforward to do now, but it has one of the bad old side effects of ptrace (an interrupting detach) that we need to eliminate to make the facility acceptable as the basis for pervasive tracing of many processes on the system.
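For anyone following along, here is a rough userspace analogy of the pattern in Python threads. It is not utrace code and uses none of the real API: Engine, report_callback, traced_thread and synchronous_detach are hypothetical names made up for illustration. The detacher sets a flag, then blocks on an ordinary synchronization primitive until the callback running on the traced thread acknowledges it. Only after that is it safe to free the engine's data. The sketch assumes the traced thread reaches a callback by itself; it does not model forcing one with an interrupt, which is the part that perturbs the thread.

```python
import threading
import time


class Engine:
    """Hypothetical stand-in for per-thread tracing state (not the utrace API)."""

    def __init__(self):
        self.detach_requested = threading.Event()  # set by the detacher
        self.detach_complete = threading.Event()   # set by the final callback
        self.freed = False                         # models freeing engine data
        self.use_after_free = 0                    # callbacks seen after the free
        self.reports = 0                           # ordinary callbacks delivered


def report_callback(engine):
    """Runs on the traced thread at each event; returns True to stay attached."""
    if engine.freed:
        engine.use_after_free += 1
    if engine.detach_requested.is_set():
        # Final callback: wake the blocked detacher and stop reporting.
        engine.detach_complete.set()
        return False
    engine.reports += 1
    return True


def traced_thread(engine):
    # Assumption: the thread keeps hitting events without outside help.
    while report_callback(engine):
        time.sleep(0.001)


def synchronous_detach(engine, timeout=5.0):
    """Block until the traced thread has run its last callback."""
    engine.detach_requested.set()
    finished = engine.detach_complete.wait(timeout)
    if finished:
        engine.freed = True  # safe: no further callback can run
    return finished


def demo():
    engine = Engine()
    worker = threading.Thread(target=traced_thread, args=(engine,))
    worker.start()
    time.sleep(0.01)  # let a few ordinary callbacks run first
    ok = synchronous_detach(engine)
    worker.join()
    return ok, engine.use_after_free
```

Calling demo() returns whether the detach completed and how many callbacks saw freed state. The blocking wait is what closes the race. An asynchronous detach has no such wait, so it needs some other guarantee that no callback is still in flight.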
Thanks, Roland