> The usage of ->ptrace in utrace-ptrace is racy [...] Sure. Replace it. I'm not sure what purpose ->ptrace (or ctx->flags) serves for attach/detach. You need somewhere to store the PTRACE_SETOPTIONS state and so forth, sure. But you can probably just handle the attachedness at the utrace level. That's what UTRACE_ATTACH_EXCLUSIVE is for.
> I think we need another variable which should be used instead, and > the only "good" place where it can live is engine->data, it should > point to "struct ptrace_xxx". Of course! That was the main point of the whole ptrace-tracee data structure cleanup. > Yes, I see. That is why I named it "notify_stopped", not report_xxx, > to emphasize it is really special and should be used with care and only > when really needed. Right, I got that. > So, if we do not want to add the special (and not-a-report) hooks > into engine->ops, then perhaps it makes sense to do [...] That is exactly the kludge the bad old utrace-ptrace.patch does. > So, perhaps we can just add engine->ops->notify_stopped() (or even [...] My main point was that we should look for an entirely different way to attack the problem. If I thought this hook would only be used for ptrace, it might be fine. (OTOH then it's not all that much better than the terrible ptrace-only kludge.) But AFAICT we'd likely wind up with every other new utrace module that cares about doing synchronization with debugger threads needing to use this hook too. That's why I want to talk about the general case in the utrace API, and not just ptrace. Let's ignore SIGCHLD for a moment, and just think about ptrace's report_* waking up wait. That is like what we expect will be the general case of this: a debugger thread wants to sleep somehow and have its report_* callbacks say "wake the debugger so it can examine me". So, how does the utrace module writer build that? You'd probably use wait_event() et al in your debugger thread, and wake_up() in report_* callbacks. (That's approximately what you have in ptrace+wait.) So how much is satisfied with a "delayed wake-up" (rather than an open-ended hook). Let's say: utrace_wake_up(task, engine, wqh); // replaces wake_up(wqh); return UTRACE_STOP; // | UTRACE_SIGNAL_IGN|UTRACE_SIGNAL_HOLD That would mean, approximately, as if wake_up(wqh) and then the wakee had called utrace_barrier(task,engine), but without the sched flutter. Formally, I think the guarantee should be "when utrace_barrier would return for you". Technically this means that if your callback did not use UTRACE_STOP, it's just sometime after your callback returns. But if you did use UTRACE_STOP, then it means after the task will definitely stop. That's all it means today, but I think we want to strengthen that guarantee so that when you have used UTRACE_STOP (either in a callback already in progress when you called utrace_barrier, or in a utrace_control call before the utrace_barrier call), utrace_barrier doesn't return until "utrace_prepare_examine will work", i.e. actually in TASK_TRACED/STOPPED. In ptrace, I think this alone might suffice to cover things "cleanly" in the utrace API. (That is, clean for utrace. ;-) The ptrace layer could use the trick of a custom wake function to implement SIGCHLD without needing anything but utrace_wake_up from the utrace layer. That is at base the same thing as your notify_stopped hook--ptrace will supply a hook that runs in very special circumstances. But for the more common case in other modules, using utrace_wake_up/wait_event this way can be very straightforward and foolproof. Thoughts? Thanks, Roland