> Thinking aloud, utrace_control(UTRACE_STOP) returns -EINPROGRESS for > threads not yet stopped > a. possibly still in userspace, yet to pass through a quiesce safe point > b. blocked in the kernel on a syscall or an exception.
Correct. > Would task_current_syscall() help here? On a -EINPROGRESS return, one > can check for a 0 return from task_current_syscall() which'd mean the > thread is in the kernel. Wouldn't that mean that the thread will > eventually do a report_quiesce? You already know it will *eventually* do a report. UTRACE_STOP having returned -EINPROGRESS is roughly equivalent to you having used UTRACE_REPORT instead. (The difference is that if you have not enabled report_quiesce or whatever callback that next arises, it will stop and stay stopped anyway.) The two issues are that, if it's in the kernel, this might not be very soon, or it might be right now. If it's blocked in a syscall like poll or read, then it could be a very long time (weeks) before there is a report. On the other hand, if it's blocked in the kernel it could be just about to wake up and hit a callback event. In fact, it could even be blocked in a kmalloc call inside your callback. When you say task_current_syscall, think utrace_prepare_examine. They use the same synchronizing parts. If task_current_syscall were done in utrace terms, it would be: utrace_prepare_examine || abort syscall_get_nr syscall_get_arguments utrace_finish_examine || abort The only part you're interested in here is what utrace_prepare_examine is doing. It checks that the target is blocked (excluding simple preemption). > I think this can be a good model to use for non-perturbing quiesce for > cases as breakpoint insertion and removal or any applicaton text > modification, where one needs to ensure no thread is executing in the > vicinity of the said modification. So, even on an -EINPROGRESS on a > utrace_control, if the thread is in kernel, manipulating application > text is still safe, right? For this particular case, maybe "safe enough". I wouldn't like to encourage thinking of it as safe in general. Whatever the kernel is doing might be touching user memory, or unmapping it or whatever. For breakpoint insertion/removal, you are pretty much assuming nothing else would ever touch the text anyway. You still need an interlock against munmap text + mmap reusing addrs in quick succession. I suppose you could hold mmap_sem or something. > Not sure if a similar model can be used to address the teardown races > problem. Hmm. I mentioned before having a non-synchronizing record of the callback in progress. This lets UTRACE_DETACH return -EINPROGRESS, which is after the horse has left the barn, as it were. There is a related extension of this idea I had vaguely in mind but didn't mention. A utrace_set_events call that disables event bits could also return -EINPROGRESS when some callback might be in progress. As with detach, that means a zero return tells you all is well. Unlike detach, if you get -EINPROGRESS from utrace_set_events, you still have the engine. So you know that a callback might be in progress. That means the thread is somewhere past where it extracted your engine's ops pointer and event mask. Now, utrace_prepare_examine returns zero only if the thread has explicitly changed its ->state since then. The utrace code path does not do that. So if your callbacks can't block, you know a zero return here means that your callbacks must have finished. Callbacks aren't supposed to block very much anyway, but they can technicallly block. This is e.g. for kmalloc calls you make, which is kosher in a callback. But a callback of yours will never block before you've called some might_sleep function. So your callback can do things like looking at engine->data before any potential blocking calls. It can be sure the memory access done first will always have completed before any synchronizing call to utrace_prepare_examine will return 0. After utrace_set_events returned -EINPROGRESS, if utrace_prepare_examine returns -EAGAIN, then you know it's really running in the kernel. It's in your callback or near after it. Eventually it will block in the kernel, or else it will get to a safe point and report. So you could do: ret = utrace_set_events(task, engine, 0); if (ret == -EINPROGRESS) { ret = utrace_prepare_examine(task, engine, &exam); while (ret == -EAGAIN) { yield(); ret = utrace_prepare_examine(task, engine, &exam); } } If your callback never blocks, that is enough to know that it's safe to free your data, unload your module, etc. (Or if your callback can block but what you want to synchronize with is just what it does before it blocks. That's when you don't care about being able to unload the module, just juggling your own data structures.) For the general case of some unknown callback code that might include blocking, you could instead do: ret = utrace_set_events(task, engine, 0); while (ret == -EINPROGRESS) { ret = utrace_prepare_examine(task, engine, &exam); if (ret == -EAGAIN) yield(); ret = utrace_set_events(task, engine, 0); if (ret == -EINPROGRESS) yield(); } That is: when it is blocked, check again if a callback still might be in progress, and loop. Assuming your callback is well-behaved and only ever blocks for a short time (like kmalloc), this will return soon. yield(); should actually be schedule_timeout_interruptible(1); probably, and it should be able to return -ERESTARTSYS. If this is a good plan, then we should encapsulate it in a new call utrace_set_events_sync() or something like that. So then the safe detach procedure would be just: ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP)); if (!ret) ret = utrace_control(task, engine, UTRACE_DETACH); This can only return zero or -ESRCH. Zero says that all callbacks are definitely finished. -ESRCH says that your report_reap callback is definitely happening (right now, modulo preemption). We could encapsulate that into utrace_detach_sync() too. All this thinking needs to be verified. But this is looking OK now. None of this helps the global tracing case though. Thanks, Roland