> I think we should start with changing utrace_control(DETACH) anyway,
> then try to improve. I'll resend the one-liner I already showed.
Ok.

> Hmm. I'll try to think more. Right now I don't really understand
> how to do this correctly.

I wasn't immediately sure either.

> OK, finish_callback_report() and utrace_control(DETACH) can set
> TIF_NOTIFY_RESUME.

Right. But utrace_resume has the report.action == UTRACE_RESUME bail-out
case. So either that would have to change, or detach would also have to
do UTRACE_REPORT.

> But what if there are no more attached engines?
> Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
> case. And utrace_reset() shouldn't clear task->utrace_flags, otherwise
> utrace_resume/utrace_get_signal won't be called.

Right. Or else tracehook_notify_resume could call utrace_resume
unconditionally, but I'm not at all sure that wouldn't be worse.

The original theory was that it should always be OK to have some
utrace_flags bits stay set when they are "stale", because any kind of
reporting pass that got enabled would hit the report->spurious case and
clean up the state synchronously when it's safe.

> So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset()
> should do user_disable_single_step() too if no more engines.

Confused. If there are no more engines but the tracee is still running,
we still shouldn't do it there, because it still might not be entirely
safe. If the tracee is not stopped, user_disable_single_step() is only
safe to call on current, i.e. from the tracee itself.

> And in fact I don't understand why this is important. When it comes
> to multitracing, any engine can hit the unwanted/unexpected trap
> because another engine can ask for UTRACE_*STEP.

Right. An engine earlier in the list could swallow the signal so that the
next engines' callbacks don't see it. But it doesn't know whether some
later engines also asked for stepping. So there would have to be some
understood convention between engines. For example, a later engine could
see info->si_signo == SIGTRAP et al. and act on that even when the
incoming utrace_signal_action(action) == UTRACE_SIGNAL_IGN.
Of course, that doesn't help a non-stepping engine that is earlier in the
list to know that a later engine will be swallowing the signal.

The original theory on this was that we'd one day stop overloading user
signals for debugger-induced traps. In some past TODO lists and postings
I referred to "extension events". The idea (in part) was that things like
hardware stepping would generate a special new flavor of utrace event
rather than a real signal that has to be intercepted. Then engines'
callbacks would easily see that this was a debugging event induced by some
engine and ignore it (or, more likely, never get any callback at all
unless the engine registered interest in stepping). This would also
address the case of an asynchronous engine detach just after a trap has
actually hit, where today the SIGTRAP is queued, is later not intercepted
by any debugger, and instead kills the user process.

But we don't have any of that now, and don't yet know whether we will
really pursue any big improvements at this API level.

> The only really
> important (I think) case is when the last engine detaches.

That's the most important case, sure. But in any case that is not
actually racy, we should avoid later spurious traps.

> IOW. Suppose that engine E does utrace_control(STEP) or its callback
> returns UTRACE_*STEP. If we do not detach this engine, other engines
> will see the trap.

That's only so if the tracee actually gets back to user mode before we
have another reporting pass.

> So why it is so important to clear X86_EFLAGS_TF if we detach E ?

Perhaps I am worrying too much about it. The worst thing would be if it
could really get "stuck". But that shouldn't be possible as long as there
are any engines at all (or perhaps only any with UTRACE_EVENT(QUIESCE)
set). Worst case, one spurious SIGTRAP will get to a report_signal pass,
but nobody will return UTRACE_*STEP again, so there won't be another.
(Of course, nobody will swallow that SIGTRAP, so it will terminate the
process first anyway, but that's another problem.)

What seems important is any "non-racy" scenario. That is, one where
perhaps the tracee wasn't properly stopped and E detached
"asynchronously", but in practice the tracee was known to be blocked
elsewhere or something, so the detach should take full effect before the
tracee returns to user mode. But that is just vague theory offhand.

Thanks,
Roland