> E1 does utrace_control(UTRACE_RESUME), E2 does > utrace_control(UTRACE_SINGLESTEP). > How this can work? > > If E2 calls utrace_control() first, the subsequent UTRACE_RESUME does > user_disable_single_step(), and (in general) E2 has no chance to re-assert > SINGLESTEP.
I see your point. > Looks like, 26fefca "sticky resume action" fixes this... It handles that scenario. But you've actually raised a broader issue. Even with utrace-cleanup where we get the "most intrusive wins" logic, we still have a version of this issue. That is, ordering of utrace_control calls is an uncoordinated variable that does not tie in with the engine callback ordering. Another example is E1 does UTRACE_BLOCKSTEP and then E2 does UTRACE_SINGLESTEP. Now single-step will win out, which is right. But E2 gets no notice about this, so it can't know that the step is for an insn rather than a block. In contrast, say utrace_control reduces both to UTRACE_REPORT, either implicitly as we had been discussing before, or because E3 comes along (before or after) and calls utrace_control(,,UTRACE_REPORT). Then E[123] get report_quiesce(0) in their callback order. (We can't control that order much today, but that is still to consolidate the utrace_control ordering issue into that existing general ordering issue.) One easy thing relative to utrace-cleanup is to say that when a pending *STEP is changed to the other * then it just reduces to REPORT. IOW, you always get a callback after utrace_control if any other engine is doing any action different from yours (and < RESUME). If we later add something like an engine priority number that controls callback order, then it would be a compatible optimization of this to store the priority along with the pending action and have utrace_control calls reduce to REPORT only if the previous utrace_control(,,*STEP) was by an engine whose report_quiesce would have been before yours (so that it wouldn't know what you had done anyway). > Yes, I hope I understand utrace.c ;) I meant user_enable_single_step(). > I don't really understand it even in x86 case, to many low level magic. Ok, good. You officially don't need to understand that unless you want to. It's essential we have you as an expert on utrace and ptrace. But you do not also need to become an arch/x86 expert unless it takes your fancy. It's enough to know that the arch code has certain contraints (i.e. here it's __might_sleep). For all these arch interfaces, I've endeavored to make the kerneldoc comments give the complete specification of the contract between the arch code and the generic code. If some arch code does not match that specification, then it can be my problem to resolve that with the arch code and its humans. If an arch needs it to involve changing that interface contract, then it can be my problem to mediate that with you and keep the kerneldoc current and accurate. > By the time we discussed it, ->stopped was a bit more complicated iirc, > it could be set in some subtle ->exit_state cases. Yes. > Afaics, we can kill ->stopped, but utrace_stop() and utrace_finish_jctl() > should do unconditial spin_unlock_wait(utrace->lock) after resume (or > lock/unlock). Ok. Please send a patch relative to utrace-cleanup with that code including comments that explain those needs. > Of course, unlike ->stopped, task_is_traced() == T is not stable under > under untrace->lock, but I think we don't care. Right. > Yes, but sometimes ptrace (current implementation) has no chance to > notice this case and it relies on syscall_trace_leave()->send_sigtrap(), > we are going to return to user-mode without utrace->report/interrupt set. Right, it would require that tracehook_report_syscall_exit() calls into utrace even without utrace_flags & UTRACE_EVENT(SYSCALL_EXIT) so it can do something about this case. > Horror. Horror, horror, horror. > > I just realized that ptrace_resume()->send_sigtrap() can only work on x86. > I always knew this should be changed, but I hoped this more or less correct > for now. But send_sigtrap() doesn't even exist on !x86 machines! Sorry, I didn't mention that lately since I presumed you already knew. It's just a simple wrapper. With the arch-specific si_code/si_addr determination interfaces I mentioned, it would be (almost) subsumed by generic code doing the same thing anyway. The (almost) is: tsk->thread.trap_no = 1; tsk->thread.error_code = error_code; which is x86 magic that can only ever be seen in userland handler's sigcontext (not in siginfo!), and it's only a undesired misfeature that these ptrace-induced signals can ever reach userland. But, I mention it so as not to omit anything more about the status quo. > I need to read the code and think, then I'll ask you stupid questions. :-) Putting those first two steps before the third is what makes you a rock star! Thanks, Roland