> Again, I don't understand. Let me repeat, I tried to propose something > that do not make any visible difference except fixes the problem with > get_user_pages() under spin_lock().
Ok. > And yes, while I don't pretend I know what should be done, I agree the > current API (I don't mean the documentation, I mean the actual code) is > not right. The behavior is fine in practice, it's just that the lockdep rules are violated (and that's indeed horribly wrong in the cases that matter, which are vanishingly rare in real-world practice). > I agree. But I can't explain why I agree ;) I mean, I feel this should be > more clean, but given that I do not understand the low level details even > remotely I can't judge. What level of details don't you understand? I hope you understand everything in utrace.c! Speaking of which, it really looks like it's already the case now that ->stopped exactly matches task_is_traced(). I know we discussed this before and put it off. But I'm just looking again now, and it seems fairly clear. Even without getting the deep logic clear in my head, it looks pretty solid just from local inspection that they match. Do you agree? In fact, as I count them, 8 of 10 uses of ->stopped are purely to reconcile the bookkeeping so it will match task_is_traced(). > OK. Unless I misunderstand "avoid always getting a callback" above, this > means the tracee should call enable_step() after wakeup in utrace_stop(), > yes? Something like that, yes. > And yes, this is not even consistent across machines (as you explained > before), that is why I am very nervous when I think about the changes > in this area. Ok. I think this is a corner where we will have some play in cleaning up and unifying the behavior upstream without much complaint. In the status quo, x86 passes only 0 to tracehook_report_syscall_exit and will synthesize a SIGTRAP only if TIF_SINGLESTEP is set on its return (i.e. the last call user_enable_*_step was after the last call to user_disable_single_step). Other machines either pass step=1 when user_enable_*_step had been used to reach tracehook_report_syscall_exit and do nothing else, or pass step=0 and have no other related checks I can see. For us, step=1 doesn't really give any new information. It means PTRACE_{SINGLE,BLOCK}STEP was used to hit the syscall insn (or used after PTRACE_SYSCALL stopped for syscall-entry). But now ptrace_context.resume tells us that if we check it. So, if nothing else, we can do user_disable_single_step somewhere inside tracehook_report_syscall_exit to disarm the x86 code and then behave uniformly across machines inside there. But, the x86 maintainers are generally friendly to our changes if we are clear about their purpose. On all those other machines with unconditional step=0, we can assume that either they don't have single-step at all, PTRACE_SINGLESTEP over a syscall insn is already broken (actually stops after the following insn instead of after the syscall insn), or their hardware works differently so it doesn't have this issue, and regardless that they never much cared about the nit. Today's behavior on e.g. powerpc (a sometimes step=1 arch) is not the desireable one. Since it uses ptrace_notify, an infinite sequence of waitpid; print status; PTRACE_SINGLESTEP, WSTOPSIG(status) will double report any signal received during (or induced by) a syscall (because ptrace_syscall_exit requeues it to be reported again, instead of delivering it). For observed behavior, the x86 one is the most desireable one in that scenario. For the synthetic test case where you've used PTRACE_SYSCALL to get to the syscall-exit stop and then used PTRACE_SINGLESTEP, nobody really cares, but the observed behavior that makes intuitive sense is not to get any second stop without going back to user-mode--not the x86 behavior. I don't think anyone would object if we made the behavior consistent with those two "most desireable"s across all machines, though no single machine today is consistent with both. As to the first scenario, chances are that nobody today is noticing that detail on non-x86 machines. Today they get a bogus syscall-exit report when doing PTRACE_SINGLESTEP, but if not using PTRACE_O_SYSGOOD then they just see a SIGTRAP with si_code=0. A real single-step SIGTRAP has si_code set, and on some machines also has si_addr. On x86, the synthesized SIGTRAP matches what a real single-step one would have in the siginfo_t, which is si_code and si_addr. So the only difficulty in just using uniform code across machines is getting the right si_code and si_addr values. Every machine but x86 gets 0 for those today, so a "general solution" that starts out by only actually getting real values in for x86 would be OK. Perhaps we could require asm/siginfo.h to do: #define TRAP_SINGLESTEP TRAP_BRKPT /* TRAP_TRACE on powerpc, etc. */ Then require asm/ptrace.h to provide user_instruction_pointer(): static inline unsigned long user_instruction_pointer(struct pt_regs *regs) { return user_mode_vm(regs) ? regs->ip : 0; } This is only required vs just using: si_addr = user_mode(regs) ? instruction_pointer(regs) : 0; because x86 wants user_mode_vm vs user_mode. Then arch-independent code can fill siginfo_t just right for a synthetic single-step trap. Thanks, Roland