> Again, I don't understand. Let me repeat, I tried to propose something
> that do not make any visible difference except fixes the problem with
> get_user_pages() under spin_lock().

Ok.

> And yes, while I don't pretend I know what should be done, I agree the
> current API (I don't mean the documentation, I mean the actual code) is
> not right.

The behavior is fine in practice, it's just that the lockdep rules are
violated (and that's indeed horribly wrong in the cases that matter, which
are vanishingly rare in real-world practice).

> I agree. But I can't explain why I agree ;) I mean, I feel this should be
> more clean, but given that I do not understand the low level details even
> remotely I can't judge.

What level of details don't you understand?  
I hope you understand everything in utrace.c!

Speaking of which, it really looks like it's already the case now that
->stopped exactly matches task_is_traced().  I know we discussed this
before and put it off.  But I'm just looking again now, and it seems fairly
clear.  Even without getting the deep logic clear in my head, it looks
pretty solid just from local inspection that they match.  Do you agree?

In fact, as I count them, 8 of 10 uses of ->stopped are purely to reconcile
the bookkeeping so it will match task_is_traced().

> OK. Unless I misunderstand "avoid always getting a callback" above, this
> means the tracee should call enable_step() after wakeup in utrace_stop(),
> yes?

Something like that, yes.

> And yes, this is not even consistent across machines (as you explained
> before), that is why I am very nervous when I think about the changes
> in this area.

Ok.  I think this is a corner where we will have some play in cleaning up
and unifying the behavior upstream without much complaint.  In the status
quo, x86 passes only 0 to tracehook_report_syscall_exit and will synthesize
a SIGTRAP only if TIF_SINGLESTEP is set on its return (i.e. the last call
user_enable_*_step was after the last call to user_disable_single_step).
Other machines either pass step=1 when user_enable_*_step had been used to
reach tracehook_report_syscall_exit and do nothing else, or pass step=0 and
have no other related checks I can see.

For us, step=1 doesn't really give any new information.  It means
PTRACE_{SINGLE,BLOCK}STEP was used to hit the syscall insn (or used after
PTRACE_SYSCALL stopped for syscall-entry).  But now ptrace_context.resume
tells us that if we check it.

So, if nothing else, we can do user_disable_single_step somewhere inside
tracehook_report_syscall_exit to disarm the x86 code and then behave
uniformly across machines inside there.  But, the x86 maintainers are
generally friendly to our changes if we are clear about their purpose.

On all those other machines with unconditional step=0, we can assume that
either they don't have single-step at all, PTRACE_SINGLESTEP over a syscall
insn is already broken (actually stops after the following insn instead of
after the syscall insn), or their hardware works differently so it doesn't
have this issue, and regardless that they never much cared about the nit.

Today's behavior on e.g. powerpc (a sometimes step=1 arch) is not the
desireable one.  Since it uses ptrace_notify, an infinite sequence of
waitpid; print status; PTRACE_SINGLESTEP, WSTOPSIG(status) will double
report any signal received during (or induced by) a syscall (because
ptrace_syscall_exit requeues it to be reported again, instead of delivering
it).  For observed behavior, the x86 one is the most desireable one in that
scenario.  For the synthetic test case where you've used PTRACE_SYSCALL to
get to the syscall-exit stop and then used PTRACE_SINGLESTEP, nobody really
cares, but the observed behavior that makes intuitive sense is not to get
any second stop without going back to user-mode--not the x86 behavior.  I
don't think anyone would object if we made the behavior consistent with
those two "most desireable"s across all machines, though no single machine
today is consistent with both.

As to the first scenario, chances are that nobody today is noticing that
detail on non-x86 machines.  Today they get a bogus syscall-exit report
when doing PTRACE_SINGLESTEP, but if not using PTRACE_O_SYSGOOD then they
just see a SIGTRAP with si_code=0.  A real single-step SIGTRAP has si_code
set, and on some machines also has si_addr.  On x86, the synthesized
SIGTRAP matches what a real single-step one would have in the siginfo_t,
which is si_code and si_addr.  So the only difficulty in just using uniform
code across machines is getting the right si_code and si_addr values.
Every machine but x86 gets 0 for those today, so a "general solution" that
starts out by only actually getting real values in for x86 would be OK.

Perhaps we could require asm/siginfo.h to do:

#define TRAP_SINGLESTEP TRAP_BRKPT /* TRAP_TRACE on powerpc, etc. */

Then require asm/ptrace.h to provide user_instruction_pointer():

static inline unsigned long user_instruction_pointer(struct pt_regs *regs)
{
        return user_mode_vm(regs) ? regs->ip : 0;
}

This is only required vs just using:
         si_addr = user_mode(regs) ? instruction_pointer(regs) : 0;
because x86 wants user_mode_vm vs user_mode.

Then arch-independent code can fill siginfo_t just right for a synthetic
single-step trap.


Thanks,
Roland

Reply via email to