> No. I don't see how we can fix this without some changes in utrace.

Ok.  I think we could come up with more contortions to cope with
do_signal_stop happening before the pending UTRACE_STOP processing.
But in the general case it's fundamentally a race.  It doesn't matter
if we invert the outcome of that race in some cases.  So I see nothing
really wrong with doing the utrace "pre-signal safe point" before job
control processing.  Assuming well-behaved callbacks, all that matters
is that we do the job control bookkeeping for UTRACE_STOP (which we do).

> Just in case, let me remind you what is the problem. For simplicity,
> let's ignore SIGKILL issues, and let's discuss this particular case:
> 
>       a PTRACE_O_TRACESYSGOOD tracee reports SYSCALL_ENTRY and stops.
> 
>       the tracer does
> 
>               ptrace(PTRACE_SYSCALL);
> 
>               // we must have SYSCALL_EXIT report
>               waitpid(&state);
> 
>               assert(state == 0x857F);                  <--- 1
>               assert(ptrace(PTRACE_GETEVENTMSG) == 0);  <--- 2
> 
> This must always work, both 1 and 2 asserts are valid. (but, for the
> moment, please forget about 2).
> 
> However, with utrace-ptrace the tracee can notice ->group_stop_count and
> call do_signal_stop() before it has a chance to do finish_resume_report()
> and actually stop. This breaks the 1st assert.

Though I described the general case as a race, this particular scenario
can arise with no actual racing.  During the syscall-entry stop,
userland can synchronously invoke the job control stop (SIGSTOP to
another thread, etc.)  before it ptrace-resumes the traced thread.  In
that case we know (at the utrace API level) that the job control stop in
truth precedes any later UTRACE_STOPs we provoke after resuming from the
syscall-entry stop.  But I don't have any problem with saying that the
"handle MT job control normally" semantics just always take place at the
same time/sequence where "delivering pending signals normally" takes
places.  That is, after UTRACE_SIGNAL_REPORT can choose UTRACE_STOP.

> Agreed, I thought about this too. IOW, we can just silently drop this
> JCTL stop event and report SYSCALL_EXIT instead.

Right.

> So, we add ptrace_report_jctl(notify) which does
> 
>       if (notify == CLD_STOPPED) {
>               if (get_stop_event(context)) {
>                       // pretend we we already stopped when another
>                       // thread initiated this group-stop, set the
>                       // correct ->exit_code and return, our tracer
>                       // will be notified by do_signal_stop()
> 
>                       task->exit_code = context->stop_code & 
> PTRACE_EVENT_MASK;
>                       return;

should be:              return UTRACE_STOP;

> Unfortunately, this doesn't help. Because the tracee will stop in
> TASK_STOPPED state, and this means SIGCONT can break the 2nd assert
> above.

Yes and no.  When a report_jctl hook returns UTRACE_STOP, then you
expect it to wind up in TASK_TRACED before it gets back to user mode.
It will do that here too, but not quite how you might expect.  It will
set TIF_NOTIFY_RESUME, then go into TASK_STOPPED and stay there until
SIGCONT, then wake up, run until report_quiesce(0) or report_signal
where your callback can return UTRACE_STOP again.  But that's fine.
Staying truly stopped throughout rather than waking and stopping again
differently only matters if there is an examination in progress.

So the only question is the interim where it was in TASK_STOPPED.  If
ptrace_check_attach() uses utrace_control here, then it will turn that
TASK_STOPPED into TASK_TRACED and thus exclude any SIGCONT problem.

> This is the problem. Since do_signal_stop() sets TASK_STOPPED, we
> can't report anything but JCTL event. 

It looks funny if you can tell (i.e. /proc/pid/status) that it is in
TASK_STOPPED rather than TASK_TRACED.  But aside from that, who can tell?

> And it is not possible to
> hack do_signal_stop() so that it sets TASK_TRACED if the debugger
> has changed task->exit_code (or context->exit_code or whatever).

Anything is possible! ;-) What would make sense, if anything, is to go
into TASK_TRACED instead of TASK_STOPPED when UTRACE_STOP is in use.
That seems entirely doable with another rejiggering of the tracehook
call interface there.

But all that's just as devil's advocate.  I don't see any problem with
changing the order of do_signal_stop vs utrace_get_signal as you want.

> Please see the patches. They are very simple.

Yeah, it looks about fine to me.

> Perhaps, it makes sense to do more. We can introduce the new
> UTRACE_SIGNAL_JCTL utrace_signal_action, but I don't think it is
> really needed.

I don't understand what that would mean.


Thanks,
Roland

Reply via email to