> Issues with ptrace_check_attach(), > > - it does utrace_control(UTRACE_STOP). > > This is wrong, ptrace_check_attach() must be "passive", > while utrace_control(UTRACE_STOP) can actualy stop the > tracee.
This is not inherently problematic on its own. I'd say it's OK if you have to actually be the tracer task to get that far, and of course if you call utrace_control(UTRACE_RESUME) to reverse it in the failure case. The UTRACE_STOP itself does not have any effect that the tracee can observe except as scheduling flutter. (I'm not really counting it cooperating with another process that does "ps" and sees "T"--we can live with that too.) If the actual tracer making stupid use of ptrace calls that will wind up failing with ESRCH has the side effect of lousing up the scheduling of the tracee it's failing to otherwise molest, so be it. It's a security issue if you can do UTRACE_STOP on a task that you would not be allowed to ptrace, even for an instant. It's at least quite undesireable if you ever do that on a task other than one you do actually have attached for ptrace. If we can avoid those, don't worry about the rest. For the tracer itself, any UTRACE_STOP is "passive enough" if compensated with a UTRACE_RESUME. Just stay away from UTRACE_INTERRUPT and you are being "passive" by my lights. > - even if it doesn't, this is wrong when we have multiple > tracers. ptrace_check_attach() should not succeed if the > tracee was stopped by another engine. Yes. UTRACE_STOP state is necessary for ptrace to consider the tracee to be "in ptrace stop", but it's not sufficient. Ensuring UTRACE_STOP is the mechanism for the underlying material effect that "in ptrace stop" means, i.e. in (or on the way into) TASK_TRACED until SIGKILL. Since we use UTRACE_STOP in report returns to implement ptrace stops, this is normally already the case in almost all the situations where ptrace_check_attach should succeed. The only reason we have to use utrace_control to ensure that's the case is for a tracee in TASK_STOPPED (job control stop), which we have to convert into being a guaranteed ptrace stop SIGCONT won't wake (i.e. UTRACE_STOP, i.e. TASK_TRACED). So, we use the UTRACE_STOP interlock for that purpose. But it's not what we should use for "are we in a ptrace stop?" bookkeeping, which previously was implied directly by task_is_traced(). What might make sense as the indicator of "in ptrace stop" is "ctx->stop_code != 0" or "ctx->resume == UTRACE_STOP". > - it ignores "int kill". That's for PTRACE_KILL. This is a story similar to your question about PTRACE_DETACH,SIGKILL. That is, it's actually pretty useless but we still don't want to break some arcane assumptions. In actual utility and reliability (and implementation), PTRACE_KILL is exactly the same as PTRACE_CONT, SIGKILL. There is just no point to it at all. But, people over the years have been understandably confused by ptrace, not to mention even buggier states during its checkered past, and have used whatever seemed to work at the time, and we get to bend over pretty far to avoid breaking those people's expectations now. The manifest reality of PTRACE_KILL is that it doesn't check for stoppedness, only attachedness, and so returns "success" (while doing absolutely nothing useful!) on an attached task that is not stopped or is already dead. The former is extremely useless, but the latter is actually harmless and probably behaves predictably today. I will bet that there is some code somewhere doing: if (kill(pid, SIGKILL)) fail_loudly(); usleep(100); if (ptrace(PTRACE_KILL, pid)) fail_loudly(); or equivalent stupid things. Today this will succeed every time and the ptrace call will never do anything at all. By the time ptrace is called, the task is already dead, and so there is nothing to do, but PTRACE_KILL succeeds anyway because of the ptrace_check_attach flag. So I think you've got to keep doing that. I don't think it should be particularly difficult. The first part of the ptrace_check_attach work is to ensure that it's attached and you're the tracer. The second part is to ensure that it's stopped (including converting a job control stop into a ptrace stop). With the kill flag, you just omit the second part. > Can we ignore these problems in V1 ? Or should we fix something? It doesn't seem like it should be very hard to get this right from the start. I don't think we can submit something upstream that has any known regressions even in stupid cases, except the explicit intended "improvements" that we will document. Thanks, Roland