I've been writing a ptrace based sandboxing tool, called sydbox¹, and I
want to explain about some of my bad experiences with ptrace and whether
utrace will fix these deficiencies.
It sounds like you are doing something very similar (from the utrace/ptrace
requirements perspective, at least)
Sorry for the delay. I merged those 3 patches.
Thanks,
Roland
Add int options into struct ptrace_context. Will be used to hold
PT_XXX options. Currently is not used, but:
IMHO the traditional PT_* bit assignments are useless.
We should just store and use the PTRACE_O_* bits directly.
- introduce __ptrace_set_options() helper which updates -options
I need to re-check this, but iirc this hook is not needed at all since
[PATCH 10] ptrace_report_clone: rework auto-attaching.
It certainly should never be needed in the clean version of reality.
Thanks,
Roland
Simple answer. Because I do not know how to implement this. At least now.
I tried to think of this more, but I don't see how to make the first steps.
(Yes, to be honest, this looks like unnecessary complication to me, I have
to admit. But this is not the reason.)
Well, I am befuddled. I
ptrace_attach_task:
engine = utrace_attach_task(CREATE | EXCLUSIVE);
err = utrace_set_events();
WARN_ON(err !tracee-exit_state);
Looks correct but it is not. utrace_attach_task() can return EINPROGRESS.
utrace_set_events() can, yes. I think the old code just
Btw, is it allowed to use utrace_control(UTRACE_DETACH) from -report_any() ?
If yes, then we need a bit more fixes...
It's kind of a stupid thing to do, but yes, it should be allowed.
I think it's fine either if it always returns -EINPROGRESS or if
it returns 0 for the target==current case. It
I think this should work, but I'd like to re-read these changes carefully,
this all is subtle.
Sure. I meant it as a proposal for you to consider. I didn't think
through all the corners myself. It just seemed easier to do those commits
as the starting point for further discussion than to
But report/interrupt/etc are all bitfields :1. Doesn't this mean compiler
can do anything it wants with the word where these bitfields live? Can't
it temporary (say) clear -interrupt while setting -report ?
I think the story on this is that, yes, technically the C standard doesn't
rule that
No functional changes, preparation.
I don't see how this helps prepare for anything in particular.
When this path is really relevant at all, we'll just remove ptrace_signal.
Change get_signal_to_deliver() so that ptrace_signal() is called even
if utrace returns signr != 0.
I think this is
[Sorry for the bogus reply, finger slipped!]
The logic is correct. Just I think it would be tidier to avoid setting
utrace-stopped when the task is dead. It doesn't hurt, but it is not
needed (and _looks_ confusing imho).
We can just return true, this is enough for the caller.
The theory
Two questions:
1. do you think this _can_ work as a first version of
utrace-ptrace?
work means: could look as utrace-ptrace for 2.6.32,
then we make incremental changes on top.
I'm pretty doubtful. I'd like to hear other opinions, and I'm still 100%
Introduce xxx_utrace_stop() which notifies engine we are going to stop.
It calls report_quiesce(0), but report.action = UTRACE_STOP instead of
UTRACE_RESUME.
I don't think this is necessary to go along with the ptrace cooperation
hacks. For any engine, there could always be a later engine
I believe the comment and unlikely above are not right.
Neither utrace_control(STOP) nor finish_callback(action = STOP) set
ENGINE_STOP in -utrace_flags. They change engine-flags only, and
ENGINE_STOP can only migrate to -utrace_flags after utrace_reset().
Right. Let's take a step back,
Oops, sorry about that. I've tweaked it so it should be right now,
and also still avoids the compiler warnings.
Thanks,
Roland
I do not know what to do with ptrace_detach()-wake_up_process().
It and ptrace_resume()-wake_up_process() are both clearly wrong in
general. I think all we really need to worry about is making sure that the
ptrace-tests suite has tests that express all the assumptions and
requirements of
Why does utrace_signal_handler() set -signal_handler when !stepping?
If I understand correctly, the logic is
if (stepping)
make sure that before return to user-space
UTRACE_SIGNAL_HANDLER will be reported.
This is understandable. More or
Another question. How can -report_signal(info) change info-si_signo
(or other data in *info) when we have multiple engines? I mean, how
can engines cooperate if they want to change -si_signo.
This is just an instance of the general subject of engine cooperation when
engines do perturbation.
(I've dropped Srikar's address from the CC since it bounces.
I hope he's reading the list.)
Unfortunately, this doesn't help if a module doesn't use utrace_control()
but just returns UTRACE_SINGLESTEP/etc. Like Srikar's test-case does.
Yes. I thought about adding a WARN_ON or something. But
Yes. But we attach to parent-parent, not to parent == current.
Ah yes, I see your point.
In short: ptrace_report_clone() should copy the tracing state from
current to child when needed. And it should never attach if current
is not traced.
Correct.
In particular, this means that
I wonder if utrace could be more helpful here. If an engine just
wants to single-step from some other callback, forcing it to have a
quiesce callback just to re-re-re-repeat what it already asked for
seems like make-work. Could utrace instead remember the last
instruction received either
(or even just report_quiesce
when passed UTRACE_EVENT(DEATH)) that always detaches.
Hmm. Not sure I understand how can we do without hooking death or reap...
UTRACE_EVENT(QUIESCE) constitutes hooking death.
A report_quiesce (UTRACE_EVENT(DEATH)) call precedes the report_death call.
Does this mean you intend to abandon the ptrace-revamp draft series that
added ptrace_context et al cleanups in the pre-utrace code? Or are you
culling bits from there to integrate differently now?
It should be used for ptrace_set_action() and ptrace_set_stop_event(),
the current usage of
I merged 7 and 8. My review comments on these are just to raise things to
think about between now and the final form. I don't intend them to slow
down the progress of your current hacking.
Thanks,
Roland
it check event bits for UTRACE_STOP or
UTRACE_INTERRUPT. While it's usual practice to use these in
concern with a report_quiesce callback as with UTRACE_REPORT
et al, there are meaningful ways to use these without callbacks.
Signed-off-by: Roland McGrath rol...@redhat.com
---
kernel/utrace.c | 21
Now. we should make sure that every user of UTRACE_DETACH frees
engine-data. And, we have to introduce ptrace_report_reap() which
should free it too.
Right, or you can have a report_death callback (or even just report_quiesce
when passed UTRACE_EVENT(DEATH)) that always detaches. In that
Btw. Roland, unless utrace_get_signal() does dequeue_signal(info),
we pass a random value in info-si_signo to -report_signal().
Yes, _if_ I understand correctly, report_signal can check action,
but perhaps it makes sense to set -si_signo = 0, this may help to
engine writers.
This is really
Because you forgot to add QUIESCE to utrace_set_events's mask !!!
D'oh!
Perhaps it makes sense to have utrace_control() fail with -EINVAL or
something when you try anything but RESUME or DETACH when you do not have
UTRACE_EVENT(QUIESCE) set.
Thanks,
Roland
Agreed, this looks like a right change to me. In any case I think this makes
the code more understandable and logical, and -resume_action is much better
than -reports imho.
Ok.
But, you seem to forget to fix other callers of utrace_stop(),
Yes, it was not a complete patch, just to get your
Hmm. I think this is not exacly right. Suppose that
report-action == UTRACE_INTERRUPT but utrace-interrupt is already set.
In that case, since UTRACE_INTERRUPT = UTRACE_REPORT, we set -report.
Not really bad, but not right anyway.
Ah, true. It enables harmless but superfluous work.
I've
Alternatively, we could just fix utrace_stop() to always check
fatal_signal_pending() before return.
I don't think it makes sense to roll in the check and have a return value
for it unless either it's doing important synchronization there, or every
caller needs the check.
Oh, but it does
Oh, I misremembered the difference between fatal_signal_pending() and
__fatal_signal_pending(), was thinking it had to do with siglock.
I've merged all three patches now.
Thanks,
Roland
The intent is that if you asked for UTRACE_SINGLESTEP (or anything else but
plain UTRACE_RESUME), you should be guaranteed another callback after a
stop and resumption, i.e. after any time that the resume action you asked
for was implicitly lost.
Perhaps the following patch fixes it?
Thanks,
Not only UTRACE_SINGLESTEP... UTRACE_REPORT can't be lost, we have
utrace_report-reports. But what about UTRACE_INTERRUPT ?
IOW, should any callback ever return (say) UTRACE_INTERRUPT instead
of utrace_control(UTRACE_INTERRUPT) ?
Indeed I think this is handled wrong in the current code.
I don't understand why this is better than just changing -ops. Except,
yes, I agree with above.
Sure, if you think about the innards they amount to about the same thing.
After PTRACE_DETACH, there is one engine attached and it uses the special
ops vector. They are really just different ways to
I thought I check on xtensa having something on deck.
I'd like to test it if possible to avoid surprises and
if it's not ready, and needs to be, then to include this
while submitting SMP changes in the not too distant future.
Sorry, that was a dim recollection of mine and it appears to have
What I meant, we can rely on the fact that any wakeup (try_to_wakeup()
actually) must be a barrier wrt schedule().
Ok.
But what if the tracee is woken by CONT/KILL? Should the tracee
see the last change in -stopped? The signal can race with
utrace_wakeup() or utrace_do_stop(), [...]
It
Actually the very first for utrace-devel only version should be just
your utrace-ptrace.patch + attach/detach fixes. Not that I can verify
this, but I hope that ptrace_utrace_ops's methods are more or less right.
(but of course, we should recheck them as much is possible).
The utrace_ops
This series is an RFC for utrace-based non-disruptive application core
dumper. Per Roland McGrath, this is possibly one of the common use-cases
for utrace.
I'm not sure whether it's the common use-case (debuggers are still that,
I think). What I've noticed is that this is a feature that many
Of course, we can have the false positive. Even simpler: by the time
finish_utrace_stop() takes utrace-lock the tracer can clear -stopped
even if it was really set after schedule().
Right. It didn't even occur to me that the only scenario I thought of was
a false positive. I was just
Not some, just that one, right?
Not just one. tracehook_tracer_task, tracehook_notify_jctl.
Ah. tracer_task is an outlier we can think about separately a little
later. The notify_jctl case is another in the same bag of wait/SIGCHLD
things that I had in mind.
The problem is, I can't
Confused. This patch has no effect.
Indeed, the three patches changing do_signal_stop() have no immediate
material effect. They all just clean up the code and prepare the hooks
that utrace (or something like it) needs to make tracing stops mesh well
with job control stops. I'd thought it
finish_utrace_stop() can check -stopped lockless. It was set by us,
we can't miss it.
We enter utrace_stop() for some stop. Either then or later, a group jctl
stop finishes and sets SIGNAL_STOP_STOPPED. Later, utrace_wakeup() sees
that and we switch to TASK_STOPPED after clearing -stopped.
get_utrace_lock() checks -state != EXIT_DEAD to make sure it safe
to use -utrace. This is unneeded since -utrace was embedded into
task_struct. If we can read -state, we can read -utrace as well.
I see. My immediate reaction to this was that it should have more comments
that make clear it's
Acked-by: Roland McGrath rol...@redhat.com
Perhaps we need more changes here, but I think this one is most important.
Agreed, we definitely do. For the generic case it needs to complete what
do_signal_stop would do. That is, do_notify_parent_cldstop and maybe also:
current-exit_code = sig-group_exit_code;
Though it might be now
Questions: should I send the change in signal.c to Andrew right now?
It can be applied separetely, it doesn't break utrace_report_jctl().
Yes, send it now. It might be best just to send a replacement
for signals-tracehook_notify_jctl-change.patch that rolls this in.
I wouldn't mention
Introduce the empty inline tracehook_finish_jctl() helper called by
do_signal_stop() after wakeup.
You could possibly roll this into the tracehook_notify_jctl rework too.
Or send it upstream separately, either way.
+/**
+ * tracehook_finish_jctl - report about return from job control stop
+ * While in TASK_STOPPED, we can be considered safely
+ * stopped by utrace_do_stop(). Clear -stopped if we
+ * were woken by signal.
s/signal/SIGKILL/, no? It's not really while in TASK_STOPPED any more,
it's that utrace_do_stop could have changed TASK_STOPPED to TASK_TRACED.
utrace_do_stop() sets utrace-stopped but leaves the tracee in TASK_STOPPED
state. This means SIGCONT can wake up the tracee and fool the tracer.
IMHO this one belongs before 2/4 and 3/4 and all the comments changed to
reflect that SIGKILL is the only case. But the incremental order really
I'd prefer to send it separately, just to avoid the how this hook is
connected to subtle changes in do_signal_stop questions/discussion.
Fine, it's up to you and Andrew.
Hmm... shouldn't we move this comment above schedule() ? And, perhaps,
/* Now we don't run again until continued
OK, I'll send the first patch upstream, then I re-send 2-4 with updated
comments to you.
Ok, good. When I get those I'll merge those upstream-bound ones into the
tracehook git branch (now just signals-tracehook_notify_jctl-change.patch)
and merge the utrace changes into the utrace branch.
But I think it would be really nice to do this cleanup before sending
the next version of utrace.
Let's get through all the utrace cleanups we know we need before we get there.
Thanks,
Roland
Roland, I'd prefer to send this change separately. Otherwise it will
be really hard to review the unified patch.
Do it however you think is best for getting all the generic signal.c and
tracehook.h bits merged ASAP.
And I think this will complicate its way to Linus's tree. I _think_
that
Good. In any case I believe utrace_get_signal() and utrace_report_jctl()
should not play with -stopped. This really simplifies the code and
what -stopped == T means.
I am open to any such sort of clean-up. If you change something that had a
special rationale, I'll explain what it was so we
I meant ptrace_resume()-task_is_stopped() check in ptrace_utrace.patch.
Oh. I think that was an attempt at ptrace bug-compatibility.
Like I keep saying, don't take any code in that patch seriously.
Yes, the whole-process stop will be delayed until the debugger
wakes up the tracee and it sees
utrace_wakeup:
if (likely(task_is_stopped_or_traced(target))) {
if (target-signal-flags SIGNAL_STOP_STOPPED)
target-state = TASK_STOPPED;
else
wake_up_state(target, __TASK_STOPPED | __TASK_TRACED);
}
Assuming you agree with this change... I don't know how it should be merged.
Probably the change in signal.c should be sent separately, but this breaks
-mm tree.
The relevant -mm differences are just in one patch that folds finish_stop
into do_signal_stop, right? If I can apply just that one
But currently ptrace can wake up the tracee and then later it can be
ptrace_stop()'ed again, in this case we can decrement -group_stop_count
twice for the same thread.
The behavior or the bookkeeping need to change so they are consistent.
OK, forget about mt issues. Do you really mean
Ah. I forgot signals-tracehook_notify_jctl-change.patch is still pending in
-mm.
Perhaps we should just rejigger all these together into one new patch (or
two, whatever) before akpm submits any of them.
Or you can just merge these 2 patches, perhaps this would be better.
As long as we have
I still think EXIT_DEAD check must die ;)
In get_utrace_lock, you mean? Do you mean it's superfluous because we can
rely on utrace_reap having cleared engine-ops before it matters to us?
Yes. Perhaps I missed something, but this check buys nothing. First of
all this check is racy
For example, wait_consider_task(). Even some tracehooks, say,
tracehook_notify_death() need task_is_ptraced().
Not some, just that one, right? So these are all really the same one
thing: ptrace interfering with normal parent wait/SIGCHLD.
In previous iterations, we had tracehook_inhibit_wait
Yes, I think you're right. I think the if (unlikely(utrace-stopped)) {
case in utrace_get_signal() needs to do some variant of utrace_reset. If
any engine_wants_stop() then it should not dequeue a signal even if one is
ready, and probably it should also not do any spontaneous report.
Why utrace_report_jctl() and utrace_get_signal() have to clear -stopped?
Because unlike utrace_stop(), do_signal_stop() does not do
finish_utrace_stop().
Correct.
This means that utrace_report_jctl() and utrace_get_signal() might be
called with -stopped == true, -stopped can be set by
serves for attach/detach. You need somewhere to store the
PTRACE_SETOPTIONS state and so forth, sure. But you can probably just
handle the attachedness at the utrace level. That's what
UTRACE_ATTACH_EXCLUSIVE is for.
Yes. As for -ptrace, I think it should die. The only problem is
engine-ops = NULL;
+ engine-flags = 0;
list_move(engine-entry, detached);
I think this makes sense regardless.
Agreed.
+* Make sure all engine-flags and engine-ops updates are done
+* before the current task_struct might really die.
+
First of all, I believe ptrace_check_attach() is buggy.
Yup, sure is! As I mentioned before, utrace-ptrace.patch is a source of
couple of ideas, but I don't think you should take it as a first draft
to work on and fix. Just attack implementing ptrace fresh in light of
all the ptrace cleanup
The usage of -ptrace in utrace-ptrace is racy [...]
Sure. Replace it. I'm not sure what purpose -ptrace (or ctx-flags)
serves for attach/detach. You need somewhere to store the
PTRACE_SETOPTIONS state and so forth, sure. But you can probably just
handle the attachedness at the utrace level.
The utrace_*_examine calls have been neither used nor examined much at all
(no pun intended). So both the API and the implementation for these can
use a fresh look and reconsideration. Also, this is somewhat intimately
tied up with the broader synchronization picture. So it might make most
If -report_xxx() returns UTRACE_STOP the tracee will do utrace_stop
eventually. But how can the tracer know the tracee is already
TASK_TRACED/-stopped?
We don't really have a way. It is something we need to address better.
This is the most essential thing that made the old
To what extent is the regset stuff supposed to tolerate such
mismatched data?
It ain't. We don't burden the arch code with the overheads and the
exacting robustness demands of checking for bogus parameters. (This is the
clear right choice for the arch layer, but that is separate from the
Thanks! I'm very glad to finally see this ironed out by someone who
actually knows about powerpc innards.
Thanks,
Roland
The second patch ptrace: do not use task_lock() for attach has nothing
to do with utrace, and it is really pure ptrace cleanup.
Indeed.
But it can't be applied to -mm tree, because it (textually) conficts with
utrace changes in ptrace_attach().
Oh, -mm. I had not thought about the -mm
Staging the utrace patch at end-of-series would make sense if utrace is
not on track for a 2.6.31 merge.
And afaict, this is indeed the case - things seem to have gone a bit
quiet on the utrace front lately.
I don't think that is really accurate. There has been a lack of any
reviewer
Thanks. I put that in but rewrote the new paragraph.
Roland
Certainly, in general. But in this specific test, only the under-test
system calls occurred in essnetially the whole system, so the overhead
measurements were in a way the bare minimums imposed by the kprobes
vs. utrace callback infrastructure itself.
Yes. That's why I meant to explain how
I followed your advice and looked at the registers. Here is what I found:
The function that behaves inconsistently is _dl_start in ld-2.8.so. Most
of the time the first variation in the flow occurs as early as 296
instructions down the road, namely at the jump
30b3b0:0f 86 d7 fd ff
I can't reproduce any such variation using 2.6.27.21-170.2.56.fc10.x86_64
myself. Off hand, it seems more likely there is some authentic variation
between runs for whatever reason than that this has something to do with
ptrace.
Have you tried making your program look at the tracee's PC every
IOW, 2 threads T1 and T2. T2 forks the child C. T1 ptraces C. C dies
and becomes EXIT_ZOMBIE. It sends the notification to thread-group.
Then, any thread does do_wait(). But since ptrace_reparented() = T
we don't release C but send the notification again. This doesn't
look right.
Two threads T1 and T2, and some process X (not the child of T1/T2).
T1 ptraces X.
X exits and becomes zombie.
T2 calls do_wait(WEXITED) and then wait_task_zombie(), it sets EXIT_DEAD
and drops tasklist.
T1 exits, calls exit_ptrace(). __ptrace_detach() does __ptrace_unlink()
and
Then the short-circuit test is simply if (!tsk-ptrace_info).
But, from the correctness pov, this doesn't differ from checking
!list_empty(-ptraced) ? I mean, the condition can be changed under us
in both cases, and we seem to have the same corner cases.
I guess it just seems simpler to me.
But, I still think we should do this fix before introducing -ptrace_mutex.
Ok by me if it's in fact (incrementally) simpler that way.
OK, we should avoid taking tasklist for writing. Then we should check
ptrace_reparented() first. If it is true get_task_struct, drop taslist,
take it for
After talking with Renzo Davoli we agreed that it makes most sense for
SYSCALL_ENTRY events to have engines' callbacks made in the reverse of the
normal order (FIFO vs LIFO). This ordering makes it sensical to have
nesting mutators and examiners as different utrace engines all watching
syscall
Renzo and I met at the conference in San Francisco last week and spoke more
about his use case. He showed me a fine demo live of what his cool hacks
can do, and a clear demonstration of the nesting issue. Rather than
responding point by point, I'll follow up momentarily on the API change I'm
So. We are going to make a separate, dynamically allocated structure
for tracees. Say, we add struct ptrace_child *ptrace_child into
task_struct.
Right.
attach/attachme do kmalloc() and use task_lock() to avoid races.
(with the current locking write_lock(tasklist) alone is enough).
Sure.
I tried to think about the first steps in ptrace-cleanup, and I
need your help.
I think I said that list was not necessarily in this order and I
certainly meant it so. I hope you haven't been slowed in proceeding on any
of the pieces that are more straightforward and can be done mostly
The patch only implements it for server/classic processors, not BookE,
thus it should probably only advertise it for these :-)
Though it wouldn't be too hard to implement it for BookE using DBCR0:BRT
(Branch Taken debug event) though it might need some careful fixups such
as the one we have
McGrath rol...@redhat.com
Date: Thu, 1 May 2008 23:40:58 -0700
Subject: [PATCH] powerpc ptrace block-step
This adds block-step support on powerpc,
including a PTRACE_SINGLEBLOCK request for ptrace.
Signed-off-by: Roland McGrath rol...@redhat.com
---
arch/powerpc/include/asm/ptrace.h |4
Roland et al., has there been any recent report on
regset/tracehook-on-arm porting?
I haven't heard anything. There are no difficulties in that port AFAIK.
If an ARM arch maintainer (or someone who wants to send them patches) wants
to do it, I'm happy to give advice.
Thanks,
Roland
I do have a really large objection of merging the current messy double
ptrace implementation. If current utrace based ptrace isn't 100% ready
there's absolutely no point in merging it.
There is no current utrace-ptrace implementation. I haven't proposed
one for merging. When one is ready
The bug is in your module's unconditional use of UTRACE_*STEP.
It's documented that it's invalid to use them unless arch_has_*_step()
has returned true. (The utrace_resume_action description refers you
to utrace_control(), where this is documented.)
It's a bug to use these at all when the
I think you misunderstood my point. I never advocated the wholesale,
unconditional rewriting of ptrace. A gradual approach there seems a
must - and your approach of CONFIG_UTRACE_PTRACE seems like the way
to go, initially.
Ok, good. I was confused by your focus on the diffstat and your
This kind of info should have been 1) emitted a month ago, in the
middle of the development window, 2) have been part of the
submission ('why do we want it' 'what will be the future benefit?').
Well, we are where we are. I don't really know what kind of lack
you see in having said what its
Here is a trivial module to implement the seccomp guts via utrace. I
haven't tested it at all. (AFAIK it was only ever used by cpushare,
and that project might be defunct now.)
I'm not sure what Ingo had in mind for integrating this. If it's just
to reimplement the existing prctl interface,
Regarding ptrace-via-utrace. What is the plan there? Am i looking
the right branch:
| earth4:~/linux.trees.git git diff --stat
| linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
| kernel/ptrace.c | 803
++-
Well I dunno. You guys are closer to this than I am, but I'd have thought
that systemtap is the main game here, and most/all of the above is just
fluff.
That is certainly not true for me. It is true that I hear plenty from
systemtap developers, users, and boosters wanting utrace to be
kernel/utrace.c should probably be introduced as
kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending
work in the tracing tree and cooperation would be nice and desired.
Of course I would like to cooperate with everyone. And of course it does
not really matter much where a
utrace is a new kernel-side API for kernel modules, intended to make it
tractable to work on novel ways to trace and debug user-mode tasks.
These patches apply to the current Linus tree (v2.6.29-rc8-241-g65c2449).
The first two should apply fine on the -tip tree as well, and we will be
glad to
Roland, I think it better to change tracehook definition more, please
see below.
I don't really object to this in principle. But this touches signal.c a
lot more in less obviously-trivial ways than my tracehook patch. That is
more of an issue at the outset than some extra fiddling in the
Now we send SIGCHLD with si_code = CLD_STOPPED twice. By T1 from
do_signal_stop(), and by T1 or T2 from get_signal_to_deliver() which
checks SIGNAL_CLD_MASK.
Yes, I considered this problem. It's just not so big a deal as to worry
overmuch about this corner case in the first version. What
201 - 300 of 378 matches
Mail list logo