The utrace API itself is not a good fit for global tracing, since its
purpose is tracing and control of individual user threads. There is
no reason to allocate its per-task data structures when you are going
to treat all tasks the same anyway. The points that I think are being
missed are about
I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL.
Right, that is key.
OK, let's look at utrace_do_stop:
    if (task_is_stopped(target) &&
        !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
            utrace->stopped = 1;
            return true;
    }
utrace_set_events:
    (utrace->death && ((old_flags & ~events) & DEATH_EVENTS))
((old_flags & ~events) & DEATH_EVENTS) means the caller tries to
clear DEATH/QUIESCE. Why is this not allowed? And why is this not
allowed _only_ when the target runs utrace_report_death()->REPORT()?
This is
When a report function of an engine returns UTRACE_STOP, it means (may mean)
that it wants to change the status of the process before resuming it.
VM monitors often change the status, sometimes debugger users want to set
some variables too.
Yes. In ideal cases, it can decide up front quickly
I'd like to ask you to clarify what utrace->stopped means...
I'm very glad you are looking into this area!
My understanding is: if we see ->stopped == true under utrace->lock, then
the target can do nothing interesting from utrace's point of view. The target
should take utrace->lock at least once.
Yep. And utrace_reset() can be called because ->stopped == 1.
Right.
Let me explain. Again, let's suppose D attaches engine E to the target T.
T enters utrace_report_jctl() with ->stopped == 1.
D calls utrace_set_events(events = 0), this removes JCTL from E->flags.
D calls, say,
Hmm. But this leads to another question: why does utrace_reset() set
UTRACE_EVENT(REAP) ?
This looks like: make sure ->utrace_flags is never 0 unless we detach
all engines. Perhaps because sometimes, say tracehook_notify_resume(),
we just check task_utrace_flags() != 0 ?
Right, it's an
http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace
Frank Ch. Eigler (1):
utrace-based ftrace process engine, v2
Thanks, Frank. Your branch is now in my repo and its patch generated in
2.6-current/. I'll pull periodically, or let me know if my repo lags
behind yours
There is at least one change from the earlier behaviour -- rather than
utrace_attach_task() retrying by itself on a !parent attach, -EAGAIN is
returned to the user. That may need changes to the utrace client side.
Oops, that was not intentional. I've restored the old behavior.
I've just
utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
how can it help, we don't hold any locks, target can change its
->exit_state right after the check.
Good catch, thanks. This is a remnant of the utrace-indirect code,
where utrace_first_engine() had an interlock with
what about get_utrace_lock() ? Do we really need the EXIT_DEAD check?
And this check looks racy too.
It is not strictly necessary any more, no. It now serves as an early
unsynchronized check before taking the utrace lock, rather than as a
reliable interlock. The same is now true of the check
I've been looking at the utrace.patch at:
http://people.redhat.com/roland/utrace/2.6-current/
hopefully, that's the latest one.
Yes, it's updated frequently. The .id files tell you what git commit the
patch corresponds to, so we can be mutually clear in making references.
Ok, applied. I thought I'd seen that checking style in some other kref
user and was copying its style (which is admittedly a dubious thing, since
the free really has already happened), but I can't now find what I might
have been thinking of.
Thanks,
Roland
I would rather not touch the tracehook interfaces now. You are indeed
right that the motivation for this had to do with the utrace-indirect code.
As I've said, I do intend to resurrect that code and send it upstream later
on. We can consider cleanups then. For now, let's not do anything
I've added a change (only in git) so that with CONFIG_UTRACE=y and
CONFIG_UTRACE_PTRACE=n, ptrace and utrace are mutually exclusive on each
task. The utrace_attach or PTRACE_ATTACH call fails with a characteristic
EBUSY so that the failure looks new and unusual in an obvious way.
It would be
There is one regression for PTRACE_SINGLESTEP on 2.6.28-rc7/8 + utrace.
Thanks for looking into this.
I made some analysis and found something. The main problem is in
ptrace_resume (kernel/ptrace.c).
For ptrace(PTRACE_SINGLESTEP, child, 0, 0), we ran into
ptrace_resume. Then
action:
The current implementation is that if I create a new engine in response
to an exec (when called from some other engine's report_exec callback),
and set that engine's flags to be notified of execs, the new engine gets
notified of the exec that's already underway. This turns out to be
rather
Here is a brain dump about utrace extension events. This is an idea I've
had more or less all along. It underpins some of the higher level ideas
I'll post in other brain dumps. So I'll just spew details on the concept,
so that it'll make sense when I talk about other things as if this existed
So, what are extension events good for?
They have the desirable feature of signals: you can post one from almost
anywhere in the kernel, and the event gets processed at the safe place just
before returning to user mode (where you hold no kernel locks, can safely
use user_regset, etc.).
But
The current patch has regressions on block-step, step-jump-cont,
step-jump-cont-strict, and step-to-breakpoint. Do your tests cover
anything new that is not already tested by one or more of those?
Thanks,
Roland
Can we get these two new people from Red Hat posting to this list as they
make progress on utrace? There is nothing visible from outside, really.
The list has been quiet for 1.5 months, if you don't count the annoying
spam.
The first half of that time I was on vacation, and the second half
Thanks for the report, Wenji. Please work with Denys to make sure these
problems are represented in the ptrace-tests suite and update the
utrace/tests wiki page. Note the current code has some existing
regressions in ptrace-tests. It's possible what your tests check is
already being checked by
I've been working for a while on several bugs that are regressions of
ptrace behavior in the latest utrace branch code. These are all now
represented in the ptrace-tests suite; Denys Vlasenko
has been investigating the problems and adding test cases. (Thanks, Denys!)
Already noticed the same thing and checked it in. Thanks for the report.
Thanks,
Roland
Ack! That was a stupid braino on my part, sigh.
I'll post a complete version that also fixes the set side.
Thanks,
Roland
Hi, David. I'm back in the saddle at home today and got a chance to look
into this bug.
I've fixed it in the current utrace git/patch. I also committed it for the
next rawhide kernel. I think rawhide mirrors are in f10-beta freeze, so
you have to go to
Off hand I don't see anything wrong with your test module.
But I'll have to test it out myself, and I won't be able
to do that until I'm back from travelling (next week).
Thanks,
Roland
clean. The only thing I did do was invent some alternative simpler
helper functions rather than using the user_regset_copy* functions (to
avoid taking the address of function arguments, which needlessly forces
them onto the stack.)
Ideally those would get inlined so that doesn't happen.
It was allocating too much memory, which is harmless.
Didn't it also write NT_PRFPREG notes of the wrong size?
Thanks,
Roland
Reviewed the rawhide kernel-doc. Found several small issues
that may be updated in the next round.
Thanks!
1. Redundant returns in Using utrace_barrier
Otherwise returns it waits until the thread is definitely not in the
midst of a callback to this engine and then returns zero
It so
In any event callback, the event task passed as an argument is the task
that's making the call. So you are making those utrace_* calls on current.
utrace_barrier checks for target==current and does a short-circuit return.
So it's a harmless no-op that always returns zero. But it really doesn't
That's fine by me. I was thinking of the definition of errorness as
arch-dependent magic, and just took the constant used in x86-specific
userland code that does this. If IS_ERR_VALUE is always going to be what's
right on x86, by all means use it.
Thanks,
Roland
A little function to get the task associated with a given pid:
static struct task_struct *
get_task (long utraced_pid)
{
    struct task_struct * task;
    read_lock (&tasklist_lock);
    task = find_task_by_vpid(utraced_pid);
    if (task) get_task_struct(task);
Thanks! I don't think that ever hurts anything, but it's definitely right
to fix it. I've put the change in and updated the patches.
Thanks,
Roland
After its various recent troubles, fits, and starts, the Fedora rawhide
kernels are being built and published frequently again. I thought I'd
mention that Rawhide's kernel-doc package is a convenient way to get the
utrace documentation (and all the kernel DocBook/kerneldoc stuff) in
formatted
Thanks for the report, Ananth.
Ah! The i386 will enter do_notify_resume() with interrupts disabled.
Other machines don't do this (x86-64 and powerpc64, anyway). It is often
harmless, because if TIF_SIGPENDING is set, we'll first enter
get_signal_to_deliver() and implicitly reenable interrupts
Please always mention the kernel version and machine you are using when
asking any question like this. The output of uname -a is a good, concise
way to include everything I'll need to see. When reporting about a 64-bit
kernel, it is also crucial to mention whether the user binaries in question
Good catch! Fixed.
Thanks,
Roland
I've finished the updates I've been talking about. Please take a fresh
look at the 'make htmldocs' book. I updated crash-suspend.c for the changes.
This utrace should appear in a Fedora rawhide kernel some day soon.
Thanks,
Roland
In the latest upstream kernels, detach-stopped is the only ptrace-tests
case failing. A fix I tried for that worked, but made attach-wait-on-stopped
start failing instead.
Can you tell me if you think the expectation in attach-wait-on-stopped
really seems correct? It seems to be contrary to
It will never be in the fast path. It will always require
TIF_SYSCALL_TRACE to be set on each thread, which means the slow path.
[...]
OK, I must have misunderstood your original posting:
# [...]
# d. Kernel already has checks here, so almost free.
This refers to all the other
Actually, this is the point I've been stuck on these past weeks.
If we add a marker or tracepoint to trace every syscall,
we might have to put it in the tracehook or audit and set
TIF_SYSCALL_TRACE for every process, or put tracepoint
in the syscall entrance/exit asm-code and check another
flag.
* Create another global variable utrace_possible_flags. Each bit
is set only if there is either a global tracer for the event,
or at least one tracer in the system (keep a global counter).
* Always check utrace_possible_flags first, and if it is set
(thus requesting the slow
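The two bullets above describe a summary mask guarded by per-event counters. Here is a minimal user-space sketch of that bookkeeping, not kernel code. The names `tracer_register`, `tracer_unregister`, `need_slow_path` and `NR_EVENTS` are hypothetical, `possible_flags` merely stands in for the proposed utrace_possible_flags, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_EVENTS 32                    /* hypothetical number of event bits */

/* Hypothetical model: one counter per event bit, plus a summary mask. */
static unsigned int tracer_count[NR_EVENTS];
static unsigned long possible_flags;    /* stands in for utrace_possible_flags */

/* A tracer (global or per-task) starts listening for event bit ev. */
static void tracer_register(unsigned int ev)
{
    assert(ev < NR_EVENTS);
    if (tracer_count[ev]++ == 0)
        possible_flags |= 1UL << ev;    /* first listener sets the bit */
}

/* A tracer stops listening; the bit clears when the last one leaves. */
static void tracer_unregister(unsigned int ev)
{
    assert(ev < NR_EVENTS && tracer_count[ev] > 0);
    if (--tracer_count[ev] == 0)
        possible_flags &= ~(1UL << ev);
}

/* Cheap first check: only take the slow path if some tracer could care. */
static bool need_slow_path(unsigned int ev)
{
    return (possible_flags & (1UL << ev)) != 0;
}
```

In this model the common case (no tracer anywhere) costs a single mask test, which seems to be the intent of checking the global flags first.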
My initial opinion was that you were moving away from RCU to also get rid
of the task_struct->utrace assertion failure, which IIRC from some of the
investigations at the time, was mostly for RCU lifetime reasons.
That assertion failure (BUG_ON) was due to an internal bug.
It was entirely for buggy
Answer to (a) is surely yes, but...
Since you're sure, what would you say to convince a skeptic?
... wouldn't it be better to first push the base utrace upstream and add
this as a feature thereafter?
I think this is probably how it will go anyway. I want to get a plan on
the table now. The
This kind of interface would be nice to have in utrace only if it were
significantly cheaper than doing what we do now: potentially attaching
utrace-engines to each thread -- or (in the near future, systemtap
bug# 6445) to subtrees of the process hierarchy.
The overhead (memory +
Sorry, there is no (recent) GIT history. I announced here when I started
the new GIT repo that I would be using git-rebase. I'm still doing that.
This is what made most sense while doing constant rebasing of several patch
series preparing them for upstream submission. Now that tracehook et al
Yes!
[...]
What is the use case for a utrace client to do a utrace_engine_get/put()?
Wouldn't it be more robust if utrace implicitly handles refcounts as
you've detailed below?
If the only operations that affect this count are implicit, then I assume
you must mean those are attach and detach.
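To make the question concrete, here is a minimal user-space sketch of one reading of "implicit" reference counting: attach takes the first reference and detach drops it, while explicit get/put remain available to a client that keeps its own pointer. `struct engine` and every function name here are hypothetical stand-ins, not the utrace API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an attached engine; not the real utrace struct. */
struct engine {
    int refs;
};

static int engines_freed;               /* bookkeeping for the sketch only */

/* Attach implicitly takes the first reference. */
static struct engine *engine_attach(void)
{
    struct engine *e = calloc(1, sizeof(*e));
    if (e != NULL)
        e->refs = 1;
    return e;
}

/* Explicit reference for a client that stores the pointer elsewhere. */
static void engine_get(struct engine *e)
{
    assert(e->refs > 0);
    e->refs++;
}

/* Drop one reference; the last put frees the engine. */
static void engine_put(struct engine *e)
{
    assert(e->refs > 0);
    if (--e->refs == 0) {
        free(e);
        engines_freed++;
    }
}

/* Detach implicitly drops the reference that attach took. */
static void engine_detach(struct engine *e)
{
    engine_put(e);
}
```

With only attach and detach touching the count, a pointer held by the client would become invalid the moment detach runs. An explicit get before that point is what keeps it valid in this model.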
Something that's always been an issue in the utrace interface is the
management of struct utrace_attached_engine. It's tricky that you have to
use RCU and/or follow picayune rules in the attach+set_events-callbacks
sequence to be sure your engine pointer is valid. From the beginning, I
expected
2. Why do we want utrace global tracing?
From a systemtap point of view, we'd certainly use global tracing.
You're using tracepoints/markers too. (You'll use anything, you minx.)
What we need is reasons for this to be a utrace feature.
Global tracing would be *really* nice; your reasons
We've mentioned global tracing. I think it's time now to discuss it
thoroughly and decide what we do or don't want to do.
1. So, what is global tracing?
It's an interface to trace the events that a utrace engine can trace,
but generically across the whole system without attaching to
So, the intention is that -EINPROGRESS can be returned only on a
utrace_set_events(task, engine, 0), right?
No, for any call when some bits had been enabled before and there hasn't
been an intervening safe point. My idea was that it would return 0 only
when it's sure that no disabled callback
In utrace code fetched today, cscope reports 16 lines with references to
obsolete UTRACE_ACTION_something symbols. They're all in comments or
#if 0 code, so they don't break the build, but those comments aren't
very helpful in their current state.
Thanks. I found 3, and fixed those.
Note
Thinking aloud, utrace_control(UTRACE_STOP) returns -EINPROGRESS for
threads not yet stopped
a. possibly still in userspace, yet to pass through a quiesce safe point
b. blocked in the kernel on a syscall or an exception.
Correct.
Would task_current_syscall() help here? On a -EINPROGRESS
Is there a way to avoid using UTRACE_INTERRUPT? Certainly I'd like to
avoid disturbing the processes we're tracing.
That's what the entire asynchronous detach discussion is about!
Wade on in with me and Ananth. The brain piranhas are biting!
Thanks,
Roland
Thanks for the report. This was a simple, stupid bug, affecting all the
error cases from ptrace_attach (and ptrace_traceme), not just attaching to 1.
New patches are up now.
Thanks,
Roland
Thanks, David. That is exactly the right example of using kernel
synchronization primitives with callbacks to implement blocking behaviors
you want. The wrinkle there is that you use UTRACE_INTERRUPT, which
(potentially) perturbs the behavior of every traced thread. Doing this
gives you a
We had a pleasant surprise over the weekend.
The tracehook branch was merged in upstream!
As of 2.6.26-git18, the generic tracehook patches plus the powerpc and
sparc64 arch work are all in. The x86-tracehook branch is in the
hands of the x86 arch maintainers and I expect it will get pushed up
To test out the new utrace interface (I was using kernel
2.6.26-138.fc10.x86_64), I ported the crash-suspend.c example from the
old interface to the new interface. It is attached to this email.
Thanks, David. I'd been meaning to get around to updating that example.
I'll take a stab at
I also reproduced likely same error as David in simple PTRACE test.
That does not look related to David's crash at all.
I will look into your case.
Thanks,
Roland
Fixed in the latest code.
Thanks,
Roland
I fixed the bug David found and also implemented UTRACE_SIGNAL_HOLD,
which I had forgotten about. I then updated crash-suspend.c to work
in the new interface, and it's even a bit less kludgey than the old one.
Now it works! (You'll need the very latest utrace patch.)
Note that recent Fedora gdb
Hi folks! Here's today's status of the utrace work.
The current trees fork from v2.6.26, which came out over the weekend.
Note there are 2.6.26/ directories of patches and GIT branches, but
those are not yet to be used. For the moment, the upstream tip is still
2.6.26 and so the tip utrace
Are all the bugs you've encountered related to single-step covered
in the ptrace-tests suite?
I have some more fixes in the x86 bowels about ready to send upstream.
From the status quo upstream, my changes get FAIL->PASS for
step-jump-cont-strict (32 & 64), step-through-sigret (32).
There are no
Here is a vague start at the directions I have in mind for a user-level
interface that I've been calling ntrace. It's not a real specific plan
at the literal interface level. It's an overview of what the components
are that the bit-level definition of the user-level interface sits atop.
I'll
So, it looks like a 2.6.25 i686 problem.
Indeed so. Sorry about that, folks. As I said at the time, the 2.6.25
rebase was quick and dirty and this regression slipped through. It is
only the i386 kernel, not x86_64. I'll get it fixed in Fedora soon.
Thanks,
Roland
Sorry for the delay. I was travelling last week.
When I tried to do a make allmodconfig or make allyesconfig
from utrace kernel tree cloned from
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git
Sorry, I left the old branches there though they are quite stale
But what should happen if I do PTRACE_POKEUSER (MSR_SE = 1)?
(1) Should it get cleared on the next PTRACE_CONT or PTRACE_SINGLESTEP?
or
(2) Should PTRACE_CONT start behaving the same way as PTRACE_SINGLESTEP until
someone does PTRACE_POKEUSER (MSR_SE = 0)?
We are discussing this upstream
you do one PTRACE_SYSCALL + WAITPID and then PEEKUSER %rax getting -ENOSYS.
-ENOSYS is a syscall return value which should be returned after the _second_
PTRACE_SYSCALL - on the syscall exit, not on the syscall entry, shouldn't it?
I would expect PEEKUSER %rax should still return -23 on first
Check that you got the latest patch. I fixed powerpc last night.
Or to stay up to the minute during this week, use git.
Thanks,
Roland
Thanks for the testing. Note that most of the crasher tests are for
intermittent race bugs that might take a long time to show. They look for
the TESTTIME environment variable to set how long to keep trying, and the
defaults are pretty short. You should run those for long periods to
achieve
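For readers unfamiliar with this style of test, here is a rough sketch of a time-budgeted retry loop driven by TESTTIME. It is not taken from ptrace-tests. The helper names, the two-second default, and the assumption that TESTTIME is a whole number of seconds are all illustrative guesses.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define DEFAULT_TESTTIME 2              /* hypothetical default, in seconds */

/* Read the time budget from TESTTIME; fall back to the default if the
 * variable is unset, empty, non-numeric, or not positive. */
long testtime_seconds(void)
{
    const char *s = getenv("TESTTIME");

    if (s != NULL && *s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);

        if (*end == '\0' && v > 0)
            return v;
    }
    return DEFAULT_TESTTIME;
}

/* Repeat one attempt until it reports a failure (nonzero) or the time
 * budget runs out. Returns 1 if the race was hit, 0 otherwise. */
int run_until_deadline(int (*attempt)(void))
{
    time_t deadline;

    assert(attempt != NULL);
    deadline = time(NULL) + testtime_seconds();
    do {
        if (attempt() != 0)
            return 1;
    } while (time(NULL) < deadline);
    return 0;
}
```

Under these assumptions, running a test with something like `TESTTIME=3600` in the environment would keep retrying for an hour instead of the short default.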
Paul Mackerras has pulled in the regset changes. Should be in Linus'
tree tomorrow.
Indeed, it is in now.
Thanks,
Roland
Please do not use HTML mail.
Yes, that's nothing we don't know.
Thanks,
Roland
Thanks for the report. I have fixed those nits in the 2.6-current, 2.6.23,
and 2.6.18 patch sets. However, please note that the arch code in that set
(mostly what's in the regset patches) is now more or less dead. The
user_regset work now upstream in -mm has replaced this part of the utrace
When I applied it to the 2.6.24-rc3 kernel, I got some reports. Most of them
are about whitespace.
You shouldn't have whitespace problems.
My patch-making scripts try to make sure of that.
But life is too short to worry any more about that. Just use patch -l.
But one of them made compilation
Count me, and most/all of the SystemTap team, among the folks who would
like to see utrace accepted into Linus's kernel. Upstream acceptance
seems to be bogged down on the following points:
1) On some architectures, ptrace still hasn't been successfully adapted
to use utrace.
The arch issue
Inside the kernel, you can use utrace for this. At the moment, there is no
new canonical user-level interface that hooks into the utrace facilities.
So if you are looking for a replacement for ptrace in that sense, the
utrace layer per se does not fulfill that role by itself.
If you already have
I'm very sorry for taking so long to reply to your message.
The 2.6-current/ patches are usually meant to apply to the daily Linus
current GIT tree, though at the moment they are actually still based on 2.6.23.
Those patches cannot be expected to apply to older kernel versions.
The