We've mentioned global tracing.  I think it's time now to discuss it
thoroughly and decide what we do or don't want to do.

1. So, what is global tracing?

It's an interface to trace the events that a utrace engine can trace,
but generically across the whole system, without attaching to
specific threads.


2. Why do we want utrace global tracing?

(I won't go into what the ability to trace things is good for in the
abstract; I assume we're all sold on that.)  This has been an item on
the utrace TODO list for a long time, since before we had any other
plan for system-wide hooks in the kernel.  Now we have tracepoints and
markers (et al).  

So the question here is, why do we want to do this in utrace?  In each
place that utrace has a tracing hook (now all in <linux/tracehook.h>),
you could easily add a tracepoint/marker.  So what does utrace global
tracing offer over using tracepoints?

Here are my thoughts on this.  I'm not 100% sold that these justify it.
There is a clear argument not to add another feature that provides a
second way to do what you can already do with tracepoints.

a. Event vocabulary clearly aligned with utrace events.

   The identifiers for and details of all the places you can get events
   and what information is on hand match the per-task utrace interface.
   This makes it very straightforward to compose higher-level interfaces
   that describe events uniformly, whether they are tracked via the
   global or per-task mechanism.

   This is quite a weak argument.  It would never be difficult to map
   the two different mechanisms to a uniform higher-level event vocabulary.

b. Coordinated with per-task utrace callbacks.

   If system-wide hooks are an independent mechanism, it won't be
   obvious (or necessarily stay reliable) whether the tracepoint is
   before or after the utrace callbacks, etc.  

   As part of a unified interface, that will be well-specified.  (If we
   grow some complex callback order priority feature, the global hooks
   might have detailed options for where to land in the ordering with
   various per-task callbacks.)  Moreover, it's natural for a global
   tracing callback to get informed directly about what other utrace
   engines are doing.  e.g., a system-wide catch-all hook for debugging
   stray crashes can tell if an active debugger is doing something to
   the particular task and get out of the way.

c. Callbacks can change outcomes.

   In utrace, the syscall and signal callbacks can affect what the task
   actually does in a well-specified way.  Tracepoints just report events.

   For syscalls, offhand I can only see wanting this for fault injection.
   There might be other sensible uses.

   For signals, this might be crucial to doing the "crash-catcher of
   last resort" sort of thing (at least, to do it more efficiently than
   giving every task in the system a utrace engine just for that).  What
   I'd expect this to do is catch SIGNAL_CORE with a global tracing
   callback that attaches a new per-task engine, ignores and pushes back
   the signal (like crash-suspend does), and the new engine UTRACE_STOPs
   until some user-level crash handling stuff wakes up and takes over.

d. Kernel already has checks here, so "almost free".

   The utrace event hooks are at places where the kernel has had old
   ptrace checks forever.  The old code has fast paths that do:

        if (current->ptrace & mask) slow path;

   Now in those same places there is:

        if (current->utrace_flags & mask) slow path;

   So the cost of the checks is identical to what's already there.  This
   is the main thing I've expected to soothe the upstream performance
   nit-pickers about utrace: zero new overhead if you ain't usin' it.

   For global tracing, those checks would be:

        if ((current->utrace_flags | utrace_global_flags) & mask) slow path;

   The cost is now two or three instructions with one load.  It would
   increase to four or five instructions with two loads.  By and large,
   these checks are already in places that take a lot of locks and so
   forth, so this addition seems pretty tiny.  It's certainly no worse
   than adding a marker (in the current markers implementation), and
   probably usually far better, since it combines with the existing
   utrace check.
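
   To make the comparison concrete, here is a minimal user-space sketch
   of the two checks.  It is not kernel code: struct task_sketch, the
   EV_* bits, and the helper names are stand-ins I made up for
   illustration.  Only utrace_flags and utrace_global_flags come from
   the text above.

```c
#include <stdbool.h>

/* Hypothetical event bits; the real utrace event masks differ. */
#define EV_EXEC   (1UL << 0)
#define EV_SIGNAL (1UL << 1)

/* Stand-in for the one field of struct task_struct that matters here. */
struct task_sketch {
    unsigned long utrace_flags;
};

/* System-wide mask: the union of events wanted by all global engines. */
static unsigned long utrace_global_flags;

/* Today's check: one load, one test. */
static bool per_task_check(const struct task_sketch *t, unsigned long mask)
{
    return (t->utrace_flags & mask) != 0;
}

/* With global tracing: two loads, one OR, one test. */
static bool combined_check(const struct task_sketch *t, unsigned long mask)
{
    return ((t->utrace_flags | utrace_global_flags) & mask) != 0;
}
```

   With no global engine registered, utrace_global_flags is zero and
   combined_check() gives the same answer as per_task_check(); the only
   added cost is the extra load and OR.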


3. What would it look like?

Global tracing would use the same struct utrace_engine_ops, sharing all
the same signatures for the callbacks.  There would be a call to
register a global tracing engine, which would give you an engine
represented by the same struct utrace_attached_engine type (so this
pointer is passed to your callbacks).

All the calls to administer global tracing engines would be separate
from the existing per-task utrace calls, though we overload the same
types and use the same callbacks.  Perhaps only register/unregister
calls, though maybe also a set_events to change your event mask after
the fact.  I'm leaving aside the asynchronous detach details for now.

Callbacks would be the same, except that the utrace_resume_action bits
in the return value are ignored completely.  The action argument to
callbacks tells you what any attached per-task engines are doing.  For
the callbacks with other return value bits (signal and syscall), those
other bits do affect the task.

Global engines' callbacks all run after all per-task engine callbacks.
(This could change in future.)
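
The ordering and return-value rules above can be sketched in user space
as follows.  Everything named sk_* or SK_* is hypothetical and much
simpler than the real utrace types; in particular the two-value action
enum and the "larger value wins" rule are assumptions made only to keep
the example short.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real utrace types. */
enum sk_action { SK_RESUME = 0, SK_STOP = 1 };

struct sk_engine;

struct sk_ops {
    /* 'action' is what the engines called so far have asked for. */
    enum sk_action (*report_exec)(enum sk_action action,
                                  struct sk_engine *engine);
};

struct sk_engine {
    const struct sk_ops *ops;
    enum sk_action seen;   /* last 'action' argument this engine saw */
    int calls;             /* how many times the callback ran */
};

/* A per-task engine that asks for the thread to stop. */
static enum sk_action stopper_exec(enum sk_action action, struct sk_engine *e)
{
    e->seen = action;
    e->calls++;
    return SK_STOP;
}

/* An observer; when registered globally its return value is ignored. */
static enum sk_action observer_exec(enum sk_action action, struct sk_engine *e)
{
    e->seen = action;
    e->calls++;
    return SK_STOP;   /* deliberately returned to show it has no effect */
}

static const struct sk_ops stopper_ops = { .report_exec = stopper_exec };
static const struct sk_ops observer_ops = { .report_exec = observer_exec };

/* Run per-task engines first, then global engines. */
static enum sk_action sk_report_exec(struct sk_engine **task, size_t ntask,
                                     struct sk_engine **global, size_t nglobal)
{
    enum sk_action action = SK_RESUME;
    size_t i;

    for (i = 0; i < ntask; i++) {
        enum sk_action r = task[i]->ops->report_exec(action, task[i]);
        if (r > action)
            action = r;    /* per-task engines do affect the outcome */
    }
    for (i = 0; i < nglobal; i++)
        (void)global[i]->ops->report_exec(action, global[i]);

    return action;         /* decided by per-task engines alone */
}
```

A global observer_ops engine sees SK_STOP in its action argument when a
per-task stopper_ops engine is attached, and SK_RESUME when none is; in
neither case does its own return value change the result.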

A global engine cannot use UTRACE_EVENT(QUIESCE), and never gets any
report_quiesce callbacks.  The QUIESCE event is not a natural event in
the traced thread's life.  All it would be is an indicator that some
other utrace engine was causing this thread to take a slow path.

I had originally planned to rule out SYSCALL events for global tracing.
The reason is that this is not like other event checks where a simple
flag gets checked cheaply.  Instead, it requires setting the low-level
TIF_SYSCALL_TRACE on a thread, which makes it take a far slower path on
system call entry and exit, and has a big impact on performance just
from that alone.  Global tracing has to set this individually on every
thread, and then pay that big overhead across the board.
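
As a rough user-space illustration of why this is a per-thread cost
rather than a single flag flip, consider the sketch below.  The
sk_thread list and SK_TIF_SYSCALL_TRACE bit are hypothetical stand-ins
for the kernel's thread list and TIF_SYSCALL_TRACE; locking is left out
entirely.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real per-thread TIF_SYSCALL_TRACE bit. */
#define SK_TIF_SYSCALL_TRACE (1u << 0)

struct sk_thread {
    unsigned int ti_flags;      /* stand-in for thread_info flags */
    struct sk_thread *next;     /* next thread in the system-wide list */
};

/*
 * Turn global syscall tracing on or off by touching every thread.
 * Returns the number of threads whose flag actually changed.
 */
static size_t sk_set_global_syscall_trace(struct sk_thread *all, int enable)
{
    size_t changed = 0;
    struct sk_thread *t;

    for (t = all; t != NULL; t = t->next) {
        unsigned int old = t->ti_flags;

        if (enable)
            t->ti_flags |= SK_TIF_SYSCALL_TRACE;
        else
            t->ti_flags &= ~SK_TIF_SYSCALL_TRACE;
        if (t->ti_flags != old)
            changed++;
    }
    return changed;
}
```

A real implementation would presumably also have to set the bit on
threads created later, and on disable would have to leave it set for
threads where a per-task engine still wants syscall events; the sketch
does neither.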

But, there appears to be strong interest in having global syscall
tracing, and knowingly paying that overhead when you turn it on.  It is
easy enough to support that.  (This is something else that is naturally
integrated into the utrace implementation very easily, but tracepoints
alone can't do without some more special hackery.)

I'd kind of prefer to exclude REAP events for global tracing.  For
implementation reasons, it's really not any better than just having a
separate tracepoint/marker on release_task.  But it would cost hardly
any more to roll it in, so I could be talked into it just for uniformity.


4. So, what's the plan?

I need folks who might use global tracing to answer these questions:

   a. Do we want it?
   b. Do we want it right now?
   c. What justifies doing it in utrace (vs leaving it purely to
      tracepoints et al), to placate upstream critics?

Please don't say, "That would be nice; your reasons sound good."
That just does not help at all.  The reasons in #2 above are ones I can
think of, but I'm not arguing for them or for the feature.  If you want
the feature, *you* will be justifying it to the upstream critics.  Here,
let's be as skeptical about adding the new complexity, before we decide
on doing it, as our unsympathetic reviewers will be.


Thanks,
Roland
