> On 6 Jun 2017, at 16:29, Marc-André Lureau <marcandre.lur...@redhat.com> 
> wrote:
> 
> Hi
> 
> ----- Original Message -----
>> Answering two emails from Frediano and Marc-André that were asking closely
>> related questions, to avoid duplication…
>> 
>>> On 6 Jun 2017, at 13:28, Frediano Ziglio <fzig...@redhat.com> wrote:
>>> 
>>> Would be helpful if you had recap Pavel problems. I don't remember.
>> 
>> I did not find it in email, so it was probably IRC. I remember something
>> about having to use --spice-debug for some debug messages, and glib
>> environment variables for others. Pavel, do you remember something like
>> that?
>> 
>>>> Would you find this useful? The most difficult part being to connect that
>>>> to existing tracing mechanisms to avoid replication of traces, while at
>>>> the same time ensuring that the existing enabling mechanisms work as
>>>> before.
>> 
>>> The main questions are:
>>> - what are the original problems exactly?
>> 
>> Getting focused debug output, which to me means “conditional, cheap, always
>> there". I don’t like having an “all-or-nothing” debug proposition for Spice.
>> Currently, we have SPICE_DEBUG=1 or SPICE_DEBUG=0. That’s too binary. I
>> would like SPICE_TRACES=channel:memory, for example.
>> 
>> 
>>> - why your solution is good in your opinion?
>> 
>> 1. Conditional tracing. Except by misusing domains to identify conditions, I
>> think it’s not easy to do with glib. I may be wrong, but did not find
>> anything here for example:
>> https://developer.gnome.org/glib/stable/glib-Message-Logging.html.
> 
> Personally, grep conditions covers 99% of my needs.

Not mine. If I’m investigating where some latency comes from, the last thing I 
want is all-encompassing printouts that would themselves be a primary cause of 
latency.

> 
>> 
>> 2. Lightweight enough tracing that you can leave it there all the time. In
>> this implementation, adding a trace consumes one bit in memory, testing for
>> it is one memory access (plus the conditional jump, I know ;-). By contrast,
>> the cost of logging in glib is quite high, which makes little sense for
>> low-level debugging messages that are intended to be thrown away under
>> normal use.
>> 
> 
> Well, glib logging isn't meant for tracing. So it isn't used for that, and it 
> has no performance issue for regular debugging/logging.

Yes. But what I’m presently interested in are performance issues, and the 
messages I need are not in spice today, nor is there sufficient infrastructure 
to add them. glib definitely does not do what I want; see below for more 
details.


> 
>> 3. Scalable, and therefore as fine-grained as we need it to be. With a
>> similar mechanism, Tao-3D has about 70 traces (see
>> https://github.com/c3d/tao-3D/blob/master/tao/traces.tbl for the graphic
>> side traces), including in highly performance-sensitive paths.
> 
> I think we have few highly performance-sensitive paths in our code, and 
> before introducing facilities for tracing it, it would be interesting to know 
> what you want to have traced.

Let me take some examples straight from Tao3D, which I dearly miss in spice 
today (with actual Tao3D output for the first two, and a rough sketch of what 
such trace points could look like in spice after the list):

1. fps: Report frames per second.

Time;PageNum;FPS;Exec;MaxExec;Draw;MaxDraw;GC;MaxGC;Select;MaxSelect;GCWait;MaxGCWait
1496761089.058;1;0.000;-1;0;-1;0;-1;0;-1;0;-1;0
1496761093.755;1;13.800;50;3217;9;482;1;13;0;0;0;9
1496761094.756;1;55.400;13;3217;3;482;0;13;0;0;0;12
1496761095.759;1;94.200;8;3217;2;482;0;13;0;0;0;12
1496761096.759;1;136.400;6;3217;2;482;0;13;0;0;0;12

2. memory: Report memory usage on a regular basis (ideally, component by 
component, e.g. separate codecs from spice code)

           NAME    TOTAL    ALLOC    FREED
     N2XL4TreeE        0        0        0
  N2XL7IntegerE     6132     1527      224
     N2XL4RealE     4088      215     3478
     N2XL4TextE     6132     1217      336
     N2XL4NameE    40880    15664        0
    N2XL5InfixE    33726    10246      332
   N2XL6PrefixE    12264     3177        0
  N2XL7PostfixE     1022      254        0
    N2XL5BlockE     5110     1017      112
  N2XL7ContextE     1022       15        0
  N2XL7SymbolsE     4088     1432        0
  N2XL7RewriteE     9198     5519        0
          =====    =====    =====    =====
      Kilobytes    9820K    3293K     223K
[GCThread] Collect time 3 ms
[GCThread] GC collect started
[GCThread] Collect time 0 ms
[GCThread] GC collect started

3. bandwidth: Report network bandwidth usage (incoming and outgoing)

4. latency: Report network observed latency
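
For concreteness, here is roughly what an “fps” trace point could look like in 
spice with the proposed macros. This is only a sketch: IFTRACE() is from the 
patch series, but the spice_trace() signature, the header name and the timing 
logic are assumptions made for illustration.

    #include <glib.h>
    #include "spice-traces.h"  /* hypothetical header from the patch series */

    /* Hypothetical call site, invoked once per displayed frame. */
    static void display_frame_done(void)
    {
        static guint64 frame_count;
        static gint64  last_report;        /* microseconds */
        gint64 now = g_get_monotonic_time();

        frame_count++;
        IFTRACE(fps) {
            if (last_report == 0)
                last_report = now;
            if (now - last_report >= G_USEC_PER_SEC) {
                double frames_per_sec =
                    frame_count * G_USEC_PER_SEC / (double)(now - last_report);
                spice_trace(fps, "%.1f frames/s", frames_per_sec);
                frame_count = 0;
                last_report = now;
            }
        }
    }

When the flag is off, the cost at the call site is a counter increment, one 
memory access and a jump; the actual reporting work only happens once per 
second when the flag is on.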


> My feeling is that those performance traces don't need to be in public or 
> release code, since they are usually very specific to the problem you are 
> studying at hand, and don't need to remain checked in code.

My experience tells me the exact opposite. Once you have an infrastructure like 
this in place, you invest a little more in your traces, so that they can be 
reused later.

> 
>> 4. Can be used to conditionally enable / disable code (other than tracing).
>> For example, Frediano started chasing memory leaks. Let’s say he develops a
>> nice way to track leaks. It’s an investment, we want to keep it. With this
>> infrastructure, we could put that under -t leakcheck, and run that if we
>> have a doubt about a memory leak.
> 
> Does tracing really helps with memory leak? when you have asan and valgrind, 
> really, no.

Valgrind? Running a heavily instrumented executable at a quarter of the speed 
on a good day? And clearly, based on the massive memory leak I reported last 
week, such tools are not used on a regular basis.

The purpose of something like -t memory is to give you a broad overview: to be 
able to say “we use 3G of memory, 1.5G of which is for graphics and 1G for 
cached images”, so that you know where to focus your effort. Tell me how 
valgrind gives you that information.


> 
>> 5. Can be used for non-boolean values. It makes the code only marginally more
>> complicated to add “tweaks”, i.e. integer values you can configure at
>> run-time easily. Currently, you’d typically use environment variables or a
>> configuration file for this, which is expensive and not convenient. With this
>> approach, add one tweak, say “max_mem”, and tweak with -t max_mem=100 to,
>> say, limit allocations to 100M and detect leaks more easily.
> 
> What has this to do with tracing?

In both cases, it’s runtime configuration intended primarily for developers.

To better see the continuity between the two, imagine that we have a trace 
focusing on JPEG encoding. If we then want to understand JPEG quality, I can do 
something like -t jpeg -t jpeg_quality=50. So it’s really the same kind of 
usage pattern.
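
To sketch how such a tweak could be consumed in code (the field names, the 
header and the spice_trace() signature are assumptions; the point is that 
tweaks live in the same generated structure as the boolean trace flags, just 
with an integer type):

    #include "spice-traces.h"  /* hypothetical header from the patch series */

    /* Hypothetical use of a jpeg_quality tweak set with -t jpeg_quality=50;
     * zero means "not set, keep the default". */
    static int choose_jpeg_quality(int default_quality)
    {
        int quality = default_quality;

        if (spice_traces.jpeg_quality)
            quality = spice_traces.jpeg_quality;

        IFTRACE(jpeg)
            spice_trace(jpeg, "encoding with quality %d", quality);

        return quality;
    }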

> 
>> 6. Self-documented. Using spicy -t list gives you a list of possible tweaks.
>> And all messages come with a trace name prefix, so if you later want to get
>> that specific message, you know what to pass to -t.
>> 
> 
> I suppose it lists the various domains? or all trace points?

The domains, if you want to call them that. But I’d rather keep “domain” for 
glib domains, and traces / tweaks for what I’m proposing here. They are 
different concepts, much like domains and command-line options are different, 
even if you may use a domain name as a grep argument.

> 
> For the later, it's probably hard to keep it stable and without associated 
> documentation, it isn't of much help imho.

This is exactly why, in this implementation, the macros in spice-traces.def 
mandate an “info” field for self-documentation.
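
To illustrate (the entry names and the exact macro shape below are only an 
approximation of what is in the patch), an entry carries its documentation with 
it, and the “-t list” output is generated from the very same file:

    /* --- hypothetical entries in spice-traces.def: name + info --- */
    SPICE_TRACE(fps,       "Report frames per second")
    SPICE_TRACE(memory,    "Report memory usage per component")
    SPICE_TRACE(bandwidth, "Report network bandwidth usage")

    /* --- in the code implementing "spicy -t list" --- */
    #include <stdio.h>

    static void spice_traces_list(void)
    {
    #define SPICE_TRACE(name, info)  printf("  %-12s  %s\n", #name, info);
    #include "spice-traces.def"
    #undef SPICE_TRACE
    }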

> 
> In any case, I don't think it make sense to provide this in a user command 
> line. Using environment variable allows to tweak any program using your 
> library instead, and may be changed at run time.

The SPICE_TRACE environment variable can be used to configure that if that’s 
what you prefer. I personally prefer a short command-line option.
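
The two are not exclusive, by the way; a sketch of the intent (the variable 
name and the spice_traces_parse() helper are placeholders, not necessarily what 
the patch ends up calling them):

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical parser shared by the environment variable and -t; it
     * enables traces from a spec like "channel:memory" and sets tweaks
     * from "jpeg_quality=50". */
    void spice_traces_parse(const char *spec);

    static void spice_traces_init(int argc, char **argv)
    {
        const char *env = getenv("SPICE_TRACES");
        if (env)
            spice_traces_parse(env);

        for (int i = 1; i + 1 < argc; i++)
            if (strcmp(argv[i], "-t") == 0)
                spice_traces_parse(argv[++i]);
    }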

> 
>> I don’t think glib does any of these 6 points. I may be wrong, and I am
>> willing to use glib instead where it provides a sufficiently good solution.
>> 
>> 
>>> - did you try other options (like Marc-andre' suggested) ?
>> 
>> Yes. Christophe F already suggested them:
>> 
>>>> On 23 May 2017, at 11:51, Christophe Fergeau <cferg...@redhat.com> wrote:
>>> 
>>>> I think we have SPICE_DEBUG already which might make
>>>> sense, another alternative would be to have a --enable-alignment-debug,
>>>> or something like glib (SPICE_XXX_DEBUG=alignment:foo:bar)
>>>> 
>> 
>> I did not find where the existing code does anything that would correspond to
>> this
>> SPICE_XXX_DEBUG=alignment:foo:bar (I did ask). Any pointer? In any case,
>> what I saw in glib did not respond to my needs.
>> 
>> 
>>> 
>>> Reading the code is not clear where exactly you implemented the different
>>> domains
>>> suggested by the bug report.
>> 
>> I didn’t. I propose that we do something finer-grained.
>> 
>> 
>>> Did you just coded SSL as an example?
>> 
>> Yes. I don’t want to convert everything if I sense that the team does not
>> like it at all.
>> So I only did two in common (ssl and rect) and one in gtk (channel, by
>> modifying CHANNEL_DEBUG)
>> 
>>> Also you introduced a new "TRACE" logging level; why DEBUG was not enough?
>> 
>> This is transitional, to be able to preserve existing behavior for existing
>> debug code (it always goes to the glib log) while TRACE code can also go to
>> standard error.
>> 
>> 
>> 
>>> On 6 Jun 2017, at 12:54, Marc-André Lureau <marcandre.lur...@redhat.com>
>>> wrote:
>>> 
>>> I would rather stick to glib structured logging (and have compatibility
>>> code for glib < 2.50),
>> 
>> This does not address my problem. The structured logging is “how to log”
>> (i.e. it is a replacement for “printf”). My patch set is intended to define
> 
> "structured logging" is not about "how" or "where" to log. It is about 
> "structured" logs, that is having named fields & value that you can later 
> filter/query. One way is through the journal, but you can implement other 
> handlers.

I stand by my comment that it’s about how and where to log things. How: using 
named fields instead of pure ASCII text. Where: in a shared log that can be 
merged between applications, with the benefits you listed, namely that it can 
be queried.

>  I don't know if your solution could fit. If it does, it would be great to 
> propose it to glib instead.


I could probably make conditional tracing part of glib, but only with a much 
more expensive implementation. In what I proposed, each flag is a bit in a 
structure. So when you write IFTRACE(foo), it translates to nothing more than 
if (spice_traces.foo). This relies on the compiler and linker doing the lookup 
job at compile-time. A library implementation would have to do a run-time 
lookup of the trace name, making it much more expensive.
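
To make the cost argument concrete, the whole mechanism boils down to roughly 
the following (a simplified sketch; in the actual patch, the structure fields 
are generated from spice-traces.def rather than written by hand, and the call 
site is of course hypothetical):

    /* One bit per trace, resolved at compile / link time. */
    struct spice_traces_s
    {
        unsigned fps       : 1;
        unsigned memory    : 1;
        unsigned bandwidth : 1;
    };
    struct spice_traces_s spice_traces;

    #define IFTRACE(name)   if (spice_traces.name)

    /* At a call site, the test is one memory access plus a jump. */
    static unsigned long long total_sent;

    static void on_data_sent(unsigned long nbytes)
    {
        IFTRACE(bandwidth)
            total_sent += nbytes;
    }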

Alternatively, a header-only implementation is feasible, but that makes it much 
more difficult to statically define where flags are stored. For instance, if 
glib started using it, we would need a “glib” set of flags and a “spicy” set of 
flags, i.e. two structures. An implementation of IFTRACE(flag) would then have 
to somehow remember whether “flag” was defined in “glib” or in “spicy”, so you 
end up with a much more complex implementation. At some point, we did something 
like that for XL vs Tao3D, but that used some C++ trickery that I don’t know 
how to replicate in C (namespace lookup IIRC, I’d have to check the code).

Or you could forget about being smart, and just write something like 
IFTRACE(glib.flag) and IFTRACE(spicy.flag), with possibly appropriate macros. 
Hmm. Thinking aloud here. Why not…
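
Why not indeed. One possible reading of that idea, sketched with made-up names: 
nest per-library flag structures inside a top-level one, so the IFTRACE() macro 
itself does not change:

    struct glib_traces_s  { unsigned gmain   : 1; };
    struct spicy_traces_s { unsigned channel : 1; };

    struct spice_traces_s
    {
        struct glib_traces_s  glib;
        struct spicy_traces_s spicy;
    } spice_traces;

    #define IFTRACE(name)   if (spice_traces.name)

    /* Usage: IFTRACE(glib.gmain) ...;  IFTRACE(spicy.channel) ...; */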


> 
>> *what* to log and *when* (i.e. it is a special form of command-line driven
>> “if”). Ultimately, my implementation uses either stderr or glib old log or
>> both, but adding a structured-log option would be relatively easy (although
>> in a different patch series).
>> 
>> I don’t think glib has command-line driven conditional logging today, beyond
>> the debug, info, warning and error levels. Christophe F’s message (quoted
>> above) hints at such a facility, but I did not find it.
>> 
> 
> gtk+ has probably more debug and tracing needs than spice-gtk (and perhaps 
> spice-server), yet they seem to be fine with glib facilities for now.

At the moment, I care more about instrumenting Spice to understand its behavior.

But for what it’s worth, the current GTK implementation does not use anything 
in glib to implement its conditional tracing code. What they have is a more 
naive version of what I did: it is not scalable (limited to 64 flags on a 
64-bit system), and you have to hand-edit flags, options, etc., so adding a new 
trace is complicated. Here is, for example, what it takes to add a “snapshot” 
trace: 
https://github.com/GNOME/gtk/commit/def94f03e240babfd589c8ba8e7cc4961a3d6aab

Contrast with what it would take with my implementation: 
https://github.com/c3d/spice-common/commit/5b2919615c0f55f05a0e1ec6c1a5f238e97b00f8.

The combination of “not scalable” and “complicated to add a trace” probably 
explains why there are presently only 20 traces active in the whole code. It’s 
also unnecessarily slow, since gtk_get_display_debug_flags() loops over all 
displays to find display-specific flags, a non-requirement for spice (and if it 
became a requirement, looping over all displays would be the last way I would 
choose to implement it).


> 
> fwiw, gtk+ uses a simple bitflag and a macro GDK_DEBUG_CHECK() / 
> GTK_DEBUG_CHECK() to provide conditional debugging information (only with 
> --enable-debug iirc).

Yes. Using a manual bitflag is what I called a “naive” implementation. My use 
of the pre-processor allows the option-parsing code to be generated 
automatically, so you change one file and only one file. And you don’t need to 
manually write things like 1<<19 when you add a trace.
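
Roughly, here is what “one file” means in practice (a simplified sketch; the 
actual macro names and generated code in the patch may differ):

    #include <string.h>

    /* Both the flags structure and the -t parser are generated from
     * spice-traces.def, so adding a trace means adding one line there. */
    struct spice_traces_s {
    #define SPICE_TRACE(name, info)   unsigned name : 1;
    #include "spice-traces.def"
    #undef SPICE_TRACE
    } spice_traces;

    /* Enable a trace named on the command line, e.g. "-t bandwidth". */
    int spice_trace_enable(const char *arg)
    {
    #define SPICE_TRACE(name, info)        \
        if (strcmp(arg, #name) == 0) {     \
            spice_traces.name = 1;         \
            return 1;                      \
        }
    #include "spice-traces.def"
    #undef SPICE_TRACE
        return 0;   /* unknown trace name */
    }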

> 
>> 
>>> or try to find a way to use qemu tracing facility (ftrace/lttng etc).
>> 
>> While it does offer some (ultra-complicated) way to do conditional logging,
> 
> but problably more powerful :)

Well, the kernel tracing infrastructure is intended to trace what happens 
inside the kernel. That a system like KVM/QEMU, which uses kernel facilities 
extensively, relies on the kernel tracing facilities makes quite a bit of 
sense: that’s a case where you want to see events in the proper sequence 
between kernel and user-space operations. An argument could be made for that 
kind of tracing in, say, the QXL driver. But for spicy, it does not make any 
sense to me.

> and more importantly commonly available, used and maintained.

For the kernel, yes. Do you know of any purely user-space program that uses 
ftrace for its own internal state?


> So there are probably more documentation available for those than a new 
> solution.

As I wrote earlier, that kernel infrastructure is much closer to the “recorder” 
I mentioned before. The recorder is *much* simpler than the kernel trace API: 
two headers (ring buffer and recorder), one C file, 1262 lines of code 
including comments. By comparison, last time I checked, the kernel tracing code 
was 51308 lines of code, 8 headers, automatic code-generation tools, etc. And 
unlike the kernel tracing subsystem, the recorder is specifically designed to 
integrate non-intrusively into a user-space program.

There’s something to be said for simplicity. So far, I have integrated “my” 
recorder into Mesa, Spice and BTRFS in a couple of hours each time. There’s no 
way to do that with the kernel infrastructure IMO.

But again, this has absolutely nothing to do with conditional tracing code. 
This is yet another case of “where and how do I store stuff” (for the recorder, 
in a ring buffer), not of enabling / disabling specific debug code (although, 
granted, the ftrace interface does have this capability, but in a way that only 
applies to the kernel, i.e. writing to /sys files, not command-line options or 
environment variables).


> Also, when tracing, you often need to consider what's happening outside of 
> your domain, so it's quite important that the tracing facility is used by 
> others.

Actually, the purpose of this kind of tracing is to offer domain-specific, 
focused tracing. In the majority of cases, you don’t need to consider what is 
happening outside of the domain; on the contrary, you really want it out of the 
way. And if/when you do care about it, you get that data in your own 
instrumentation, not by activating some tracing in another domain. For example, 
for Tao3D, the number of polygons is important, but -t polygons does not give 
you that information by activating some polygon tracing in the OpenGL domain. 
Instead, it collects statistics relevant to Tao3D from OpenGL and presents them 
in a usable form.


> This is a great argument to keep using glog/glib, obviously for the gtk+ 
> client, but also for the server that uses more and more glib/gobject/gst etc. 
> ftrace/lttng is quite more low level and linux-specific, so probably not the 
> best userspace solution.

Agreed, ftrace is definitely not a solution to my problem, let alone the best 
one.

I can consider making a glib version of spice_trace(), but I think it would be 
unnecessarily complicated to get it accepted, and I don’t see what we’d gain 
from it.


> 
>> this is a kernel / server-side logging facility. I want to instrument spice
>> itself (client also, not just server). I think using this kind of
>> infrastructure could be done, but we are talking about something that, at
>> first sight, looks two orders of magnitude more complicated to write and to
>> use, only addresses some of my use-cases, and probably several times slower.
>> 
>> BTW, these tools also address post-mortem logging. I also have something in
>> the pipe, see https://github.com/c3d/spice-gtk/compare/master...c3d:recorder
>> and the associated submodule https://github.com/c3d/recorder/. But that’s
>> for a later RFC. If only Linux had control-T, you would know the joy of
>> hitting Control-T at your terminal window right after Spicy does something
>> funny, and getting a dump of what it just did, as if you had SPICE_DEBUG to
>> start with :-)
>> 
>> (that other patch series has a problem, that the recorder does not like
>> dynamic strings, so batch-converting existing messages is more challenging).
> 
> As you can see, I am quite reluctant to adding yet another logging/tracing 
> framework to our already (very) long list of tools/libraries. In fact, I am 
> very happy that we killed a dependency on log4cpp some time ago in favour of 
> glib (we were the only users in rhel at that time, and it was quite a pain to 
> use tbh :).

Well, we may have a long list of tools and libraries, but right now, prying 
even what I consider the most basic information out of Spice is a real pain 
IMO. If you have a nice grep for the use cases I listed above (e.g. FPS, 
network bandwidth, memory usage, etc.), let me know. For now, to me, this is a 
problem that is not solved, and one that I believe my proposal would solve 
relatively elegantly.

Also, while https://bugs.freedesktop.org/show_bug.cgi?id=91838 mentions a 
possible solution to the same class of problems that my proposal addresses, I 
believe the solution presented here is much simpler and easier to implement. 
91838 has been open since Sep 2015; my solution took me a couple of days to 
implement and test.

As for integrating the recorder… well, if there is an existing framework that 
does what I’m doing with it, then fine. I don’t know of any, and I looked.


Christophe

> 
> 
> thanks
_______________________________________________
Spice-devel mailing list
Spice-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/spice-devel
