Top remark: here is what I expected to see discussed when I shared the code.

1. What categories do we want?

2. How do we spread the categorization work?

3. How much do today’s developers depend on existing output?

4. What categories do we request, e.g. in bug reports (IOW, what should 
--spice-debug select)?

5. What tracing options and tweaks are potentially dangerous for end users 
(e.g. what could expose security information)?

6. Consequently, what tracing options do we hide in release builds? Current 
POVs range from “as many as possible” (me) to “none of them” (you).

7. What are the existing hard-coded parameters? Explicit (e.g. 400ms delay in 
the server) or implicit (e.g. unbounded memory usage)

8. Among these, which ones would we like to tweak?

9. Among those we want to tweak, which ones would deserve dedicated options or 
env variables, and which ones would be perfectly OK with a more generic option 
like the -t I proposed?


It did not go as planned, but if you have time, I think these are interesting 
questions.



> On 9 Jun 2017, at 18:06, Marc-André Lureau <marcandre.lur...@redhat.com> 
> wrote:
> 
> 
> 
> ----- Original Message -----
> 
>> [drop_stats]id=0 #in-frames=99 #late=99 (100.00%, avg 2995.89 ms)
>> #drop-on-playback=0 (0.00%)
>> [drop_stats]id=0 #in-frames=198 #late=198 (100.00%, avg 2160.48 ms)
>> #drop-on-playback=3 (1.52%)
>> [drop_stats]id=0 #in-frames=297 #late=297 (100.00%, avg 1881.11 ms)
>> #drop-on-playback=12 (4.04%)
>> 
>> But then:
>> 
>> Why 5? How much I want to put a tweak there and see what happens if I change
>> that value.
>> 
>>      #define STREAM_PLAYBACK_SYNC_DROP_SEQ_LEN_LIMIT 5
>> 
>> Why 100? What happens if you change it?
>> 
>>        op->multi_media_time = mmtime + 100; /* workaround... */
>> 
> 
> I don't get your tweaks, they look specific to the problem at hand, I don't 
> think such esoteric code should be checked in code, it will just decay.

Are you telling me that you like these kinds of magic, undocumented values that 
you can’t change without recompiling? If that is really what you are saying, 
could you please explain the rationale to me? If not, what are you really 
saying?

Let me go on record stating that, to me, most magic values like this in the 
code reek of “quick and dirty”: it was cheap enough at the time, we believe it 
does not matter, but we never quite tested it.

> 
>> 
>> Here is a more serious follow up with another tweaks I just pushed,
>> latency_offset:
>> 
>> ddd@ptitpuce[traces] spice-gtk> spicy -t latency_offset=1900:drop_stats=25 -h
>> turbo -p 5901
>> [drop_stats]id=0 #in-frames=25 #late=0 (0.00%, avg 0.00 ms)
>> #drop-on-playback=0 (0.00%)
>> […]
>> [drop_stats]id=0 #in-frames=225 #late=0 (0.00%, avg 0.00 ms)
>> #drop-on-playback=14 (6.22%)
>> [drop_stats]id=0 #in-frames=250 #late=17 (6.80%, avg 2027.82 ms)
>> #drop-on-playback=15 (6.00%)
>> [drop_stats]id=0 #in-frames=275 #late=36 (13.09%, avg 2353.86 ms)
>> #drop-on-playback=26 (9.45%)
>> [drop_stats]id=0 #in-frames=300 #late=58 (19.33%, avg 2445.26 ms)
>> #drop-on-playback=50 (16.67%)
>> [drop_stats]id=0 #in-frames=325 #late=58 (17.85%, avg 2445.26 ms)
>> #drop-on-playback=50 (15.38%)
>> 
> 
> same idea, this is playing with the code, not code that should remain.

First, “playing with the code”, aka “interactivity”, is indeed what this is 
about. That was the point of my “punchcard” analogy. By being interactive, 
terminals made many things possible that don’t even seem to make sense with a 
punchcard reader and a line printer. And the mechanism I’m talking about here, 
as simple as it is, makes spice testing more interactive.

Second, whether that or any other instrumentation remains or not would depend 
on a number of things. If the instrumentation showed that something interesting 
happens with “latency_offset=1600”, then I believe I would share the 
instrumentation to let others replicate my experiment. If variations of that 
parameter allowed me to observe a 2x effect in some performance aspect, then I 
would certainly leave the tweak in and suggest we collectively work on 
optimizing its value.
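
To make the tweak idea concrete, here is a minimal sketch, not the code from 
my patches; the helper name, the environment variable, and the default value 
are illustrative assumptions:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical sketch: turning a hard-coded constant into a runtime
     * tweak. The default preserves today's behaviour (the "+ 100"
     * workaround); an experiment like latency_offset=1900 becomes an
     * option away instead of a recompile away. */
    static unsigned tweak(const char *name, unsigned default_value)
    {
        const char *value = getenv(name);   /* or a value parsed from -t */
        return value ? (unsigned) strtoul(value, NULL, 10) : default_value;
    }

    int main(void)
    {
        unsigned latency_offset = tweak("SPICE_LATENCY_OFFSET", 100);
        printf("multi_media_time offset: %u ms\n", latency_offset);
        return 0;
    }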



> 
>>> 
>>>> 
>>>> 3. What is actually received during the early protocol exchange? That’s
>>>> mostly for my curiosity, to see how chatty the protocol is, the size of
>>>> packets it tends to exchange, etc. To answer a question from Marc André,
>>>> this one and the next are tools that would immediately have shown me where
>>>> the macOS version was goofing up.
>>>> 
>>>>> spicy -t wire_read -h turbo -p 5901
>>>> [wire_read]Read 16 bytes in 0x7f8b150d85a8
>>>> [wire_read]00000000:  52 45 44 51 02 00 00 00 02 00 00 00 BA 00 00 00
>>>> 
>>>> [wire_read]Read 186 bytes in 0x7f8b141c6650
>>>> [wire_read]00000000:  00 00 00 00 30 81 9F 30 0D 06 09 2A 86 48 86 F7
>>>> [wire_read]00000010:  0D 01 01 01 05 00 03 81 8D 00 30 81 89 02 81 81
>>>> [wire_read]00000020:  00 D2 A9 FE 7B F0 36 B7 33 3C C1 F0 5E 10 50 F3
>>>> [wire_read]00000030:  11 D3 3E 31 4A 95 AC AC 7B 0C 20 CB 57 B2 C8 1D
>>>> [wire_read]00000040:  10 F1 E2 4A 1A F2 8F 2A 03 70 25 BB 63 C7 D6 51
>>>> [wire_read]00000050:  D4 DA 64 3C 95 0C B6 9E A7 4E B3 3F 65 12 83 86
>>>> [wire_read]00000060:  F4 35 F3 AC 3D C3 9A 6C C9 EE 34 D9 67 F0 8A 2D
>>>> [wire_read]00000070:  53 4D 73 46 06 CA 80 04 CF 7F 72 4A 8D 74 81 78
>>>> [wire_read]00000080:  E8 28 39 51 7C 9E 3B C2 19 D6 BD EC B4 F7 0A 68
>>>> [wire_read]00000090:  D3 AF D7 1F E6 93 E1 4C 23 F5 83 68 E1 7D 31 9A
>>>> [wire_read]000000A0:  53 02 03 01 00 01 01 00 00 00 01 00 00 00 B2 00
>>>> [wire_read]000000B0:  00 00 0B 00 00 00 09 00 00 00
>>>> [wire_read]Read 4 bytes in 0x109478b44
>>>> [wire_read]00000000:  00 00 00 00
>>>> ** Message: main channel: opened
>>>> [wire_read]Read 6 bytes in 0x7f8b1274efb0
>>>> [wire_read]00000000:  67 00 20 00 00 00
>>>> [wire_read]Read 32 bytes in 0x7f8b1274dff0
>>>> [wire_read]00000000:  31 AE 1E 5E 01 00 00 00 03 00 00 00 02 00 00 00
>>>> [wire_read]00000010:  01 00 00 00 0A 00 00 00 9B AF BE 0B 00 00 00 01
>>>> 
>>> 
>>> This is so low-level than even debug log don't make much sense. Imho when
>>> you need to read the byte-stream, you can just attach gdb and put a
>>> breakpoint/command. Then GDB allows to do a lot more interesting stuff,
>>> like conditional breakpoint, various print etc..
>> 
>> Valid opinion. Don’t like trace options, don’t have to use them. By why
>> prevent me from using them?
> 
> Cluttering the code because you can't use proper tools is not acceptable 
> imho. 

I have to call out the cheap shot here. What on earth makes you think it is OK 
to publicly imply that:

a) I can’t use these tools.

b) there is so little debate that these are the “proper tools” that my 
suggesting otherwise is “not acceptable”.

Please refrain from making such unfounded attacks. I don’t mind you asking if I 
know about this or that tool, or why I think they are not the right tools. It 
is definitely unreasonable to claim I don’t know of alternatives, in particular 
of alternatives as widely known as gdb. And believe me, it is entirely wrong to 
deduce that I am proposing the trace options because I don’t know how to use 
gdb.

I have what I consider very good reasons to regard the alternatives you 
suggested as inferior for my purpose, and I have made numerous attempts at 
explaining what these reasons are. I try again in this email; search below for 
“semantics” vs. “primitive”.

The claim of “clutter” is bogus too. Adding internal reports or instrumentation 
is not cluttering the code. If anything, it is documenting it. It is adding 
code you can use to check internal consistency or performance, or to test your 
hypotheses about how the code behaves under this or that load. Any large enough 
system has such tools. Linux filesystems have fsck for consistency checks, and 
there are iostat, lsof, and plenty of other tools for analysis or for asking 
specific questions. Suggesting that these tools clutter Linux because you could 
use a kernel debugger to get the information is not a very compelling argument. 
How is spice different?

Finally, let me point out that I usually try to give precise and detailed 
answers to your questions. This often leads me to, in turn, ask precise and 
specific questions in my answers. I would very much appreciate it if you 
started answering these questions. Otherwise, this is not a productive 
discussion; it is just you asserting your personal preferences and beliefs as 
the only possible choice.


> 
>> 
>> Please send me your GDB script for -t wire_read:marshall_read, which I use
>> for LEB128 development, and for -t drop_stats=25:latency_offset=N, which I
>> use to investigate where the latency is coming from. Please make it
>> compatible with LLDB too, unless you know how to install GDB on macOS (read
>> this first if you think I’m being silly
>> https://stackoverflow.com/questions/33162757/how-to-install-gdb-debugger-in-mac-osx-el-capitan).
> 
> Well that looks like a general problem if you can't use debuggers on your 
> platform. I don't own a mac to help you here, nor do I have much experience 
> with it.

I notice you did not send me the GDB script. This is an example of you not 
answering a question I asked. Could you please make a serious effort at writing 
such a script, and not make further suggestions that I use alternative tools 
until you have done so? I’m not trying to make you shut up here, but rather to 
have you reflect on what you are asking, on why this is absolutely not the 
“proper tool”, and on the fact that my supposedly not knowing how to use GDB 
has absolutely nothing to do with the matter.

Hypothesizing that I suggest a mechanism like the one I shared because I don’t 
know how to use GDB or whatever else is not just insulting; it shows you really 
did not put much effort into understanding why I am making this proposal and 
what problems it addresses. I am sorry, but based on your track record, I am 
convinced you can do much better.

Of course, the fact that GDB does not work well on the macOS platform (and 
neither do systemtap, or ftrace, or …) would be enough to kill these approaches 
for me. But that’s not the core of the argument. Even if I decided to focus 
only on today’s Linux, and assumed you can install any package you want, I 
would still consider using gdb for the examples I gave as, at best, a clumsy 
workaround, and at worst a big fat waste of time. If you write the gdb script, 
I believe you will reach that conclusion yourself.



> 
>>> 
>>>> 
>>>> 4. A symbolic view of the initial exchange (not all cases are covered in
>>>> marshal.py / demarshal.py yet, so some fields are likely missing).
>>>> 
>>>>> spicy -t marshall_read -h turbo -p 5901
>>>> 
>>>> [marshall_read]  SpiceMsgMainInit.session_id=1745897490
>>>> [marshall_read]  SpiceMsgMainInit.display_channels_hint=1
>>>> [marshall_read]  SpiceMsgMainInit.supported_mouse_modes=3
>>>> [marshall_read]  SpiceMsgMainInit.current_mouse_mode=2
>>>> [marshall_read]  SpiceMsgMainInit.agent_connected=1
>>>> [marshall_read]  SpiceMsgMainInit.agent_tokens=10
>>>> [marshall_read]  SpiceMsgMainInit.multi_media_time=226863047
>>>> [marshall_read]  SpiceMsgMainInit.ram_hint=16777216
>>>> [marshall_read]  SpiceMsgMainName.name_len=10
>>>> [marshall_read]  SpiceMsgPing.id=1
>>>> [marshall_read]  SpiceMsgPing.timestamp=226863447500
>>>> [marshall_read]  SpiceMsgPing.id=2
>>>> [marshall_read]  SpiceMsgPing.timestamp=226863447502
>>>> [marshall_read]  SpiceMsgPing.id=3
>>>> [marshall_read]  SpiceMsgPing.timestamp=226863447503
>>>> [marshall_read]  SpiceMsgChannels.num_of_channels=8
>>>> [marshall_read]  SpiceChannelId.type=2
>>>> [marshall_read]  SpiceChannelId.id=0
>>>> [marshall_read]  SpiceChannelId.type=4
>>>> [marshall_read]  SpiceChannelId.id=0
>>>> [marshall_read]  SpiceChannelId.type=9
>>>> [marshall_read]  SpiceChannelId.id=1
>>>> [marshall_read]  SpiceChannelId.type=9
>>>> [marshall_read]  SpiceChannelId.id=0
>>>> [marshall_read]  SpiceChannelId.type=6
>>>> [marshall_read]  SpiceChannelId.id=0
>>>> [marshall_read]  SpiceChannelId.type=5
>>>> [marshall_read]  SpiceChannelId.id=0
>>>> [marshall_read]  SpiceChannelId.type=2
>>>> [marshall_read]  SpiceChannelId.id=1
>>>> [marshall_read]  SpiceChannelId.type=3
>>>> [marshall_read]  SpiceChannelId.id=0
>>> 
>>> Same here, this is so verbose and low-level than you can simply attach gdb
>>> and pretty-print the structure you are interested in.
>> 
>> Possible ≠ easy. You can also simply edit source code with sed. That does not
>> make it a good idea.
>> 
>>> This is really just useful to developpers, no one else could make sense of
>>> these internal stuff.
>> 
>> Asking a customer to send the output of spicy -t magic_trace seems easier
>> than asking them to run a gdb script.
> 
> I never said you have to ask customer for running gdb.

Actually, you did, if implicitly. You suggested just above that this was 
“really just useful to developers”, and that the correct way to get the 
information was to “simply attach gdb and pretty-print the structure you are 
interested in”. Meaning that if, as a developer, I want that information from a 
customer, I have to ask the customer to run GDB. It’s the most logical 
conclusion of what you wrote.

The very reason I dragged a hypothetical customer into the scenario was to 
demonstrate that “no one else [than a developer] could make sense of these 
internal stuff” does not imply the conclusion you wanted me to draw, namely 
that the best way to print that stuff is with gdb, and (presumably from there) 
that whoever thinks otherwise does not know how to use gdb. See also the 
ssh -vvv example below.


> At this point, the developper should have enough information to reproduce the 
> problem.

How? You said yourself that the information was available through GDB and that 
this was the right way to get it. But let’s assume I understood you 
incorrectly. Without GDB, using only the existing debug log, how does a 
developer know that some step in the protocol misbehaved, and how it 
misbehaved? Don’t you find it strange to have a program that is designed to 
talk to another program, where the only way to know what it understood from the 
other side is to attach a debugger or a packet analyzer? (As an aside, 
wireshark tells me what goes on the wire, not what spice thinks it sent or 
received.)

When some ssh session does not work the way I expect, I don’t attach a debugger 
or wireshark to ssh; I use ssh -vvv. And quite often, ssh -vvv is enough to 
tell me where things broke down. I don’t have that for spice, and it’s a Bad 
Thing™ in my opinion. If some ssh developer told me that I cannot add code for 
-vvv because gdb or wireshark are the “proper tools” for figuring out what’s 
going on, I would find that ludicrous.


> I happened very seldomly that you have to provide the user with a custom 
> binary for testing, which may provide additional log (a good indication that 
> this log should be upstream this time)

Can you give some concrete examples? It would be interesting to see if the 
information that was added makes sense to have all the time, or if it is 
something you would ask for only when specific symptoms are there. My working 
hypothesis is that a lot of that “sometimes useful” information never made it 
into the code precisely because it would have been there all the time when no 
one needed it, and would have been seen as clutter. This is precisely the 
problem that the -t option addresses: a scalable way to add as many surgically 
precise messages as you want, without cluttering the general output that 
everybody shares.

> 
>> 
>>> 
>>>> 
>>>> I believe this example illustrates well the kind of instrumentation that
>>>> is
>>>> useful and convenient to have there all the time, but that we don’t want
>>>> to
>>>> send to a debug log if not explicitly requested. I see no reason to
>>>> pollute
>>>> even a structured log with that stuff, unless I actually need it. One line
>>>> per field with a common structure on purpose, to make it easy to grep /
>>>> awk
>>>> the data you want.
>>>> 
>>> 
>>> No, I don't think we need this available "all the time" or put differently,
>>> in the code. We need useful debug log, but when it comes to internal data
>>> structure dump or byte stream analaysis, use proper tools: gdb &
>>> tcpdump/wireshark for ex.
>> 
>> Use what you want. But why prevent me from using tools I find less primitive
>> for the task?
> 
> We don't have the same perception of primitive tools.

No, we don’t. But I also believe you misunderstood what I meant by “primitive”, 
and interpreted it as “simple”.

I used “primitive” in the same sense that C keywords are the primitives of the 
language, i.e. something designed as a building block for systems, without any 
built-in semantics. This is in opposition to something having semantics; it 
does not imply “simple”, but it does imply “insufficient”. GDB is way more 
complicated than my little preprocessor tricks and printfs. Yet a 
context-sensitive, semantic-aware printf statement in spice will still be less 
primitive, because it deals with aspects that I have to build manually in GDB. 
In other words, GDB and printf are primitives. GDB scripts and printf 
statements are not, and can be said to be higher-level.

I propose a trace options system which lets us add content with a lot of 
semantics to the spice logs. You countered with GDB pretty-printing of 
structures, which is a primitive on which I could build a (different) semantics 
system using GDB scripts. Now, I believe that this can be done. I also believe 
that it would be more complex to build a decent collection of spice-aware GDB 
scripts than to add printf statements in the code. This is why I asked you to 
provide such a script, as a comparison point. I’m not trying to be a smartass 
or anything; I genuinely want to see if gdb would be effectively shorter and 
simpler than printf for the examples. I personally think it wouldn’t.

But when I say I want to use less primitive tools, it is to point out that you 
can’t offer primitives as a replacement for semantics. You can’t offer putchar 
as a replacement for printf: one is primitive, the other has semantics about 
streams and formatting that you have to reconstruct if you use putchar. You 
can’t offer gdb alone as a replacement for protocol traces; you need to add 
scripts that dump stuff.

Does that clarify what I meant?

Regarding the simplicity of GDB, x/100bx data is a good manual alternative to 
my hex dump, and automated alternatives are possible 
(https://stackoverflow.com/questions/9233095/memory-dump-formatted-like-xxd-from-gdb).
As you wrote, pretty printing is a good way to show C structures. BUT the gdb 
hex dump is not any simpler than the one I wrote in C (and presumably performs 
worse), and pretty printing C structures is made more complicated by the fact 
that there are many marshalling functions, so you need to add breakpoints and 
print statements in each of them. Again, try it, share your findings, and let’s 
see if it’s really simpler.
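
For comparison, this is roughly what the hex dump amounts to on the C side. 
This is a minimal, self-contained sketch rather than the exact code from my 
patches; the function name and output prefix merely mimic the wire_read traces 
shown above.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative wire_read-style hex dump: 16 bytes per line, offset
     * first, then the bytes in hexadecimal. */
    static void wire_read_dump(const void *data, size_t size)
    {
        const uint8_t *bytes = data;
        for (size_t offset = 0; offset < size; offset += 16) {
            printf("[wire_read]%08zX: ", offset);
            for (size_t i = offset; i < offset + 16 && i < size; i++)
                printf(" %02X", bytes[i]);
            printf("\n");
        }
    }

    int main(void)
    {
        static const uint8_t magic[] = { 'R', 'E', 'D', 'Q', 2, 0, 0, 0 };
        wire_read_dump(magic, sizeof magic);
        return 0;
    }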

> 
>>> 
>>> I don't think you make a good point about the "log tweaks" either.
>> 
>> I don’t see you having made any good point against them, only suggesting
>> purely theoretical alternatives that would turn minutes of work into days.
> 
> You won't be surprised that I think quite the contrary.
> 
> I'd simply like to:
> - not make needless tweaks available in production builds

Please justify that they are needless, when I have repeatedly stated that I 
need them and explained why. It is important that you address my needs; 
otherwise, you are just asserting your personal preferences.


> - not clutter the code with logging very specific to a bug (just imagine if 
> everybody added its own bug-specific log in upstream code)

On the code side, please explain how the spice_trace(name_of_trace, 
printf_format, …) syntax I suggested clutters the code, and what you would see 
as an alternative, taking into account that I need the traces, and that I 
mostly classified existing traces, so simply removing the trace statements does 
not address my requirements.
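
For reference, here is a minimal sketch of the kind of construct I am 
describing; the spice_trace_enabled() helper and the way the -t selection is 
stored are illustrative assumptions, not necessarily the exact code in the 
patches.

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    /* Which traces the user asked for, e.g. via -t or an env variable */
    static const char *enabled_traces = "drop_stats:latency_offset";

    static bool spice_trace_enabled(const char *name)
    {
        return enabled_traces && strstr(enabled_traces, name) != NULL;
    }

    /* A named trace: one printf-style statement, selected at runtime */
    #define spice_trace(name, ...)                          \
        do {                                                \
            if (spice_trace_enabled(#name)) {               \
                printf("[" #name "]");                      \
                printf(__VA_ARGS__);                        \
                printf("\n");                               \
            }                                               \
        } while (0)

    int main(void)
    {
        spice_trace(drop_stats, "id=%d #in-frames=%d", 0, 99); /* printed */
        spice_trace(wire_read, "Read %d bytes", 16);           /* silent  */
        return 0;
    }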

On the log side, please explain why it’s better to clutter the logs 
unconditionally with uncategorized messages.

If I were to use the word “clutter”, it would be to describe the existing logs, 
with over 60% of the byte count in there being generally useless or redundant 
information (I derive the 60% from the ratio of sizes between two files I 
shared earlier, 79M vs 29M).


> - not reinvent the wheel (like parsing etc)

Please explain why you think my proposal does that. I’m adding one option 
handler, and one shared option parser that deals both with environment 
variables and command-line options. Does this duplicate any existing parsing?
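
To be concrete about how much parsing is involved, here is a minimal sketch of 
splitting a string like latency_offset=1900:drop_stats=25. The function name 
and the use of strtok_r are assumptions for illustration, not necessarily what 
the patch does.

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Split a colon-separated list of name or name=value trace options */
    static void parse_traces(const char *spec)
    {
        char *copy = strdup(spec);              /* work on a writable copy */
        char *saveptr = NULL;
        for (char *item = strtok_r(copy, ":", &saveptr);
             item != NULL;
             item = strtok_r(NULL, ":", &saveptr)) {
            char *eq = strchr(item, '=');
            if (eq) {
                *eq = '\0';
                printf("enable trace '%s' with value %s\n", item, eq + 1);
            } else {
                printf("enable trace '%s'\n", item);
            }
        }
        free(copy);
    }

    int main(void)
    {
        parse_traces("latency_offset=1900:drop_stats=25");
        return 0;
    }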


> - use existing logging infrastructure to be coherent with the rest of the 
> system, and stay familiar to the developer/admin of that system 

Please explain why you think I’m not using the existing logging infrastructure, 
when I made quite a few efforts to ensure that existing logging options behave 
mostly like they used to. If there are errors in the way I do this, they can be 
fixed. But as stated, this is a cheap attack on code you obviously did not take 
the time to read.


> - use existing, powerful tools and framework, familiar to existing developers

Please explain how adding a few printf statements precludes you from using 
these “powerful tools and frameworks”. I did respond to this earlier. Nothing I 
did prevents you from using gdb and wireshark and systemtap if you want to. But 
please acknowledge that nothing these tools do really solves the problem I’m 
addressing.

> 
> What I consider useful in what you propose, is to add more debug and add log 
> categories. The rest fits a poor-man way of debugging, imho, and doesn't 
> match well with common practices and tools. I will study your patches in more 
> details when I have time to see if I missed something. 

I do resent attacks like “poor-man way of debugging”.


Thanks,
Christophe




_______________________________________________
Spice-devel mailing list
Spice-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/spice-devel
