Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, 1 Sep 2022 21:35:32 -0400 Kent Overstreet wrote: > On Thu, Sep 01, 2022 at 08:23:11PM -0400, Steven Rostedt wrote: > > If ftrace, perf, bpf can't do what you want, take a harder look to see if > > you can modify them to do so. > > Maybe we can use this exchange to make both of our tools better. I like your > histograms - the quantiles algorithm I've had for years is janky, I've been > meaning to rip that out, I'd love to take a look at your code for that. And > having an on/off switch is a good idea, I'll try to add that at some point. > Maybe you got some ideas from my stuff too. > > I'd love to get better tracepoints for measuring latency - what I added to > init_wait() and finish_wait() was really only a starting point. Figuring out > the right places to measure is where I'd like to be investing my time in this > area, and there's no reason we couldn't both be making use of that. Yes, this is exactly what I'm talking about. I'm not against your work, I just want you to work more with everyone to come up with ideas that can help everyone as a whole. That's how "open source communities" is suppose to work ;-) The histogram and synthetic events can use some more clean ups. There's a lot of places that can be improved in that code. But I feel the ideas behind that code is sound. It's just getting the implementation to be a bit more efficient. > > e.g. with kernel waitqueues, I looked at hooking prepare_to_wait() first but > not > all code uses that, init_wait() got me better coverage. But I've already seen > that that misses things, too, there's more work to be done. I picked prepare_to_wait() just because I was hacking up something quick and thought that was "close enough" ;-) -- Steve
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, Sep 01, 2022 at 08:23:11PM -0400, Steven Rostedt wrote: > If ftrace, perf, bpf can't do what you want, take a harder look to see if > you can modify them to do so. Maybe we can use this exchange to make both of our tools better. I like your histograms - the quantiles algorithm I've had for years is janky, I've been meaning to rip that out, I'd love to take a look at your code for that. And having an on/off switch is a good idea, I'll try to add that at some point. Maybe you got some ideas from my stuff too. I'd love to get better tracepoints for measuring latency - what I added to init_wait() and finish_wait() was really only a starting point. Figuring out the right places to measure is where I'd like to be investing my time in this area, and there's no reason we couldn't both be making use of that. e.g. with kernel waitqueues, I looked at hooking prepare_to_wait() first but not all code uses that, init_wait() got me better coverage. But I've already seen that that misses things, too, there's more work to be done. random thought: might try adding a warning in schedule() any time it's called and codetag_time_stats_start() hasn't been called, that'll be a starting point...
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, 1 Sep 2022 18:55:15 -0400 Kent Overstreet wrote: > On Thu, Sep 01, 2022 at 06:34:30PM -0400, Steven Rostedt wrote: > > On Thu, 1 Sep 2022 17:54:38 -0400 > > Kent Overstreet wrote: > > > > > > So this looks like it's gotten better since I last looked, but it's still > > > not > > > there yet. > > > > > > Part of the problem is that the tracepoints themselves are in the wrong > > > place: > > > your end event is when a task is woken up, but that means spurious > > > wakeups will > > > > The end event is when a task is scheduled onto the CPU. The start event is > > the first time it is woken up. > > Yeah, that's not what I want. You're just tracing latency due to having more > processes runnable than CPUs. > > I don't care about that for debugging, though! I specifically want latency at > the wait_event() level, and related - every time a process blocked _on some > condition_, until that condition became true. Not until some random, > potentially > spurious wakeup. Ideally this would be better if we could pass the stack trace from one event to the next, but that wouldn't be too hard to implement. It just needs to be done. But anyway: # echo 'p:wait prepare_to_wait_event' > /sys/kernel/tracing/kprobe_events // created an event on prepare_to_wait_event as that's usually called just // before wait event. # sqlhist -e -n wait_sched 'select start.common_pid as pid,(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as delta from wait as start join sched_switch as end on start.common_pid = end.prev_pid where end.prev_state & 3' // Create a "wait_sched" event that traces the time between the // prepare_to_wait_event call and the scheduler. Only trigger it if the // schedule happens in the interruptible or uninterruptible states. # sqlhist -e -n wake_sched 'select start.pid,(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as delta2 from wait_sched as start join sched_switch as end on start.pid = end.next_pid where start.delta < 50' // Now attach the wait_event to the sched_switch where the task gets // scheduled back in. But we are only going to care if the delta between // the prepare_to_wait_event and the schedule is less that 50us. This is a // hack to just care about where a prepare_to_wait_event was done just before // scheduling out. # echo 'hist:keys=pid,delta2.buckets=10:sort=delta2' > /sys/kernel/tracing/events/synthetic/wake_sched/trigger // Now we are going to look at the deltas that the task was sleeping for an // event. But this just gives pids and deltas. # echo 'hist:keys=pid,stacktrace if delta < 50' >> /sys/kernel/tracing/events/synthetic/wait_sched/trigger // And this is to get the backtraces of where the task was. This is because // the stack trace is not available at the schedule in, because the // sched_switch can only give the stack trace of when a task schedules out. // Again, this is somewhat a hack. # cat /sys/kernel/tracing/events/synthetic/wake_sched/hist # event histogram # # trigger info: hist:keys=pid,delta2.buckets=10:vals=hitcount:sort=delta2.buckets=10:size=2048 [active] # { pid: 2114, delta2: ~ 10-19 } hitcount: 1 { pid: 1389, delta2: ~ 160-169 } hitcount: 1 { pid: 1389, delta2: ~ 660-669 } hitcount: 1 { pid: 1389, delta2: ~ 1020-1029 } hitcount: 1 { pid: 1189, delta2: ~ 500020-500029 } hitcount: 1 { pid: 1189, delta2: ~ 500030-500039 } hitcount: 1 { pid: 1195, delta2: ~ 500030-500039 } hitcount: 2 { pid: 1189, delta2: ~ 500040-500049 } hitcount: 10 { pid: 1193, delta2: ~ 500040-500049 } hitcount: 3 { pid: 1197, delta2: ~ 500040-500049 } hitcount: 3 { pid: 1195, delta2: ~ 500040-500049 } hitcount: 9 { pid: 1190, delta2: ~ 500050-500059 } hitcount: 55 { pid: 1197, delta2: ~ 500050-500059 } hitcount: 51 { pid: 1191, delta2: ~ 500050-500059 } hitcount: 61 { pid: 1198, delta2: ~ 500050-500059 } hitcount: 56 { pid: 1195, delta2: ~ 500050-500059 } hitcount: 48 { pid: 1192, delta2: ~ 500050-500059 } hitcount: 54 { pid: 1194, delta2: ~ 500050-500059 } hitcount: 50 { pid: 1196, delta2: ~ 500050-500059 } hitcount: 57 { pid: 1189, delta2: ~ 500050-500059 } hitcount: 48 { pid: 1193, delta2: ~ 500050-500059 } hitcount: 52 { pid: 1194, delta2: ~ 500060-500069 } hitcount: 12 { pid: 1191, delta2: ~ 500060-500069 } hitcount: 2 { pid: 1190, delta2: ~ 500060-500069 } hitcount: 7 { pid: 1198, delta2: ~ 500060-500069 } hitcount: 9 { pid: 1193, delta2: ~ 500060-500069 } hitcount: 6 { pid: 1196, delta2: ~ 500060-500069 } hitcount: 5 { pid: 1192, delta2: ~ 500060-500069 } hitcount: 9 { pid: 1197, delta2: ~ 500060-500069 } hitcount: 9 { pid:
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, Sep 01, 2022 at 06:34:30PM -0400, Steven Rostedt wrote: > On Thu, 1 Sep 2022 17:54:38 -0400 > Kent Overstreet wrote: > > > > So this looks like it's gotten better since I last looked, but it's still > > not > > there yet. > > > > Part of the problem is that the tracepoints themselves are in the wrong > > place: > > your end event is when a task is woken up, but that means spurious wakeups > > will > > The end event is when a task is scheduled onto the CPU. The start event is > the first time it is woken up. Yeah, that's not what I want. You're just tracing latency due to having more processes runnable than CPUs. I don't care about that for debugging, though! I specifically want latency at the wait_event() level, and related - every time a process blocked _on some condition_, until that condition became true. Not until some random, potentially spurious wakeup. > Not the prettiest thing to read. But hey, we got the full stack of where > these latencies happened! Most of the time I _don't_ want full stacktraces, though! That means I have a ton more output to sort through, and the data is far more expensive to collect. I don't know why it's what people go to first - see the page_owner stuff - but that doesn't get used much either because the output is _really hard to sort through_. Most of the time, just a single file and line number is all you want - and tracing has always made it hard to get at that. > Yes, it adds some overhead when the events are triggered due to the > stacktrace code, but it's extremely useful information. > > > > > So, it looks like tracing has made some progress over the past 10 years, > > but for debugging latency issues it's still not there yet in general. I > > I call BS on that statement. Just because you do not know what has been > added to the kernel in the last 10 years (like you had no idea about > seq_buf and that was added in 2014) means to me that you are totally > clueless on what tracing can and can not do. > > It appears to me that you are too focused on inventing your own wheel that > does exactly what you want before looking to see how things are today. Just > because something didn't fit your needs 10 years ago doesn't mean that it > can't fit your needs today. ...And the ad hominem attacks start. Steve, I'm not attacking you, and there's room enough in this world for the both of us to be doing our thing creating new and useful tools. > I'm already getting complaints from customers/users that are saying there's > too many tools in the toolbox already. (Do we use ftrace/perf/bpf?). The > idea is to have the tools using mostly the same infrastructure, and not be > 100% off on its own, unless there's a clear reason to invent a new wheel > that several people are asking for, not just one or two. I would like to see more focus on usability. That means, in a best case scenario, always-on data collection that I can just look at, and it'll already be in the format most likely to be useful. Surely you can appreciate the usefulness of that..? Tracing started out as a tool for efficiently getting lots of data out of the kernel, and it's great for that. But I think your focus on the cool thing you built may be blinding you a bit to alternative approaches...
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, 1 Sep 2022 17:54:38 -0400 Kent Overstreet wrote: > > So this looks like it's gotten better since I last looked, but it's still not > there yet. > > Part of the problem is that the tracepoints themselves are in the wrong place: > your end event is when a task is woken up, but that means spurious wakeups > will The end event is when a task is scheduled onto the CPU. The start event is the first time it is woken up. > cause one wait_event() call to be reported as multiple smaller waits, not one > long wait - oops, now I can't actually find the thing that's causing my > multi-second delay. > > Also, in your example you don't have it broken out by callsite. That would be > the first thing I'd need for any real world debugging. OK, how about this (currently we can only have 3 keys, but you can create multiple histograms on the same event). # echo 'hist:keys=comm,stacktrace,delta.buckets=10:sort=delta' > /sys/kernel/tracing/events/synthetic/wakeup_lat/trigger (notice the "stacktrace" in the keys) # cat /sys/kernel/tracing/events/synthetic/wakeup_lat/hist # event histogram # # trigger info: hist:keys=comm,stacktrace,delta.buckets=10:vals=hitcount:sort=delta.buckets=10:size=2048 [active] # { comm: migration/2 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/5 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/1 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/7 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/0 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 start_kernel+0x595/0x5be secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/4 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: rtkit-daemon , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 preempt_schedule_common+0x2d/0x70 preempt_schedule_thunk+0x16/0x18 _raw_spin_unlock_irq+0x2e/0x40 eventfd_write+0xc8/0x290 vfs_write+0xc0/0x2a0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb , delta: ~
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, Sep 01, 2022 at 05:38:44PM -0400, Steven Rostedt wrote: > On Tue, 30 Aug 2022 14:49:16 -0700 > Suren Baghdasaryan wrote: > > > From: Kent Overstreet > > > > This adds the ability to easily instrument code for measuring latency. > > To use, add the following to calls to your code, at the start and end of > > the event you wish to measure: > > > > code_tag_time_stats_start(start_time); > > code_tag_time_stats_finish(start_time); > > So you need to modify the code to see what you want? Figuring out the _correct_ place to measure is often a significant amount of the total effort. Having done so once, why not annotate that in the source code? > For function length you could just do something like this: > > # cd /sys/kernel/tracing > # echo __skb_wait_for_more_packets > set_ftrace_filter > # echo 1 > function_profile_enabled > # cat trace_stat/function* > Function HitTimeAvg > s^2 > ------ > --- > __skb_wait_for_more_packets 10.000 us0.000 us > 0.000 us > Function HitTimeAvg > s^2 > ------ > --- > __skb_wait_for_more_packets 174.813 us 74.813 us > 0.000 us > Function HitTimeAvg > s^2 > ------ > --- > Function HitTimeAvg > s^2 > ------ > --- > > The above is for a 4 CPU machine. The s^2 is the square of the standard > deviation (makes not having to do divisions while it runs). > > But if you are looking for latency between two events (which can be kprobes > too, where you do not need to rebuild your kernel): > > From: https://man.archlinux.org/man/sqlhist.1.en > which comes in: https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/ > if not already installed on your distro. > > # sqlhist -e -n wakeup_lat 'select end.next_comm as > comm,start.pid,start.prio,(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as > delta from sched_waking as start join sched_switch as end on start.pid = > end.next_pid where start.prio < 100' > > The above creates a synthetic event called "wakeup_lat" that joins two > events (sched_waking and sched_switch) when the pid field of sched_waking > matches the next_pid field of sched_switch. When there is a match, it will > trigger the wakeup_lat event only if the prio of the sched_waking event is > less than 100 (which in the kernel means any real-time task). The > wakeup_lat event will record the next_comm (as comm field), the pid of > woken task and the time delta in microseconds between the two events. So this looks like it's gotten better since I last looked, but it's still not there yet. Part of the problem is that the tracepoints themselves are in the wrong place: your end event is when a task is woken up, but that means spurious wakeups will cause one wait_event() call to be reported as multiple smaller waits, not one long wait - oops, now I can't actually find the thing that's causing my multi-second delay. Also, in your example you don't have it broken out by callsite. That would be the first thing I'd need for any real world debugging. So, it looks like tracing has made some progress over the past 10 years, but for debugging latency issues it's still not there yet in general. I will definitely remember function latency tracing the next time I'm doing performance work, but I expect that to be far too heavy to enable on a live server. This thing is only a couple hundred lines of code though, so perhaps tracing shouldn't be the only tool in our toolbox :)
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, 1 Sep 2022 17:38:44 -0400 Steven Rostedt wrote: > # echo 'hist:keys=comm,prio,delta.buckets=10:sort=delta' > > /sys/kernel/tracing/events/synthetic/wakeup_lat/trigger The above could almost be done with sqlhist (but I haven't implemented "buckets=10" yet because that's a new feature. But for now, let's do log2): # sqlhist -e 'select comm,prio,cast(delta as log2) from wakeup_lat' ("-e" is to execute the command, as it normally only displays what commands need to be run to create the synthetic events and histograms) # cat /sys/kernel/tracing/events/synthetic/wakeup_lat/hist # event histogram # # trigger info: hist:keys=comm,prio,delta.log2:vals=hitcount:sort=hitcount:size=2048 [active] # { comm: migration/4 , prio: 0, delta: ~ 2^5 } hitcount: 1 { comm: migration/0 , prio: 0, delta: ~ 2^4 } hitcount: 2 { comm: rtkit-daemon , prio: 0, delta: ~ 2^7 } hitcount: 2 { comm: rtkit-daemon , prio: 0, delta: ~ 2^6 } hitcount: 4 { comm: migration/0 , prio: 0, delta: ~ 2^5 } hitcount: 8 { comm: migration/4 , prio: 0, delta: ~ 2^4 } hitcount: 9 { comm: migration/2 , prio: 0, delta: ~ 2^4 } hitcount: 10 { comm: migration/5 , prio: 0, delta: ~ 2^4 } hitcount: 10 { comm: migration/7 , prio: 0, delta: ~ 2^4 } hitcount: 10 { comm: migration/1 , prio: 0, delta: ~ 2^4 } hitcount: 10 { comm: migration/6 , prio: 0, delta: ~ 2^4 } hitcount: 10 Totals: Hits: 76 Entries: 11 Dropped: 0 -- Steve
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Tue, 30 Aug 2022 14:49:16 -0700 Suren Baghdasaryan wrote: > From: Kent Overstreet > > This adds the ability to easily instrument code for measuring latency. > To use, add the following to calls to your code, at the start and end of > the event you wish to measure: > > code_tag_time_stats_start(start_time); > code_tag_time_stats_finish(start_time); So you need to modify the code to see what you want? > > Stastistics will then show up in debugfs under > /sys/kernel/debug/time_stats, listed by file and line number. > > Stastics measured include weighted averages of frequency, duration, max > duration, as well as quantiles. > > This patch also instruments all calls to init_wait and finish_wait, > which includes all calls to wait_event. Example debugfs output: > > fs/xfs/xfs_trans_ail.c:746 module:xfs func:xfs_ail_push_all_sync > count: 17 > rate: 0/sec > frequency: 2 sec > avg duration: 10 us > max duration: 232 us > quantiles (ns): 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 > > lib/sbitmap.c:813 module:sbitmap func:sbitmap_finish_wait > count: 3 > rate: 0/sec > frequency: 4 sec > avg duration: 4 sec > max duration: 4 sec > quantiles (ns): 0 4288669120 4288669120 5360836048 5360836048 5360836048 > 5360836048 5360836048 5360836048 5360836048 5360836048 5360836048 5360836048 > 5360836048 5360836048 > > net/core/datagram.c:122 module:datagram func:__skb_wait_for_more_packets > count: 10 > rate: 1/sec > frequency: 859 ms > avg duration: 472 ms > max duration: 30 sec > quantiles (ns): 0 12279 12279 15669 15669 15669 15669 17217 17217 17217 17217 > 17217 17217 17217 17217 For function length you could just do something like this: # cd /sys/kernel/tracing # echo __skb_wait_for_more_packets > set_ftrace_filter # echo 1 > function_profile_enabled # cat trace_stat/function* Function HitTimeAvg s^2 ------ --- __skb_wait_for_more_packets 10.000 us0.000 us 0.000 us Function HitTimeAvg s^2 ------ --- __skb_wait_for_more_packets 174.813 us 74.813 us 0.000 us Function HitTimeAvg s^2 ------ --- Function HitTimeAvg s^2 ------ --- The above is for a 4 CPU machine. The s^2 is the square of the standard deviation (makes not having to do divisions while it runs). But if you are looking for latency between two events (which can be kprobes too, where you do not need to rebuild your kernel): From: https://man.archlinux.org/man/sqlhist.1.en which comes in: https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/ if not already installed on your distro. # sqlhist -e -n wakeup_lat 'select end.next_comm as comm,start.pid,start.prio,(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as delta from sched_waking as start join sched_switch as end on start.pid = end.next_pid where start.prio < 100' The above creates a synthetic event called "wakeup_lat" that joins two events (sched_waking and sched_switch) when the pid field of sched_waking matches the next_pid field of sched_switch. When there is a match, it will trigger the wakeup_lat event only if the prio of the sched_waking event is less than 100 (which in the kernel means any real-time task). The wakeup_lat event will record the next_comm (as comm field), the pid of woken task and the time delta in microseconds between the two events. # echo 'hist:keys=comm,prio,delta.buckets=10:sort=delta' > /sys/kernel/tracing/events/synthetic/wakeup_lat/trigger The above starts a histogram tracing the name of the woken task, the priority and the delta (but placing the delta in buckets of size 10, as we do not need to see every latency number). # chrt -f 20 sleep 1 Force something to be woken up that is interesting. # cat /sys/kernel/tracing/events/synthetic/wakeup_lat/hist # event histogram # # trigger info: hist:keys=comm,prio,delta.buckets=10:vals=hitcount:sort=delta.buckets=10:size=2048 [active] # { comm: migration/5 , prio: 0, delta: ~ 10-19 } hitcount: 1 { comm: migration/2 , prio: 0, delta: ~ 10-19 } hitcount: 1 { comm: sleep , prio: 79, delta: ~ 10-19 } hitcount: 1 { comm: migration/7 , prio: 0, delta: ~ 10-19 } hitcount:
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Thu, Sep 01, 2022 at 09:11:17AM +0200, Peter Zijlstra wrote: > On Tue, Aug 30, 2022 at 02:49:16PM -0700, Suren Baghdasaryan wrote: > > From: Kent Overstreet > > > > This adds the ability to easily instrument code for measuring latency. > > To use, add the following to calls to your code, at the start and end of > > the event you wish to measure: > > > > code_tag_time_stats_start(start_time); > > code_tag_time_stats_finish(start_time); > > > > Stastistics will then show up in debugfs under > > /sys/kernel/debug/time_stats, listed by file and line number. > > > > Stastics measured include weighted averages of frequency, duration, max > > duration, as well as quantiles. > > > > This patch also instruments all calls to init_wait and finish_wait, > > which includes all calls to wait_event. Example debugfs output: > > How can't you do this with a simple eBPF script on top of > trace_sched_stat_* and friends? I know about those tracepoints, and I've never found them to be usable. I've never succesfully used them for debugging latency issues, or known anyone who has. And an eBPF script to do everything this does wouldn't be simple at all. Honesly, the time stats stuff looks _far_ simpler to me than anything involving tracing - and with tracing you have to correlate the start and end events after the fact.
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Tue, Aug 30, 2022 at 02:49:16PM -0700, Suren Baghdasaryan wrote: > From: Kent Overstreet > > This adds the ability to easily instrument code for measuring latency. > To use, add the following to calls to your code, at the start and end of > the event you wish to measure: > > code_tag_time_stats_start(start_time); > code_tag_time_stats_finish(start_time); > > Stastistics will then show up in debugfs under > /sys/kernel/debug/time_stats, listed by file and line number. > > Stastics measured include weighted averages of frequency, duration, max > duration, as well as quantiles. > > This patch also instruments all calls to init_wait and finish_wait, > which includes all calls to wait_event. Example debugfs output: How can't you do this with a simple eBPF script on top of trace_sched_stat_* and friends?
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On Tue, Aug 30, 2022 at 6:53 PM Randy Dunlap wrote: > > > > On 8/30/22 14:49, Suren Baghdasaryan wrote: > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index b7d03afbc808..b0f86643b8f0 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1728,6 +1728,14 @@ config LATENCYTOP > > Enable this option if you want to use the LatencyTOP tool > > to find out which userspace is blocking on what kernel operations. > > > > +config CODETAG_TIME_STATS > > + bool "Code tagging based latency measuring" > > + depends on DEBUG_FS > > + select TIME_STATS > > + select CODE_TAGGING > > + help > > + Enabling this option makes latency statistics available in debugfs > > Missing period at the end of the sentence. Ack. > > -- > ~Randy
Re: [RFC PATCH 27/30] Code tagging based latency tracking
On 8/30/22 14:49, Suren Baghdasaryan wrote: > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index b7d03afbc808..b0f86643b8f0 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1728,6 +1728,14 @@ config LATENCYTOP > Enable this option if you want to use the LatencyTOP tool > to find out which userspace is blocking on what kernel operations. > > +config CODETAG_TIME_STATS > + bool "Code tagging based latency measuring" > + depends on DEBUG_FS > + select TIME_STATS > + select CODE_TAGGING > + help > + Enabling this option makes latency statistics available in debugfs Missing period at the end of the sentence. -- ~Randy