Frank Ch. Eigler wrote: > Hi - > > In a few contexts, it comes up as to whether it is faster to probe > process syscalls with kprobes or with something higher level such as > utrace. (There are other hypothetical options too (per-syscall > tracepoints) that could be measured this way in the future.)
These scenarios are a bit wrong: > Now we compare these scenarios: > > # stap -e 'probe never {}' -t --vp 00001 -c a.out > > Here, no actual probing occurs so we get a measurement of the plain > uninstrumented run time of ten million close(2)s. The above one is fine. > # stap -e 'probe process.syscall {}' -t --vp 00001 -c a.out > > Here, we intercept sys_close with a kprobe. If the system is not too > busy, we should pick up only the close(2)s coming from a.out, though a > few close(2)'s executed by other processes may show up. > > # stap -e 'probe syscall.close {}' -t --vp 00001 -c a.out > > Here, we intercept all a.out's syscalls with utrace. Other processes > are not affected at all, but other syscalls by a.out would be -- > though in our test, there are hardly any of those. These 2 are swapped: the 'process.syscall' probe is a utrace-based probe and the 'syscall.close' probe is a kprobe-based probe. Note that in the results, the description and probe types matched correctly. > Some typical results on my 2.66GHz 2*Xeon5150 machine runnin Fedora 9 - > 2.6.27.12: > > never: > Pass 5: run completed in 740usr/3310sys/4155real ms. > > kprobe: > probe syscall.close (<input>:1:1), hits: 10000028, cycles: > 176min/202avg/3632max > Pass 5: run completed in 750usr/9320sys/10193real ms. > > utrace: > probe process.syscall (<input>:1:1), hits: 10000025, cycles: > 176min/209avg/184392max > Pass 5: run completed in 1670usr/6860sys/8645real ms. > > So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the > uninstrumented 4.1 second run time. But wait: we should subtract the > time taken by the probe handler itself: 200ish cycles at 2.66 GHz, > which is about 0.75 seconds. So the overheads are approximately: > > never: n/a > kprobe: 5.2 seconds => 0.52 us per hit > utrace: 3.6 seconds => 0.36 us per hit > > > Note that these are microbenchmarks that represent an ideal case > compared to a larger run, since they probably fit comfily inside > caches. They probably also undercount the probe handler's run time. -- David Smith dsm...@redhat.com Red Hat http://www.redhat.com 256.217.0141 (direct) 256.837.0057 (fax)