Hi Brendan,

I still don't understand what actually takes the stack traces (let alone resolves the symbols) in your examples. Is this done by 'perf' itself, based only on the frame pointer?
As I wrote before, this is pretty hard to get right for a JVM, but there are good approximations. Have you looked at the 'jstack' tool which is part of the JDK? If you run it on a Java process, it will give you exact stack traces with full inlining information. However, this only works at safepoints, so it is probably not suitable for profiling with performance counters. But you can also use 'jstack -F -m', which gives you a 'best effort' mixed Java/C++ stack trace (most of the time even with inlined Java frames). This is probably the best you can get when interrupting a running JVM at an arbitrary point in time. As you mentioned in one of your blogs, the VM can be in the C library or even in the kernel at that time, and neither of those preserves the frame pointer. So it will already be hard to even walk up to the first Java frame.

But nevertheless, if the output of 'jstack -F -m' is "good enough" for your purpose, you can implement something similar in 'perf' or in a helper library of 'perf' and be happy (I don't actually know how perf takes stack traces, but I suppose there may be some kind of callback mechanism for walking unknown frames). This is actually not so hard. I've recently implemented a "print_native_stack()" function within hotspot itself (you can call it, for example, from gdb during debugging - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4). Maybe you could call this function directly from 'perf' if perf attaches to the process with ptrace (I assume it does, or how else could it walk the stack)?

These were just some random thoughts, with the hope that they may be helpful.

Regards,
Volker

PS: by the way - the flame graphs look really impressive and it would be really nice to have something like this for Java.

On Thu, Dec 4, 2014 at 11:55 PM, Brendan Gregg <brendan.d.gr...@gmail.com> wrote:
> G'Day,
>
> I've hacked hotspot to return the frame pointer, in part to see what this
> involves, and also to have a working prototype for analysis.
> Along with an agent to resolve symbols, this has allowed full stack profiling
> using Linux perf_events. The following flame graphs show the resulting
> profiles.
>
> A mixed mode CPU flame graph of a vert.x benchmark (click to zoom):
>
> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
>
> Same thing, but this time disabling inlining, to show more frames:
>
> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-flamegraph.svg
>
> As expected, performance is worse without inlining. You can compare the
> flame graphs side by side to see why. Less time spent doing work / I/O!
>
> https://github.com/brendangregg/Misc/blob/master/java/openjdk8_b132-fp.diff
> is my patch, and currently only works for x86-64. It removes RBP from the
> register pools, and inserts "mov(rbp, rsp)" into two function prologues. It
> is also unsupported: use at your own risk. I'm not a veteran hotspot
> engineer, so chances I messed something up are high.
>
> I'd love to be able to enable frame pointers in Oracle JDK, eg, with an
> -XX:+NoOmitFramePointer option. It could be put under
> -XX:+UnlockDiagnosticVMOptions or -XX:+UnlockExperimentalVMOptions. So long
> as we had some way to turn it on. If someone wants to include (improve,
> rewrite) my patch, please do.
>
> I don't have much perf data yet, but on the vert.x microbenchmark it looked
> like returning the frame pointer cost 2.6% performance. I hope that's
> somewhat worst-case for production workloads. (I was also able to recover
> the 2.6% by fine tuning other options, so were this a production change, I'd
> be hoping not to regress performance at all.)
>
> We've discussed this before
> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-October/thread.html#15939).
> The Solaris-assisted approach that Serguei Spitsyn described (JDK-6617153)
> should work very well. The JVM can run as-is, full stacks can be generated
> on-demand, and symbols should always be correct.
>
> The frame pointer approach costs a little performance, and only shows
> partial stacks after inlining (unless you disable inlining, but that can
> cost >40% performance). There is the other issue Volker Simonis mentioned as
> well, where some stacks may not be profiled correctly. And, if you are
> unlucky, symbols can move during the profile, so any static perf-map-agent
> map will translate some incorrectly (I've considered developing a way to
> detect this, and highlight such frames as dubious.)
>
> At Netflix we are mostly Java on Linux. Switching to Oracle Solaris for this
> feature is going to be a tough sell, especially when the value of full stack
> profiling isn't widely understood. I personally think it might be a bit
> easier if a -XX:+NoOmitFramePointer option existed, so Linux users can try
> the feature, then consider the better Solaris version after gaining solid
> experience on why it is so important.
>
> We recently blogged about the value of stack profiling and flame graphs,
> http://techblog.netflix.com/2014/11/nodejs-in-flames.html, although this was
> for Node.js, which already has frame pointer support.
>
> If anyone wants to try generating these mixed mode CPU flame graphs
> themselves (in a test environment!), the first step is to compile OpenJDK 8
> b132 with the previous patch, and get that running. Also install the
> packages for the "perf" command. The remaining steps would be something
> like:
>
> # git clone --depth=1 https://github.com/brendangregg/FlameGraph
> # git clone --depth=1 https://github.com/jrudolph/perf-map-agent
> # cd perf-map-agent
> # export JAVA_HOME=/...
> # cmake .
> # make
> # perf record -F 99 -p `pgrep -n java` -g -- sleep 30
> # java -cp attach-main.jar:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce `pgrep -n java`
> # perf script > ../FlameGraph/out.stacks
> # cd ../FlameGraph
> # ./stackcollapse-perf.pl < out.stacks | ./flamegraph.pl --color=java > out.svg
>
> Finally, if you are new to CPU flame graphs, see
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html .
>
> Brendan
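
For readers unfamiliar with the last step of the quoted commands: the stack-collapse stage turns each sampled stack into a single "folded" line (frames joined by semicolons, root first, followed by a sample count), which is what flamegraph.pl consumes. The following is only a rough Python sketch of that folding idea, not the actual stackcollapse-perf.pl logic. The function name fold_stacks and the sample frame names are made up for illustration, and the input is assumed to be already parsed into one list of frame names per sample, leaf frame first.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks into folded lines of the form 'root;...;leaf count'.

    `samples` is an iterable of stacks; each stack is a list of frame names
    with the leaf (currently executing) frame first. This input shape is an
    assumption made for this sketch.
    """
    counts = Counter()
    for frames in samples:
        # Reverse so the root frame comes first, then join with ';'.
        counts[";".join(reversed(frames))] += 1
    # Sort for stable output; each line is "<folded stack> <sample count>".
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples, leaf frame first.
    samples = [
        ["Handler.handle", "Interpreter", "JavaMain", "start_thread"],
        ["Handler.handle", "Interpreter", "JavaMain", "start_thread"],
        ["Interpreter", "JavaMain", "start_thread"],
    ]
    for line in fold_stacks(samples):
        print(line)
```

Identical stacks merge into one line with a higher count, which is why a frame that appears in many samples ends up wide in the flame graph.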