Knowing whether there is a C stack frame in the middle of the stack while a thread is blocked on a synchronized block will be an important feature for a profiler once Loom lands.
Rémi

----- Original Message -----
> From: "Bechberger, Johannes" <johannes.bechber...@sap.com>
> To: "hotspot-dev" <hotspot-...@openjdk.java.net>, hotspot-jfr-...@openjdk.java.net, "serviceability-dev" <serviceability-dev@openjdk.java.net>
> Sent: Friday, March 18, 2022 10:43:58 AM
> Subject: Proposal of a new version of AsyncGetCallTrace

> Hi,
>
> I would like to propose to
>
> 1. Replace duplicated stack walking code with a unified API
> 2. Create a new version of AsyncGetCallTrace, tentatively called
>    "AsyncGetCallTrace2", with more information on more frames using the
>    unified API
>
> A demo (as well as this text) is available at
> https://github.com/parttimenerd/asgct2-demo
> if you want to see a prototype of this proposal in action.
>
> Unify Stack Walking
> ================
>
> There are currently multiple implementations of stack walking in JFR and for
> AsyncGetCallTrace.
> They each implement their own extension of vframeStream but with comparable
> features and check for problematic frames.
>
> My proposal is, therefore, to replace the stack walking code with a unified
> API that includes all error checking and vframeStream extensions in a single
> place.
> The proposed new class is called StackWalker and could be part of
> `jfr/recorder/stacktrace` [1].
> This class also supports getting information on C frames so it can be
> potentially used for walking stacks in VMError (used to create hs_err files),
> further reducing the amount of different stack walking code.
>
> AsyncGetCallTrace2
> ================
>
> The AsyncGetCallTrace call has seen increasing use in recent years
> in profilers like async-profiler.
> But it is not really an API (not exported in any header) and
> the information on frames it returns is pretty limited
> (only the method and bci for Java frames) which makes implementing
> profilers and other tooling harder.
> Tools like async-profiler
> have to resort to complicated code to partially obtain the information
> that the JVM already has.
> Information that is currently hidden and impossible to obtain is
>
> - whether a compiled frame is inlined (currently only obtainable for the
>   topmost compiled frames)
>   - although this can be obtained using JFR
> - C frames that are not at the top of the stack
> - compilation level (C1 or C2 compiled)
>
> This information is helpful when profiling and tuning the VM for
> a given application and also for profiling code that uses
> JNI heavily.
>
> Using the proposed StackWalker class, implementing a new API
> that returns more information on frames is possible
> as a thin wrapper over the StackWalker API [2].
> This also improves the maintainability as the code used
> in this API is used in multiple places and is therefore
> also better tested than the previous implementation, see
> [1] for the implementation.
>
> The following describes the proposed API:
>
> ```cpp
> void AsyncGetCallTrace2(asgct2::CallTrace *trace, jint depth, void* ucontext);
> ```
>
> The structure of `CallTrace` is the same as the original
> `ASGCT_CallTrace` with the same error codes encoded in <= 0
> values of `num_frames`.
>
> ```cpp
> typedef struct {
>   JNIEnv *env_id;     // Env where trace was recorded
>   jint num_frames;    // number of frames in this trace
>   CallFrame *frames;  // frames
>   void* frame_info;   // more information on frames
> } CallTrace;
> ```
>
> The only difference is that the `frames` array also contains
> information on C frames and the field `frame_info`.
> The `frame_info` is currently null and can later be used
> for extended information on each frame, being an array with
> an element for each frame. But the type of the
> elements in this array is implementation specific.
> This is akin to the `compile_info` field in JVMTI's CompiledMethodLoad
> [3] and is used for extending the information returned by the
> API later.
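
To make the `num_frames` convention above concrete, here is a minimal sketch of how a consumer might tell a usable trace from an error result. The `jint` and `JNIEnv` stand-ins and the helper `has_frames` are my own assumptions for illustration (they only exist so the snippet compiles without `jni.h`); they are not part of the proposal.

```cpp
#include <cstdint>

// Stand-ins for the JNI types (assumption, so the sketch builds without jni.h).
using jint = int32_t;
struct JNIEnv;
struct CallFrame;  // frame layout is discussed separately in the proposal

// CallTrace as described in the proposal.
typedef struct {
  JNIEnv *env_id;     // Env where trace was recorded
  jint num_frames;    // number of frames, or an error code if <= 0
  CallFrame *frames;  // frames
  void *frame_info;   // currently null, reserved for extended information
} CallTrace;

// Hypothetical helper: a trace carries frames only if num_frames is positive.
// Zero and negative values are error codes, as in the original ASGCT_CallTrace.
inline bool has_frames(const CallTrace &trace) {
  return trace.num_frames > 0;
}
```

A profiler would presumably check this before touching `frames` at all, since an error result does not promise anything about the array contents.
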
>
> Prototype
> ------------
>
> Currently `CallFrame` is implemented in the prototype [4] as
>
> ```cpp
> typedef struct {
>   void *machine_pc;     // program counter, for C and native frames (frames of native methods)
>   uint8_t type;         // frame type (single byte)
>   uint8_t comp_level;   // highest compilation level of a method related to a Java frame
>   // information from original CallFrame
>   jint bci;             // bci for Java frames
>   jmethodID method_id;  // method ID for Java frames
> } CallFrame;
> ```
>
> The `FrameTypeId` is based on the frame type in JFRStackFrame:
>
> ```cpp
> enum FrameTypeId {
>   FRAME_INTERPRETED = 0,
>   FRAME_JIT         = 1, // JIT compiled
>   FRAME_INLINE      = 2, // inlined JITed methods
>   FRAME_NATIVE      = 3, // native wrapper to call C methods from Java
>   FRAME_CPP         = 4  // c/c++/... frames, stub frames have CompLevel_all
> };
> ```
>
> The `comp_level` states the compilation level of the method related to the
> frame with higher numbers representing "more" compilation. `0` is defined as
> interpreted. It is modeled after the `CompLevel` enum in
> `compiler/compilerDefinitions`:
>
> ```cpp
> // Enumeration to distinguish tiers of compilation
> enum CompLevel {
>   // ...
>   CompLevel_none              = 0, // Interpreter
>   CompLevel_simple            = 1, // C1
>   CompLevel_limited_profile   = 2, // C1, invocation & backedge counters
>   CompLevel_full_profile      = 3, // C1, invocation & backedge counters + mdo
>   CompLevel_full_optimization = 4  // C2 or JVMCI
> };
> ```
>
> The traces produced by this prototype are fairly large
> (each frame requires 24 instead of 16 bytes on 64-bit systems) and some
> data is duplicated.
> The reason for this is that it simplified the extension of async-profiler
> for the prototype, as it only extends the data structures of
> the original AsyncGetCallTrace API without changing the original fields.
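
The 24-versus-16-byte figure can be checked with a small layout sketch. The JNI stand-ins and the name `PrototypeCallFrame` are assumptions of mine; `ASGCT_CallFrame` is written from memory of the existing AsyncGetCallTrace frame (a `jint` followed by a `jmethodID`), so treat it as illustrative rather than authoritative.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the JNI types (assumption, so the sketch builds without jni.h).
using jint = int32_t;
using jmethodID = void *;  // an opaque pointer in jni.h

// Frame of the original AsyncGetCallTrace: bci (stored in lineno) and method.
typedef struct {
  jint lineno;
  jmethodID method_id;
} ASGCT_CallFrame;

// Prototype frame from the proposal, renamed here to avoid a name clash.
typedef struct {
  void *machine_pc;     // program counter, for C and native frames
  uint8_t type;         // frame type (single byte)
  uint8_t comp_level;   // highest compilation level
  jint bci;             // bci for Java frames (2 padding bytes precede it)
  jmethodID method_id;  // method ID for Java frames
} PrototypeCallFrame;

// On platforms with 8-byte pointers, padding gives exactly the quoted sizes.
static_assert(sizeof(void *) != 8 || sizeof(ASGCT_CallFrame) == 16,
              "original frame is 16 bytes on 64-bit systems");
static_assert(sizeof(void *) != 8 || sizeof(PrototypeCallFrame) == 24,
              "prototype frame is 24 bytes on 64-bit systems");
```

The extra 8 bytes are the `machine_pc` pointer; the two single-byte fields fit into space that would otherwise be padding in front of `bci`.
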
>
> Proposal
> ------------
>
> But packing the information and reducing duplication is of course possible
> if we step away from the former constraint:
>
> ```cpp
> enum FrameTypeId {
>   FRAME_JAVA         = 1, // JIT compiled and interpreted
>   FRAME_JAVA_INLINED = 2, // inlined JIT compiled
>   FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
>   FRAME_STUB         = 4, // VM generated stubs
>   FRAME_CPP          = 5  // C/C++/... frames
> };
>
> typedef struct {
>   uint8_t type;            // frame type
>   uint8_t comp_level;
>   uint16_t bci;            // 0 < bci < 65536
>   jmethodID method_id;
> } JavaFrame;               // used for FRAME_JAVA and FRAME_JAVA_INLINED
>
> typedef struct {
>   FrameTypeId type;        // single byte type
>   void *machine_pc;
> } NonJavaFrame;            // used for FRAME_NATIVE, FRAME_STUB and FRAME_CPP
>
> typedef union {
>   FrameTypeId type;        // to distinguish between JavaFrame and NonJavaFrame
>   JavaFrame java_frame;
>   NonJavaFrame non_java_frame;
> } CallFrame;
> ```
>
> This uses the same amount of space per frame (16 bytes) as the original but
> encodes far more information.
>
> Best regards
> Johannes
>
> [1] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/jfr/recorder/stacktrace/stackWalker.hpp
>
> [2] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.cpp
>
> [3] https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#CompiledMethodLoad
>
> [4] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.hpp
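
One detail of the packed `CallFrame` union in the Proposal section may be worth spelling out: for the "single byte type" comments to hold, the enum needs a one-byte underlying type, since a plain C++ enum is usually as wide as `int`. The following is only a sketch under that assumption. The `: uint8_t` on the enum, the use of `FrameTypeId` in `JavaFrame`, the `jmethodID` stand-in, and the helpers `frame_type` and `is_java_frame` are my additions and do not come from the proposal or the prototype.

```cpp
#include <cstdint>
#include <cstring>

using jmethodID = void *;  // stand-in for the JNI type (assumption)

// Assumption: a fixed one-byte underlying type, so the tag really is one byte.
enum FrameTypeId : uint8_t {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};

struct JavaFrame {
  FrameTypeId type;
  uint8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
};

struct NonJavaFrame {
  FrameTypeId type;
  void *machine_pc;
};

union CallFrame {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
};

static_assert(sizeof(FrameTypeId) == 1, "tag occupies a single byte");
static_assert(sizeof(void *) != 8 || sizeof(CallFrame) == 16,
              "packed frame is 16 bytes on 64-bit systems");

// Hypothetical helper: read the tag from the first byte of the frame.
// memcpy sidesteps reading a union member other than the one last written.
inline FrameTypeId frame_type(const CallFrame &frame) {
  uint8_t raw;
  std::memcpy(&raw, &frame, sizeof raw);
  return static_cast<FrameTypeId>(raw);
}

// Hypothetical helper: Java frames are the two FRAME_JAVA* variants.
inline bool is_java_frame(const CallFrame &frame) {
  FrameTypeId t = frame_type(frame);
  return t == FRAME_JAVA || t == FRAME_JAVA_INLINED;
}
```

With a default-width enum the union is still 16 bytes on 64-bit systems, but the union-level `type` would then overlap `comp_level` and `bci` in a `JavaFrame`, so the tag could not be read reliably through it.
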