On Wed, 8 Apr 2026 05:04:03 GMT, Serguei Spitsyn <[email protected]> wrote:
>> This change fixes a long standing performance issue related to the debugger >> single stepping that is using JVMTI `FramePop` events as a part of step over >> handling. The performance issue is that the target thread continues its >> execution in very slow `interp-only` mode in a context of frame marked for >> `FramePop` notification with the JVMTI `NotifyFramePop`. It includes other >> method calls recursively upon a return from the frame. >> >> This fix is to avoid enforcing the `interp-only` execution mode for threads >> when `FramePop` events are enabled with the JVMTI >> `SetEventNotificationMode()`. Instead, the target frame has been deoptimized >> and kept interpreted by disabling `OSR` optimization by the function >> `InterpreterRuntime::frequency_counter_overflow_inner()`. (Big thanks to >> @fisk for this suggestion!) Additionally, some tweaks are applied in several >> places where the `java_thread->is_interp_only_mode()` is checked. >> The other details will be provided in the first PR request comment. >> >> Testing: >> - test `serviceability/jvmti/vthread/ThreadStateTest` was updated to >> provide some extra test coverage >> - submitted mach5 tiers 1-6 >> >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Serguei Spitsyn has updated the pull request incrementally with one > additional commit since the last revision: > > review: (1) remove unneeded assert, (2) simplify one statement in > JvmtiEnvBase::set_frame_pop Hi Serguei, nice improvement! src/hotspot/cpu/x86/interp_masm_x86.cpp line 1628: > 1626: push(state); > 1627: call_VM(noreg, > 1628: CAST_FROM_FN_PTR(address, > InterpreterRuntime::post_method_exit)); I see `JvmtiExport::can_post_interpreter_events()` will be set for agents loaded at start-up. So even if we don’t enable notifications for method exit or frame pop events we will still make this VM call on method exit. Maybe we could keep the interpreter only mode check for method exit cases, and also add a new check for frame pop events? We could have something similar to `check_and_handle_earlyret`, where we check for the `JvmtiThreadState` first and then another field which will be set when there is at least one frame pop request. Given that there could be multiple environments, a simple solution could be to set a boolean in the `JvmtiThreadState` when we set a frame pop notification request, and then clearing it in `post_method_exit` if there are no requests left. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1378: > 1376: if (state->is_virtual() && (thread == nullptr || > !thread->is_vthread_mounted())) { // unmounted virtual thread > 1377: // set pending frame deoptimization to process at mount transition > 1378: state->vthread_pending_deopts()->append(frame_number); I’ve been trying to understand why we need to save the deopt request and process it later for the unmounted case, instead of just deoptimizing the frame in the heap as we do below. So I tested that small change and hit this [assertion](https://github.com/openjdk/jdk/blob/44313a453702d968d24c494de84c4a0a5892a18b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1907) while running test `ThreadStateTest.java`. Is that what you run into that led to this solution? This can happen in the mounted case below too though. The problem is that the deoptimized frame could be in a `stackChunk` that was freezed in the fast path, but the `stackChunk` code doesn't expect deoptimized frames in that case. In this failing test case, we are later thawing the deoptimized frame along with all the other ones in the thaw fast path (full chunk thaw), and then without getting to return to the deoptimized frame we freeze again, hitting this assertion. One way of fixing this could be to add a new method to the `stackChunk` class to mark it as having mixed frames: inline void stackChunkOopDesc::force_slow_path() { #if INCLUDE_ZGC || INCLUDE_SHENANDOAHGC if (UseZGC || UseShenandoahGC) { relativize_derived_pointers_concurrently(); } #endif set_flag(FLAG_HAS_INTERPRETED_FRAMES, true); } And we could do: fr = jvf->stack_chunk()->derelativize(fr); fr.deoptimize(nullptr); jvf->stack_chunk()->force_slow_path(); ------------- PR Review: https://git.openjdk.org/jdk/pull/28407#pullrequestreview-4198812921 PR Review Comment: https://git.openjdk.org/jdk/pull/28407#discussion_r3162617769 PR Review Comment: https://git.openjdk.org/jdk/pull/28407#discussion_r3162551122
