On Thu, 30 Apr 2026 02:35:35 GMT, David Holmes <[email protected]> wrote:
>> Please review this change to fix a problem that can rise if JVM TI >> suspension is allowed when a thread is executing in a JNI "critical" region. >> The gory details are in the first comment so that the PR emails are shorter >> >> A new test is introduced to check that we cannot suspend in a critical region >> >> Other testing: >> - Tiers 1-5 on all platforms >> >> The key insights into this solution are attributed to @pchilano. Everything >> simpler I tried was buggy and led me back to Patricio's suggested changes to >> the operation filtering. The actual details of this and any remaining bugs >> in it are all my own. >> >> Thanks. >> >> --------- >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > David Holmes has updated the pull request incrementally with one additional > commit since the last revision: > > Fix typo Changes look good. Only found one issue I described below. src/hotspot/share/runtime/handshake.cpp line 718: > 716: > 717: HandshakeOperation* op = get_op(); > 718: assert(op != nullptr, "Must have an op"); There is a subtle race now where we can hit this assert. The sequence of events go as follows: 1 - `Thread1` adds handshake operation to `TargetThread's` queue. Sees `TargetThread's` as handshake safe and proceeds to `claim_handshake()`. 2 - Before `Thread1` calls `claim_handshake`, `TargetThread’s` processes the operation and removes it from the queue. 3 - `TargetThread’s` calls `GetPrimitiveArrayCritical`. Before incrementing `_jni_deferred_suspension_count`, `Thread2` adds suspend handshake to `TargetThread’s` queue. 4 - `Thread1` calls `claim_handshake` and sees that there is a pending operation that is enabled. 5 - `TargetThread’s` increments `_jni_deferred_suspension_count` and continues execution in critical native code. 6 - `Thread1` calls `can_process_handshake()` and sees `TargetThread` in native. Calls `get_op` but the only operation in the queue is not enabled. I created a test with a couple of artificial delays in the VM which triggers this assert: https://github.com/openjdk/jdk/compare/pr/30936...pchilano:jdk:8373839-jni-crit-suspend-stresstest One way of fixing this could be to not allow others threads besides the requester/target to process the suspend handshake. That means `HandshakeClosure::is_enabled` would need to also take `_requester`. I’m also thinking if we could move `can_process_handshake` to `claim_handshake` after grabbing the monitor and before checking the queue, i.e. we first check if the thread is in a handshake safe state. ------------- PR Review: https://git.openjdk.org/jdk/pull/30936#pullrequestreview-4222819289 PR Review Comment: https://git.openjdk.org/jdk/pull/30936#discussion_r3183928461
