On Mon, 4 May 2026 19:20:45 GMT, Patricio Chilano Mateo <[email protected]> wrote:
>> David Holmes has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix typo > > src/hotspot/share/runtime/handshake.cpp line 718: > >> 716: >> 717: HandshakeOperation* op = get_op(); >> 718: assert(op != nullptr, "Must have an op"); > > There is a subtle race now where we can hit this assert. The sequence of > events go as follows: > 1 - `Thread1` adds handshake operation to `TargetThread's` queue. Sees > `TargetThread's` as handshake safe and proceeds to `claim_handshake()`. > 2 - Before `Thread1` calls `claim_handshake`, `TargetThread’s` processes the > operation and removes it from the queue. > 3 - `TargetThread’s` calls `GetPrimitiveArrayCritical`. Before incrementing > `_jni_deferred_suspension_count`, `Thread2` adds suspend handshake to > `TargetThread’s` queue. > 4 - `Thread1` calls `claim_handshake` and sees that there is a pending > operation that is enabled. > 5 - `TargetThread’s` increments `_jni_deferred_suspension_count` and > continues execution in critical native code. > 6 - `Thread1` calls `can_process_handshake()` and sees `TargetThread` in > native. Calls `get_op` but the only operation in the queue is not enabled. > > I created a test with a couple of artificial delays in the VM which triggers > this assert: > https://github.com/openjdk/jdk/compare/pr/30936...pchilano:jdk:8373839-jni-crit-suspend-stresstest > > One way of fixing this could be to not allow others threads besides the > requester/target to process the suspend handshake. That means > `HandshakeClosure::is_enabled` would need to also take `_requester`. > I’m also thinking if we could move `can_process_handshake` to > `claim_handshake` after grabbing the monitor and before checking the queue, > i.e. we first check if the thread is in a handshake safe state. @pchilano It seems to me that a simple fix is to convert the assert into a `return _no_operation` if `getOp()` returns null. In your scenario `thread1` will now return from `try_process` and see that the op it submitted has completed. Meanwhile `thread` continues to wait until the suspend op is processed, once the target leaves the critical region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/30936#discussion_r3199413902
