On Fri, 26 Mar 2021 13:21:43 GMT, Robbin Ehn <r...@openjdk.org> wrote:
>> A suspend request is done by handshaking thread target thread(s). When >> executing the handshake operation we know the target mutator thread is in a >> dormant state (as in safepoint safe state). We have a guarantee that it will >> check it's poll before leaving the dormant state. To stop the thread from >> leaving the the dormant state we install a second asynchronous handshake to >> be executed by the targeted thread. The asynchronous handshake will wait on >> a monitor while the thread is suspended. The target thread cannot not leave >> the dormant state without a resume request. >> >> Per thread suspend requests are naturally serialized by the per thread >> HandshakeState lock (we can only execute one handshake at a time per thread). >> Instead of having a separate lock we use this to our advantage and use >> HandshakeState lock for serializing access to the suspend flag and for >> wait/notify. >> >> Suspend: >> Requesting thread -> synchronous handshake -> target thread >> Inside synchronus handshake (HandshakeState lock is locked while >> executing any handshake): >> - Set suspended flag >> - Install asynchronous handshake >> >> Target thread -> tries to leave dormant state -> Executes handshakes >> Target only executes asynchronous handshake: >> - While suspended >> - Go to blocked >> - Wait on HandshakeState lock >> >> Resume: >> Resuming thread: >> - Lock HandshakeState lock >> - Clear suspended flag >> - Notify HandshakeState lock >> - Unlock HandshakeState lock >> >> The "suspend requested" flag is an optimization, without it a dormant thread >> could be suspended and resumed many times and each would add a new >> asynchronous handshake. Suspend requested flag means there is already an >> asynchronous suspend handshake in queue which can be re-used, only the >> suspend flag needs to be set. >> >> ---- >> Some code can be simplified or done in a smarter way but I refrained from >> doing such changes instead tried to keep existing code as is as far as >> possible. This concerns especially raw monitors. >> >> ---- >> Regarding the changed test, the documentation says: >> "If the calling thread is specified in the request_list array, this function >> will not return until some other thread resumes it." >> >> But the code: >> LOG("suspendTestedThreads: before JVMTI SuspendThreadList"); >> err = jvmti->SuspendThreadList(threads_count, threads, results); >> ... >> // Allow the Main thread to inspect the result of tested threads >> suspension >> agent_unlock(jni); >> >> The thread will never return from SuspendThreadList until resumed, so it >> cannot unlock with agent_unlock(). >> Thus main thread is stuck forever on: >> // Block until the suspender thread competes the tested threads suspension >> agent_lock(jni); >> >> And never checks and resumes the threads. So I removed that lock instead >> just sleep and check until all thread have the expected suspended state. >> >> ---- >> >> This version already contains updates after pre-review comments from >> @dcubed-ojdk, @pchilano, @coleenp. >> (Pre-review comments here: >> https://github.com/openjdk/jdk/pull/2625) >> >> ---- >> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and >> combinations like running with -XX:ZCollectionInterval=0.01 - >> XX:ZFragmentationLimit=0. >> Running above some of above concurrently (load ~240), slow debug, >> etc... > > Robbin Ehn has updated the pull request with a new target base due to a merge > or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into SuspendInHandshake > - 8257831: Suspend with handshake (review baseline) Hi Robbin, This is a great piece of work with a lot of code simplifications. Unfortunately some new complexities that need to be hidden by appropriate abstractions. Lots of comments, queries and suggestions below. Thanks, David src/hotspot/share/prims/jvmtiEnv.cpp line 946: > 944: // java_thread - pre-checked > 945: jvmtiError > 946: JvmtiEnv::SuspendThread(JavaThread* java_thread) { The comment above this still states: // Threads_lock NOT held, java_thread not protected by lock but the java_thread is protected by a TLH so we should document that so we know it is always safe to refer to java_thread below. src/hotspot/share/prims/jvmtiEnv.cpp line 1009: > 1007: if (self_index >= 0) { > 1008: if (!JvmtiSuspendControl::suspend(current)) { > 1009: results[self_index] = JVMTI_ERROR_THREAD_NOT_ALIVE; Surely impossible when dealing with the current thread! src/hotspot/share/prims/jvmtiImpl.cpp line 767: > 765: // > 766: > 767: bool JvmtiSuspendControl::suspend(JavaThread *java_thread) { Future RFE: JvmtiSuspendControl is no longer needed src/hotspot/share/prims/jvmtiRawMonitor.cpp line 319: > 317: jt = self->as_Java_thread(); > 318: while (true) { > 319: // To pause suspend requests while in native we must block > handshakes. The earlier comment says we must be _thread_blocked in this code, so we won't be "native". But that is not a _thread_blocked that is managed by a thread-state transition so that is why this code has to "manually" manage the handshake state. src/hotspot/share/prims/jvmtiRawMonitor.cpp line 428: > 426: jt->set_thread_state_fence(_thread_in_native_trans); > 427: SafepointMechanism::process_if_requested(jt); > 428: if (jt->is_interrupted(true)) { A thread must be _thread_in_vm to safely query is_interrupted() as it accesses the threadObj. src/hotspot/share/runtime/handshake.cpp line 415: > 413: // Adds are done lock free and so is arming. > 414: // Calling this method with lock held is considered an error. > 415: assert(!_lock.owned_by_self(), "Lock should not be held"); If adds are still lock-free then why was this assertion removed? src/hotspot/share/runtime/handshake.cpp line 463: > 461: ThreadInVMForHandshake tivm(_handshakee); > 462: { > 463: ttyLocker::break_tty_lock_for_safepoint(os::current_thread_id()); Why is this needed when it is inside ThreadInVMForHandshake constructor ?? src/hotspot/share/runtime/handshake.cpp line 633: > 631: bool HandshakeState::suspend_request_pending() { > 632: assert(JavaThread::current()->thread_state() != _thread_blocked, > "should not be in a blocked state"); > 633: assert(JavaThread::current()->thread_state() != _thread_in_native, > "should not be in native"); Not clear why we care about the state of the current thread here ?? src/hotspot/share/runtime/handshake.cpp line 677: > 675: } else { > 676: // Target is going to wake up and leave suspension. > 677: // Let's just stop the thread from doing that. IIUC this would be the case where the target was hit with a suspend request but has not yet processed the actual suspension handshake op, then a resume comes in so suspended is no longer true, and then another suspend request is made (this one) which simply turns the suspended flag back on - is that right? Overall I'm finding it very hard to see what the two suspend state flags actually signify. I'd like to see that written up somewhere. src/hotspot/share/runtime/handshake.hpp line 139: > 137: Thread* active_handshaker() const { return _active_handshaker; } > 138: > 139: // Suspend/resume support This would be a good place to describe the operation of these two state fields src/hotspot/share/runtime/handshake.hpp line 148: > 146: // Called from the async handshake (the trap) > 147: // to stop a thread from continuing executing when suspended. > 148: void suspend_in_handshake(); I'm finding the terminology very confusing here: handshake_suspend versus suspend_in_handshake ? It isn't at all clear which method is used in which part of the protocol. The same goes for the different closures. src/hotspot/share/runtime/nonJavaThread.cpp line 326: > 324: > 325: while (watcher_thread() != NULL) { > 326: // This wait should make safepoint checks, wait without a timeout. Nit: s/checks, wait/checks and wait/ src/hotspot/share/runtime/objectMonitor.cpp line 411: > 409: OrderAccess::storestore(); > 410: for (;;) { > 411: current->set_thread_state(_thread_blocked); So IIUC because ThreadblockInVM is now responsible for handling thread suspension checks, but those checks need to be done at a lower level, we can no longer use ThreadBlockInVM in some places. That both undermines the value of ThreadBlockInVM and make this manual management code much more complex. I agree with Richard that a TBIVM that took an unlock function would make this kind of code much clearer. src/hotspot/share/runtime/sweeper.cpp line 276: > 274: > 275: ThreadBlockInVM tbivm(thread); > 276: thread->java_suspend_self(); AFAICS the only change needed in this function was to delete the java_suspend_self as it is not handled in the TBIVM. src/hotspot/share/runtime/sweeper.cpp line 276: > 274: } > 275: MutexUnlocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); > 276: ThreadBlockInVM tbiv(JavaThread::current()); You already have the current thread. src/hotspot/share/runtime/thread.cpp line 460: > 458: delete metadata_handles(); > 459: > 460: // SR_handler uses this as a termination indicator - As noted previously we need a replacement for this as a proxy for a Thread (not JavaThread) terminating. src/hotspot/share/runtime/thread.cpp line 667: > 665: > 666: // We wait for the thread to transition to a more usable state. > 667: for (int i = 1; i <= SuspendRetryCount; i++) { You will need to obsolete SuspendRetryCount and SuspendRetryDelay. This will need a CSR request indicating why there is no deprecation step. src/hotspot/share/runtime/thread.cpp line 1442: > 1440: > 1441: // The careful dance between thread suspension and exit is handled > here: > 1442: _handshake.thread_exit(); I don't like the fact this hides the set_terminating call. In fact I think I'd rather keep this in-line rather than tucking it away inside the handshake code. This looks too much like we're informing the handshake that the thread is exiting rather then coordinating thread termination with a late suspension request. src/hotspot/share/runtime/thread.cpp line 2099: > 2097: do { > 2098: set_thread_state(_thread_blocked); > 2099: // Check if _external_suspend was set in the previous loop > iteration. Note that the comment for this method at line 1806 needs updating as it refers to a now deleted method. src/hotspot/share/runtime/thread.cpp line 2102: > 2100: if (is_external_suspend()) { > 2101: java_suspend_self(); > 2102: } It is not at all clear to me how this method should interact with thread suspension ?? src/hotspot/share/runtime/thread.hpp line 1146: > 1144: // if external|deopt suspend is present. > 1145: bool is_suspend_after_native() const { > 1146: return (_suspend_flags & (_obj_deopt JFR_ONLY(| _trace_flag))) != 0; Comment at line 1144 needs updating. src/hotspot/share/runtime/thread.inline.hpp line 194: > 192: > 193: inline void JavaThread::set_exiting() { > 194: // use release-store so the setting of _terminated is seen more quickly Pre-existing: A release doesn't push changes to main memory more quickly it only establishes ordering with previous loads and stores. Why do we need release semantics here - what state are with ordering with? What load-acquire are we pairing with? src/hotspot/share/runtime/thread.inline.hpp line 207: > 205: } > 206: > 207: inline void JavaThread::set_terminated(TerminatedTypes t) { I prefer set_terminated(arg) over the new set of methods. src/hotspot/share/runtime/vmStructs.cpp line 764: > 762: > \ > 763: /*********************************/ > \ > 764: /* JavaThread (NOTE: incomplete) */ > \ ??? src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Thread.java line 79: > 77: } > 78: > 79: public boolean isExternalSuspend() { I assume a new SA API for thread suspension is needed here. test/hotspot/jtreg/serviceability/jvmti/SuspendWithCurrentThread/SuspendWithCurrentThread.java line 132: > 130: ThreadToSuspend.setAllThreadsReady(); > 131: > 132: while (!checkSuspendedStatus()) { The changes to this test are still quite unclear to me. Too hard to figure out what the test was originally trying to do! ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3191