On Mon, 4 Oct 2021 14:07:18 GMT, Richard Reingruber <rr...@openjdk.org> wrote:
> This change fixes the deadlock described in the JBS-bug by: > > * Releasing `handlerLock` before waiting on `threadLock` in > `blockOnDebuggerSuspend()` > > * Notifying on `threadLock` after resuming all threads in > `threadControl_reset()` > > The PR has 3 commits: > > The first commit delays the cleanup actions after leaving the loop in > `debugLoop_run()` because of a Dispose command. The delay allows to reproduce > the deadlock > running the dispose003 test with the command > > > make run-test > TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003 > > > The second commit contains the fix described above. With it the deadlock > cannot be reproduced anymore. > > The third commit removes the diagnostic code introduced with the first commit > again. > > The fix passed our nightly regression testing: JCK and JTREG, also in Xcomp > mode with fastdebug and release builds on all platforms. src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 750: > 748: while (node && node->suspendCount > 0) { > 749: /* Resume requires the event handlerLock so we have to > release it */ > 750: eventHandler_unlock(); I'm not so sure this is always safe. It might be fine in the context of resetting the connection, but not during normal debug agent operations. It allows for another event to be processed when the lock is suppose to keep event processing serialized. What happens for example if we hit the `Thread.resume()` breakpoint again on a different thread? It might be best to limit doing this only when `threadControl_reset()` is currently executing (you could set a flag there), although it seems more like a band aid than a proper fix. I could imagine there might still be scenarios were releasing the lock during reset might be problematic, although probably extremely unlikely to ever be noticed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5805