On Wed, 1 May 2024 21:32:46 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:

> This PR adds ranked monitor support to the debug agent. The debug agent has a 
> large number of monitors, and it's really hard to know which order to grab 
> them in, and for that matter which monitors might already be held at any 
> given moment. By imposing a rank on each monitor, we can check to make sure 
> they are always grabbed in the order of their rank. Having this in place when 
> I was working on [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) 
> would have made it much easier to detect a deadlock that was occuring, and 
> the reason for it. That's what motivated me to do this work
> 
> There were 2 or 3 minor rank issues discovered as a result of these changes. 
> I also learned a lot about some of the more ugly details of the locking 
> support in the process.
> 
> Tested with the following on all supported platforms and with virtual threads:
> 
> com/sun/jdi
> vmTestbase/nsk/jdi
> vmTestbase/nsk/jdb
> vmTestbase/nsk/jdwp 
> 
> Still need to run tier2 and tier5.
> 
> Details of the changes follow in the first comment.

I was able to minimize changes for users of RawMonitor by keeping the same 
client wrapper APIs (eg. debugMonitorEnter), and for the most part passing in 
the same argument (the monitor to enter). The only change is that the client is 
passing a new type called DebugRawMonitor* instead of a jrawMonitorID. The 
DebugRawMonitor encapsulates the jrawMonitorID and contains other monitor info 
that needs to be tracked (like the rank and current owner).

Ranks were determined largely by trial and error by running tests and fixing 
the rank violations asserts as I hit them. In the enum that contains all the 
rankings, I explain the rational for each monitor's rank.

For the moment I have included a flag (rankedmonitors) to control whether or 
not ranked monitors should be used. This is just a safety net in case someone 
runs into an issue. The flag is not documented and will likely be removed at 
some point.

The ranked monitor supported ended up needing a RawMonitor itself 
(dbgRawMonitor_lock). This RawMonitor is therefore not part of the ranked 
monitor support. It could probably use a better name. Suggestions are welcomed.

There are a few issues I fixed while working on the ranked monitor support:

During debug agent initialization we need to call util_initialize() before 
commonRef_initialize() because the later creates a monitor, and that depends on 
util_initialize() having already been called.

getLocks() in threadControl.c wasn't grabbing locks in the right order. This 
could have lead to a deadlock, but seems the structure of how locks are used 
prevented one from happening.

In invoker_completeInvokeRequest() I had to add grabbing the stepLock in order 
to maintain proper lock order when getLocks() was eventually called (which also 
grabbed stepLock). Other than triggering a rank violation assert, this doesn't 
seem to have caused any real issue, but did leave the potential for a deadlock.

In handleInterrupt() I noticed that a local JNI ref was not being freed. I 
noticed because there were other places where I had added calls to 
threadControl_currentThread() that resulted in a noticeable JNI ref leak. When 
I fixed those, I fixed the one in handleInterrupt() too even tough it was not 
causing any problems.

The ranked monitor APIs needed some extra safeguards during VM exit that are 
not needed for the unranked monitor APIs. This is because the unranked APIs 
didn't have a problem with the current thread being NULL, but the ranked APIs 
require a valid current thread. That is why you see the following check:


    if (gdata->vmDead && current_thread == NULL) {
        return;
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19044#issuecomment-2089176402

Reply via email to