On Thu, 2 May 2024 02:40:36 GMT, Alex Menkov <amen...@openjdk.org> wrote:

>> This PR adds ranked monitor support to the debug agent. The debug agent has 
>> a large number of monitors, and it's really hard to know which order to grab 
>> them in, and for that matter which monitors might already be held at any 
>> given moment. By imposing a rank on each monitor, we can check to make sure 
>> they are always grabbed in the order of their rank. Having this in place 
>> when I was working on 
>> [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) would have made 
>> it much easier to detect a deadlock that was occuring, and the reason for 
>> it. That's what motivated me to do this work
>> 
>> There were 2 or 3 minor rank issues discovered as a result of these changes. 
>> I also learned a lot about some of the more ugly details of the locking 
>> support in the process.
>> 
>> Tested with the following on all supported platforms and with virtual 
>> threads:
>> 
>> com/sun/jdi
>> vmTestbase/nsk/jdi
>> vmTestbase/nsk/jdb
>> vmTestbase/nsk/jdwp 
>> 
>> Still need to run tier2 and tier5.
>> 
>> Details of the changes follow in the first comment.
>
> src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 656:
> 
>> 654:     commonRef_lock();
>> 655:     if (gdata->rankedMonitors) {
>> 656:         dbgRawMonitor_lock();
> 
> What is this call for? I think dbgRawMonitor_lock/dbgRawMonitor_unlock should 
> not be used outside of util.c (I'd remove them from util.h)

After calling getLocks(), there may still be attempts to enter locks. The locks 
should already be held. I filed 
[JDK-8330193](https://bugs.openjdk.org/browse/JDK-8330193) to assert that. 
However, even when entering an already entered lock, we still need to grab 
dbgRawMonitor. I found that out the hard way when there were deadlocks on 
dbgRawMonitor because it was not entered it here.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1204:
> 
>> 1202:     // Need to lock during initialization so verifyMonitorRank() can 
>> be guaranteed that
>> 1203:     // if the monitor field is set, then the monitor if fully 
>> initialized. More specifically,
>> 1204:     // that the rank field has been set.
> 
> rank for all monitors can be set in `util_initialize()`:
> 
> for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
>   dbg_monitors[i]->rank = i;
> }

Ok.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1231:
> 
>> 1229:     }
>> 1230: 
>> 1231:     dbg_monitor->monitor = NULL;
> 
> I think it would be better to protect this with dbgRawMonitor

I don't see how that helps. Access to the field is not protected.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1243:
> 
>> 1241:     error = JVMTI_FUNC_PTR(gdata->jvmti,GetThreadInfo)
>> 1242:             (gdata->jvmti, thread, &info);
>> 1243:     return info.name;
> 
> Need to delete JNI reference info.thread_group and info.jthreadGroup (if not 
> NULL)

Did you mean info.context_class_loader, not info.jthreadGroup?

It looks like elsewhere when we call GetThreadInfo it is bracketed with the 
WITH_LOCAL_REFS macro to automatically push/pop a local refs frame, although it 
is only setting the size to 1, which seems like a bug. I'll do the same here.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1262:
> 
>> 1260:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
>> 1261:         DebugRawMonitor* dbg_monitor = &dbg_monitors[i];
>> 1262:         if (dbg_monitor == NULL) {
> 
> Should be `dbg_monitor->monitor`

ok

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1279:
> 
>> 1277:         return; // This assert is not reliable if the VM is exiting
>> 1278:     }
>> 1279:     jthread current_thread = threadControl_currentThread();
> 
> Looks like all callers of the `assertIsCurrentThread()` have reference to the 
> current thread. Maybe pass it as argument?

Ok. There were a few changes w.r.t. when this API is called, as was when the 
callers have access to the current thread, so I that's why I ended up not 
passing it in.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1281:
> 
>> 1279:     jthread current_thread = threadControl_currentThread();
>> 1280:     if (!isSameObject(env, thread, current_thread)) {
>> 1281:         tty_message("ERROR: Threads not the same: %p %p\n", thread, 
>> current_thread);
> 
> Not sure the addresses provide useful information. Maybe print thread names?

ok

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1289:
> 
>> 1287: 
>> 1288: static void
>> 1289: verifyMonitorRank(JNIEnv *env, DebugRawMonitorRank rank, jthread 
>> thread)
> 
> please add a comment that the function should be called under dbgRawMonitor 
> lock (also for other functions which assume this)

ok. I was also thinking of setting a static that contains the jthread of the 
thread that holds the lock, and we can assert it is held.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1294:
> 
>> 1292:     // has a higher rank than the monitor we are about to enter.
>> 1293:     DebugRawMonitorRank i;
>> 1294:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
> 
> No need to check monitors from the beginning.
> Should be enough to check starting from `min(rank, 
> FIRST_LEAF_DEBUG_RAW_MONITOR)`

I originally started at `rank`, but then when I added the 2nd of the two checks 
below I switched to 0, but your min suggestion should be optimal.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1470:
> 
>> 1468: 
>> 1469:     // Assert that the current thread owns this monitor.
>> 1470:     assertIsCurrentThread(getEnv(), dbg_monitor->ownerThread);
> 
> reading ownerThread (and other) fields without dbgRawMonitor is not MT-safe 
> (it's not atomic)

It is safe if you are the owner of the monitor.  So what that means is that 
this assert could potentially crash, but only if you are not the owner, which 
means if it didn't crash then it would have asserted. However, even when 
asserts are disabled, isSameObject() is still executed, and that's were the 
crash would happen. assertIsCurrentThread() really should be a no-op if asserts 
are not enabled. But even if that change is not made, it still would only crash 
if something bad was going on since the current thread should always own the 
monitor.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588360985
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588367723
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588371240
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588390369
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588391447
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588394746
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588397148
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588400113
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588407481
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588424198

Reply via email to