Hi Gary,

I'm curious as to what event is triggering this. When the JVMTI event comes in, you eventually end up in this code in enqueueCommand():

    if (wait) {
        debugMonitorEnter(commandCompleteLock);
        while (!command->done) {
            log_debugee_location("enqueueCommand(): HelperCommand wait", NULL, NULL, 0);
            debugMonitorWait(commandCompleteLock);
        }
        freeCommand(command);
        debugMonitorExit(commandCompleteLock);
    }

"wait" is normally true for things like breakpoint and singlestep events that will suspend the thread that the event came in on (SUSPEND_ALL or SUSPEND_THREAD). So this will cause the thread that the JVMTI event was delivered on to block, preventing it from exiting. If the policy is SUSPEND_NONE, then it will not block and could exit.

In another thread you have commandloop() which dequeues the command, "handles" it (more on this below), and then if the command has the SUSPEND_ALL policy, will block until the debugger resumes all threads. Before doing this it will have already suspended threads according to the policy.

The code you are executing in when the problem happens is in the command handling part.

  commandLoop() ->
    handleCommand() ->
      handleReportEventCompositeCommand() ->
        suspendWithInvokeEnabled()

static void
suspendWithInvokeEnabled(jbyte policy, jthread thread)
{
    invoker_enableInvokeRequests(thread);

    if (policy == JDWP_SUSPEND_POLICY(ALL)) {
        (void)threadControl_suspendAll();
    } else {
        (void)threadControl_suspendThread(thread, JNI_FALSE);
    }
}

So one question is what is the suspend policy when this happens? If the thread has managed to exit, it must not be SUSPEND_ALL or SUSPEND_THREAD. Otherwise the thread would be blocked in enqueueCommand() as described above. However, if we are in suspendWithInvokeEnabled() we know it must be SUSPEND_ALL or SUSPEND_THREAD because if you look in handleReportEventCompositeCommand() you'll see it does not call suspendWithInvokeEnabled() if the policy is SUSPEND_NONE.

This all seems to indicate that we should never be at the point of calling suspendWithInvokeEnabled() when the thread passed to it has already exited. It should be blocked in enqueueCommand(). So we're missing something here in the understanding of the problem. There might be a bug elsewhere, and your proposed fix is just masking it.

Chris

On 5/24/19 2:01 PM, Chris Plummer wrote:
Hi Gary,

On 5/24/19 5:17 AM, Gary Adams wrote:
I have not tracked down the specific root cause of this failure, yet.

It appears that the suspend is being attempted before the thread has been
recorded in threadControl.
I don't think this is possible. When the JVMTI event is received by the debug agent, that's when threadControl adds the thread. We are well beyond that point. The debug agent has processed the JVMTI event and created and queued a ReportEventCompositeCommand (recc) to pass the event on the the debugger (the queue is processed asynchronously by another thread). You are in the process of processing this recc, which involves suspending all threads before sending the event to the debugger. If the thread in the recc is invalid (no longer known to threadControl), then the only way I can see that happening is if the thread has terminated. ASFAIK, there is nothing preventing this from happening.

Adding diagnostic messages is tricky because it changes the timing.
Here's a minimal probe to track threadControl addNode and clearThread
transactions. See attached log.txt.

diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
@@ -491,7 +491,9 @@
 static void
 suspendWithInvokeEnabled(jbyte policy, jthread thread)
 {
+  /*    if (threadControl_getInvokeRequest(thread) != NULL) { */
     invoker_enableInvokeRequests(thread);
+  /*    } */
 
     if (policy == JDWP_SUSPEND_POLICY(ALL)) {
         (void)threadControl_suspendAll();
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c b/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c
@@ -285,6 +285,7 @@
 static void
 addNode(ThreadList *list, ThreadNode *node)
 {
+  printf ("addNode %p \n", node->thread);
     node->next = NULL;
     node->prev = NULL;
     node->list = NULL;
@@ -362,6 +363,7 @@
 static void
 clearThread(JNIEnv *env, ThreadNode *node)
 {
+  printf("clearThread %p\n", node->thread);
     if (node->pendingStop != NULL) {
         tossGlobalRef(env, &(node->pendingStop));
     }
@@ -1646,6 +1648,8 @@
     node = findThread(&runningThreads, thread);
     if (node != NULL) {
          request = &node->currentInvoke;
+    } else {
+      printf ("threadControl_getInvokeRequest %p\n", thread);
     }
 
     debugMonitorExit(threadLock);


The AGENT_ERROR_INVALID_THREAD is reported when invoker_enableInvokeRequest
does not find the thread. I added the print out in threadControl_getInvokeRequest
when the thread is not found.
You are printing the oop* instead of the oop. That's fine for the node->thread references in addNode and clearNode, since they should match up, but the "thread" reference in threadControl_getInvokeRequest() is probably a localref, so will not match up with anything printed by addNode or clearNode. You probably need to print the oop and hope there is no intervening gc.


The workaround bypasses the error and falls through to the threadControl CommonSuspend
path where runningThreads is complimented with an otherThreads mechanism to ensure
threads that are not between start and end events will be properly resumed later on.
This fix is probably fine, but I need to think about it some more. Would be best to first confirm what's going on though.

thanks,

Chris


On 5/23/19, 1:23 PM, Chris Plummer wrote:
Hi Gary,

So a JVMTI event came in on a valid thread, got processed by the Debug Agent and enqueued to be sent to the Debugger. However, before it was actually sent, the thread became invalid. Am I understanding this issue correctly?

thanks,

Chris

On 5/23/19 2:59 AM, [email protected] wrote:
This proposed workaround ensures that the delay between a suspend request
passing through the jdwp command queue will not fail due to a no longer
running thread.

  Webrev: http://cr.openjdk.java.net/~gadams/8218701/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8218701






Reply via email to