On 4/14/20 10:56 PM, David Holmes wrote:
Hi Chris,

On 15/04/2020 1:37 pm, Chris Plummer wrote:
Hello,

[Sorry this email got kind of long. To cut to the chase, I want to know if there are times where it is acceptable for SA to not be able to produce a stack trace for a thread. Details below if you are interested.]

How does the SA currently attempt to get a stacktrace for a thread? If the current mechanism has limitations, then perhaps that will be addressed now that we have per-thread handshakes? With handshakes the target thread will always be brought to a state where the stack is walkable. That said, I thought all existing mechanisms used a safepoint VM op to get a stacktrace for a different thread.
Hi David,

I don't know the details, but I think there is no safepointing involved. Consider that you can also get thread stack traces from a JVM core file, which could be produced at any given moment in the JVM's execution. Note that SA has lots of safeguards around this. It has ways of verifying whether something points to what it is really supposed to point to. For example, if SA thinks something is a pointer to a heap object, it verifies that the object header contains a pointer to something that is known to be a Klass, which it can determine by looking at the vtable (and then cross-referencing with symbolic info it pulls from the executable). I'm just not too sure to what extent this applies to thread stack walking. As I mentioned below, thread.getLastJavaVFrameDbg() sometimes returns null, thus no stack trace.

  /** This should only be used by a debugger. Uses the current frame
      guess to attempt to get the topmost JavaVFrame.
      (getLastJavaVFrame, as a port of the VM's routine, assumes the
      VM is at a safepoint.) */
  public JavaVFrame getLastJavaVFrameDbg() {

So it seems that since we are not at a safepoint, there's a lot of "guessing" in the implementation. I gather from this that the thread won't always be in a state where we can determine the last vframe, and therefore not always in a state where SA can produce a stack trace.
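Guesses like these are only useful if they can be checked. Below is a toy model of the Klass/vtable cross-check described earlier in this reply, assuming target memory can be read as a map from address to 64-bit word. KlassPointerCheck, KLASS_OFFSET, and all addresses are hypothetical. Real SA reads memory through its debugger layer and uses the real header layout, and this sketch ignores details such as compressed class pointers.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of SA-style pointer validation; not SA's actual API.
public class KlassPointerCheck {
    // Hypothetical offset of the klass word within an object header.
    static final long KLASS_OFFSET = 8;

    private final Map<Long, Long> memory;          // address -> word stored there
    private final Set<Long> knownKlassVtables;     // vtable addresses taken from symbols

    KlassPointerCheck(Map<Long, Long> memory, Set<Long> knownKlassVtables) {
        this.memory = memory;
        this.knownKlassVtables = knownKlassVtables;
    }

    /** Returns true if addr plausibly points to a heap object. */
    boolean looksLikeOop(long addr) {
        Long klass = memory.get(addr + KLASS_OFFSET);
        if (klass == null || klass == 0L) {
            return false; // unreadable or null klass word
        }
        // In this model the first word of a Klass is its C++ vtable pointer.
        Long vtable = memory.get(klass);
        return vtable != null && knownKlassVtables.contains(vtable);
    }

    public static void main(String[] args) {
        long vtbl = 0x7f00_0000_1000L; // hypothetical vtable symbol address
        Map<Long, Long> mem = Map.of(
            0x1000L + KLASS_OFFSET, 0x5000L, // object at 0x1000 -> klass at 0x5000
            0x5000L, vtbl,                   // klass at 0x5000 -> known vtable
            0x2000L + KLASS_OFFSET, 0x6000L  // object at 0x2000 -> unreadable "klass"
        );
        KlassPointerCheck check = new KlassPointerCheck(mem, Set.of(vtbl));
        System.out.println(check.looksLikeOop(0x1000L)); // true
        System.out.println(check.looksLikeOop(0x2000L)); // false
    }
}
```

A check like this can reject a bad pointer, but it cannot supply a missing top frame, which is why the stack-walking guess can still come back empty.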

thanks,

Chris

Cheers,
David
-----


We have a number of SA tests that request a thread dump, look for a specific symbol in the thread dump, and fail if the symbol is not found. Normally what they are looking for is LingeredApp.main() which should be in the stack trace of the main thread. ClhsdbJstack.java is one such test. It expects the main thread to look like:

"main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition [0x0000007fc1dff000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    JavaThread state: _thread_blocked
  - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417, Method*=0x00000008000e8898 (Interpreted frame)
  - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54, line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0 (Interpreted frame)

But sometimes all it gets is:

"main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE
    JavaThread state: _thread_in_java

This results in the test failing because it does not find LingeredApp.main in the output. The state for the passing case is always _thread_blocked and for the failing case _thread_in_java. This has been reported by the following CR:
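Because the failing dumps contain the thread header and state lines but no frames at all, a test could report "no frames" separately from "frames present but LingeredApp.main missing". The following is a sketch of such a check. MainThreadCheck and its Result values are hypothetical and are not part of the existing test library.

```java
import java.util.List;

public class MainThreadCheck {
    enum Result { FOUND, MISSING, NO_FRAMES, NO_MAIN_THREAD }

    // Hypothetical helper: classify the "main" thread block of jstack-style output.
    static Result check(String output) {
        List<String> lines = output.lines().toList();
        boolean inMain = false;
        boolean sawMain = false;
        boolean sawFrame = false;
        for (String line : lines) {
            if (line.startsWith("\"")) {
                // A thread header starts a new block; stop once we leave "main".
                if (inMain) {
                    break;
                }
                inMain = line.startsWith("\"main\"");
                sawMain |= inMain;
                continue;
            }
            String trimmed = line.trim();
            if (inMain && trimmed.startsWith("- ")) {
                sawFrame = true;
                if (trimmed.contains("LingeredApp.main")) {
                    return Result.FOUND;
                }
            }
        }
        if (!sawMain) {
            return Result.NO_MAIN_THREAD;
        }
        // Frames without LingeredApp.main is a different failure from no frames at all.
        return sawFrame ? Result.MISSING : Result.NO_FRAMES;
    }
}
```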

[1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java fails with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr

After starting, LingeredApp.main sits in a loop:

             while (Files.exists(path)) {
                 // Touch the lock to indicate our readiness
                 setLastModified(theLockFileName, epoch());
                 Thread.sleep(spinDelay);
             }

So it's basically waiting for the lock file to be deleted. By default spinDelay is 1 second. I suspected the issue I was seeing was due to asking for the thread dump when the thread was not blocked in sleep(), so I changed spinDelay to 1ms. That made this missing stack trace issue much easier to reproduce, along with several other filed bugs that normally reproduce only rarely:

[2] JDK-8231634 - SA stack walking fails with "illegal bci"
[3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with "java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1"
[4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[5] JDK-8204994 - SA might fail to attach to process with "Windbg Error: WaitForEvent failed"

I haven't looked into the "illegal bci" failure much, but it is likely an SA bug caused by SA having issues with (and probably making assumptions about) the state of the stack.

The two ArrayIndexOutOfBoundsException bugs are dups. They fail because the stack trace of the main thread is missing, and some String splitting logic in the test therefore fails and produces the ArrayIndexOutOfBoundsException.
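The exception message in those two bugs is exactly what indexing element 1 of a single-element split result produces. A minimal illustration follows. It is not the tests' actual parsing code, and secondToken is a hypothetical helper showing a length guard that turns the crash into a checkable null.

```java
public class SplitDemo {
    /** Returns the token after the first space, or null if there isn't one. */
    static String secondToken(String line) {
        String[] parts = line.split(" ");
        // Without this guard, parts[1] on a line with no space throws
        // ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1.
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(secondToken("pc=0x1234 Method*=0x5678")); // Method*=0x5678
        System.out.println(secondToken("RUNNABLE"));                 // null
        try {
            String unused = "RUNNABLE".split(" ")[1];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // Index 1 out of bounds for length 1
        }
    }
}
```

A guard like this only changes the symptom: the test would still need to fail, but with a message that says the stack trace was missing.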

I'm not sure about the "WaitForEvent failed". It could be unrelated.

I can probably make these all go away by having LingeredApp.main() spawn a helper thread to run the above loop. That would keep the main thread stable (blocked in Thread.join()). However, it would also hide some issues (like the "illegal bci" failure).
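The helper-thread idea could look roughly like the sketch below. It assumes the lock file is a java.nio.file.Path and uses Files.setLastModifiedTime in place of LingeredApp's own setLastModified/epoch helpers. LingeredHelperSketch and lingerOnHelper are hypothetical names, and this is not the actual LingeredApp change.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class LingeredHelperSketch {
    // Runs the lock-file loop on a helper thread so the calling (main) thread
    // stays parked in Thread.join() with a stable, walkable stack.
    static void lingerOnHelper(Path lockFile, long spinDelayMillis) throws InterruptedException {
        Thread helper = new Thread(() -> {
            try {
                while (Files.exists(lockFile)) {
                    // Touch the lock to indicate our readiness.
                    Files.setLastModifiedTime(lockFile, FileTime.fromMillis(System.currentTimeMillis()));
                    Thread.sleep(spinDelayMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            } catch (IOException e) {
                // e.g. the lock file was deleted between exists() and the touch: stop lingering
            }
        }, "LingeredApp-helper");
        helper.start();
        helper.join();
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("lingered", ".lck");
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(50);
                Files.deleteIfExists(lock);
            } catch (IOException | InterruptedException ignored) {
            }
        });
        deleter.start();
        lingerOnHelper(lock, 1);
        System.out.println("lock removed, exiting");
    }
}
```

The trade-off is the one noted above: the main thread's stack becomes easy to dump, but the busy helper thread is where states like _thread_in_java would now show up, so SA bugs there would only surface if a test examined that thread.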

The main reason for this email is to ask what the expectations are for SA's ability to dump a thread's stack trace. Is it expected that sometimes the thread will be in a state that prevents dumping the stack? I know, for example, that the reason we sometimes don't see a stack is that thread.getLastJavaVFrameDbg() returns null. Basically SA throws up its hands and says "I can't do it." Is that acceptable in some cases?
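Here is a simplified model of why a null result produces a header with no frames. If the stack printer walks frames in a loop that starts from getLastJavaVFrameDbg(), a null top frame ends the walk before anything is printed. VFrame, DebugThread, Frame, FakeThread, and dump are hypothetical stand-ins, not SA's classes.

```java
import java.util.ArrayList;
import java.util.List;

public class StackPrinter {
    // Hypothetical stand-ins for SA's JavaVFrame and JavaThread.
    interface VFrame {
        String describe();
        VFrame javaSender(); // next older Java frame, or null at the bottom
    }

    interface DebugThread {
        String name();
        VFrame lastJavaVFrameDbg(); // may be null if the top frame can't be determined
    }

    record Frame(String describe, VFrame javaSender) implements VFrame {}
    record FakeThread(String name, VFrame lastJavaVFrameDbg) implements DebugThread {}

    static List<String> dump(DebugThread t) {
        List<String> lines = new ArrayList<>();
        lines.add("\"" + t.name() + "\"");
        // A null top frame ends the loop immediately, so only the header is emitted,
        // which is what the failing _thread_in_java dumps look like.
        for (VFrame vf = t.lastJavaVFrameDbg(); vf != null; vf = vf.javaSender()) {
            lines.add("  - " + vf.describe());
        }
        return lines;
    }

    public static void main(String[] args) {
        VFrame main = new Frame("LingeredApp.main", null);
        VFrame sleep = new Frame("Thread.sleep", main);
        System.out.println(dump(new FakeThread("main", sleep))); // header plus two frames
        System.out.println(dump(new FakeThread("main", null)));  // header only
    }
}
```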

thanks,

Chris

[1] https://bugs.openjdk.java.net/browse/JDK-8242411
[2] https://bugs.openjdk.java.net/browse/JDK-8231634
[3] https://bugs.openjdk.java.net/browse/JDK-8240781
[4] https://bugs.openjdk.java.net/browse/JDK-8211923
[5] https://bugs.openjdk.java.net/browse/JDK-8204994

