Hi Chris,
On 15/04/2020 1:37 pm, Chris Plummer wrote:
Hello,
[Sorry this email got kind of long. To cut to the chase, I want to know
if there are times where it is acceptable for SA to not be able to
produce a stack trace for a thread. Details below if you are interested.]
How does the SA currently attempt to get a stacktrace for a thread? If
the current mechanism has limitations then perhaps that will be
addressed now that we have per-thread handshakes? With handshakes the
target thread will always be brought to a state where the stack is
walkable. That said, I thought all existing mechanisms used a safepoint
VM op to get a stacktrace for a different thread.
Cheers,
David
-----
We have a number of SA tests that request a thread dump, look for a
specific symbol in the thread dump, and fail if the symbol is not found.
Normally what they are looking for is LingeredApp.main() which should be
in the stack trace of the main thread. ClhsdbJstack.java is one such
test. It expects the main thread to look like:
"main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition
[0x0000007fc1dff000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
   JavaThread state: _thread_blocked
 - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417, Method*=0x00000008000e8898 (Interpreted frame)
 - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54, line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0 (Interpreted frame)
But sometimes all it gets is:
"main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable
[0x0000000000000000]
java.lang.Thread.State: RUNNABLE
JavaThread state: _thread_in_java
This results in the test failing because it does not find
LingeredApp.main in the output. The state for the passing case is always
_thread_blocked and for the failing case _thread_in_java. This has been
reported by the following CR:
[1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java fails
with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing
from stdout/stderr
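To make the two cases above concrete, here is a minimal sketch of the kind of check involved. It is not the actual test code: the class and method names (DumpCheckSketch, javaThreadState, hasFrame) are hypothetical, and it assumes the text for a single thread has already been cut out of the full dump.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DumpCheckSketch {
    // Hypothetical helper: returns the value printed after "JavaThread state:",
    // or null if the dump text does not contain that line.
    static String javaThreadState(String threadDump) {
        Matcher m = Pattern.compile("JavaThread state: (\\w+)").matcher(threadDump);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: true if the expected symbol appears anywhere in the dump text.
    static boolean hasFrame(String threadDump, String symbol) {
        return threadDump.contains(symbol);
    }
}
```

With a check like this, a failure report could include the JavaThread state next to the missing symbol, which would make the _thread_blocked versus _thread_in_java pattern visible directly in the test log.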
After starting, LingeredApp.main sits in a loop:
while (Files.exists(path)) {
// Touch the lock to indicate our readiness
setLastModified(theLockFileName, epoch());
Thread.sleep(spinDelay);
}
So it's basically waiting for the lock file to be deleted. By default
spinDelay is 1 second. I suspected the issue I was seeing was due to
asking for the thread dump when the main thread was not blocked in
sleep(), so I changed spinDelay to 1ms. That made the missing stack
trace issue much easier to reproduce, along with several other bugs
that are already filed but rarely reproduce:
[2] JDK-8231634 - SA stack walking fails with "illegal bci"
[3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with
"java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for
length 1"
[4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java
ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[5] JDK-8204994 - SA might fail to attach to process with "Windbg Error:
WaitForEvent failed"
The "illegal bci" failure I haven't looked into much, but is likely an
SA bug due to SA having issues with (and probably making assumptions
about) the state of the stack.
The two ArrayIndexOutOfBoundsException bugs are dups. They fail because
the stack trace of the main thread is missing, and some String splitting
logic in the test therefore fails and produces the
ArrayIndexOutOfBoundsException.
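The splitting logic in those tests is not shown here, so the following is only an illustrative sketch of the failure mode, with hypothetical names (SplitSketch, textAfter). When the marker is missing, String.split returns a one-element array, so indexing element 1 throws ArrayIndexOutOfBoundsException. Checking the array length first turns that into an explicit "not found" result.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class SplitSketch {
    // Hypothetical helper: returns the text following the first occurrence of
    // marker, or an empty Optional when the marker is absent from the output.
    static Optional<String> textAfter(String output, String marker) {
        // Quote the marker so characters such as '.' are matched literally.
        String[] parts = output.split(Pattern.quote(marker), 2);
        // parts.length is 1 when the marker was not found; parts[1] would throw.
        return parts.length > 1 ? Optional.of(parts[1]) : Optional.empty();
    }
}
```

A test using a guard like this could fail with a message saying the main thread's stack trace was missing, rather than with an exception from the parsing code.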
I'm not sure about the "WaitForEvent failed". It could be unrelated.
I can probably make these all go away by having LingeredApp.main() spawn
a helper thread to run the above loop. That would keep the main thread
stable (blocked on a Thread.join). However, it would also hide some
issues (like the "illegal bci" failure).
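A rough sketch of that helper-thread idea follows. It is not the LingeredApp code: the class name, method name, and thread name are made up for illustration, and Files.setLastModifiedTime stands in for the setLastModified helper used in the loop above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class HelperThreadSketch {
    // Hypothetical replacement for the loop in main(): the polling runs in a
    // helper thread while the calling thread stays blocked in Thread.join().
    static void waitForLockDeletion(Path lock, long spinDelayMs) {
        Thread helper = new Thread(() -> {
            try {
                while (Files.exists(lock)) {
                    // Touch the lock to indicate readiness, as the original loop does.
                    Files.setLastModifiedTime(lock, FileTime.fromMillis(System.currentTimeMillis()));
                    Thread.sleep(spinDelayMs);
                }
            } catch (IOException e) {
                // The lock vanished between the check and the touch, or could
                // not be updated; either way, stop polling.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "lock-poller");
        helper.start();
        try {
            helper.join(); // the caller (main) waits here until the lock is gone
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape the main thread would spend nearly all of its time waiting in join(), while the helper thread takes over the transitions in and out of sleep() that the main thread makes today.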
The main reason for this email is to ask what the expectations are for
SA's ability to dump a thread's stack trace. Is it expected that
sometimes the thread will be in a state that prevents dumping the stack?
I know, for example, that the reason we sometimes don't see a stack is
that thread.getLastJavaVFrameDbg() is returning null. Basically SA
throws up its hands and says "I can't do it". Is that acceptable in some
cases?
thanks,
Chris
[1] https://bugs.openjdk.java.net/browse/JDK-8242411
[2] https://bugs.openjdk.java.net/browse/JDK-8231634
[3] https://bugs.openjdk.java.net/browse/JDK-8240781
[4] https://bugs.openjdk.java.net/browse/JDK-8211923
[5] https://bugs.openjdk.java.net/browse/JDK-8204994