On 4/14/20 10:56 PM, David Holmes wrote:
Hi Chris,
On 15/04/2020 1:37 pm, Chris Plummer wrote:
Hello,
[Sorry this email got kind of long. To cut to the chase, I want to
know if there are times where it is acceptable for SA to not be able
to produce a stack trace for a thread. Details below if you are
interested.]
How does the SA currently attempt to get a stacktrace for a thread? If
the current mechanism has limitations then perhaps that will be
addressed now that we have per-thread handshakes? With handshakes the
target thread will always be brought to a state where the stack is
walkable. That said I thought all existing mechanisms used a safepoint
VM op to get a stacktrace for a different thread.
Hi David,
I don't know the details, but I think there is no safepointing involved.
Consider that you can also get thread stack traces from a JVM core file,
which could be produced at any given moment in the JVM's execution. Note
SA has lots of safeguards around this. It has ways of verifying if
something points to what it is really supposed to point to. For example,
if SA thinks something is a pointer to a heap object, it verifies that
the object header contains a pointer to something that is known to be a
Klass, which it can determine by looking at the vtable (and then cross
referencing with symbolic info it pulls from the executable). I'm just
not too sure to what extent this applies to thread stack walking. As I
mentioned below, thread.getLastJavaVFrameDbg() sometimes returns null,
thus no stack trace.
/** This should only be used by a debugger. Uses the current frame
guess to attempt to get the topmost JavaVFrame.
(getLastJavaVFrame, as a port of the VM's routine, assumes the
VM is at a safepoint.) */
public JavaVFrame getLastJavaVFrameDbg() {
So it seems that we are not at a safepoint, and there's a lot of
"guessing" in the implementation. I gather from this that the thread
won't always be in a state where we can determine the last vframe, and
therefore not in a state where SA can produce a stack trace.
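
To make the null case concrete, here is a minimal sketch of a stack
dump loop that tolerates a missing top frame. The Frame and ThreadView
interfaces and the frame() helper are hypothetical stand-ins invented
for this illustration. They are not the SA classes, and this is not
how SA's own jstack code is written; it only shows the shape of "no
top frame, so no stack trace".

```java
import java.util.ArrayList;
import java.util.List;

class NullTopFrameSketch {
    // Hypothetical stand-in for a Java-level frame (not the SA JavaVFrame class).
    interface Frame {
        String describe();
        Frame sender(); // null once the bottom of the stack is reached
    }

    // Hypothetical stand-in for a thread whose top frame may be undeterminable.
    interface ThreadView {
        Frame lastJavaFrameOrNull();
    }

    // Convenience factory for building a chain of frames in examples.
    static Frame frame(String description, Frame sender) {
        return new Frame() {
            public String describe() { return description; }
            public Frame sender() { return sender; }
        };
    }

    // Returns one line per frame, or a single marker line when no top frame is available.
    static List<String> dump(ThreadView thread) {
        List<String> lines = new ArrayList<>();
        Frame top = thread.lastJavaFrameOrNull();
        if (top == null) {
            lines.add("<no Java frames: top frame could not be determined>");
            return lines;
        }
        for (Frame f = top; f != null; f = f.sender()) {
            lines.add(" - " + f.describe());
        }
        return lines;
    }
}
```

With a null top frame the caller gets only the marker line, which
matches the "thread header but no frames" output described below.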
thanks,
Chris
Cheers,
David
-----
We have a number of SA tests that request a thread dump, look for a
specific symbol in the thread dump, and fail if the symbol is not
found. Normally what they are looking for is LingeredApp.main() which
should be in the stack trace of the main thread. ClhsdbJstack.java is
one such test. It expects the main thread to look like:
"main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on
condition [0x0000007fc1dff000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
JavaThread state: _thread_blocked
- java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417,
Method*=0x00000008000e8898 (Interpreted frame)
- jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54,
line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0
(Interpreted frame)
But sometimes all it gets is:
"main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable
[0x0000000000000000]
java.lang.Thread.State: RUNNABLE
JavaThread state: _thread_in_java
This results in the test failing because it does not find
LingeredApp.main in the output. The state for the passing case is
always _thread_blocked and for the failing case _thread_in_java. This
has been reported by the following CR:
[1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java
fails with Test ERROR java.lang.RuntimeException: 'LingeredApp.main'
missing from stdout/stderr
After starting, LingeredApp.main sits in a loop:
while (Files.exists(path)) {
// Touch the lock to indicate our readiness
setLastModified(theLockFileName, epoch());
Thread.sleep(spinDelay);
}
So it's basically waiting for the lock file to be deleted. By default
spinDelay is 1 second. I suspected the issue I was seeing was due to
asking for the thread dump when not blocked on the sleep(), so I
changed spinDelay to 1ms. That made this missing stack trace issue
much easier to reproduce, plus several other bugs that are filed,
but normally rarely reproduce:
[2] JDK-8231634 - SA stack walking fails with "illegal bci"
[3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with
"java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for
length 1"
[4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java
ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[5] JDK-8204994 - SA might fail to attach to process with "Windbg
Error: WaitForEvent failed"
I haven't looked into the "illegal bci" failure much, but it is likely
an SA bug due to SA having issues with (and probably making
assumptions about) the state of the stack.
The two ArrayIndexOutOfBoundsException bugs are dups. They fail
because the stack trace of the main thread is missing, and some
String splitting logic in the test therefore fails and produces the
ArrayIndexOutOfBoundsException.
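
As an illustration of that failure mode (not the actual test code), the
sketch below splits output on a marker and guards the index. The class
and method names are hypothetical. When the marker is absent,
String.split returns a one-element array, so an unguarded parts[1]
throws "Index 1 out of bounds for length 1".

```java
import java.util.regex.Pattern;

class SplitGuardSketch {
    // Hypothetical helper: returns the text following the first occurrence
    // of marker, or null when the marker does not appear in the output.
    static String textAfter(String output, String marker) {
        // Quote the marker so characters like '.' are matched literally.
        String[] parts = output.split(Pattern.quote(marker), 2);
        // Without the marker, split() yields a single element and
        // parts[1] would throw ArrayIndexOutOfBoundsException.
        if (parts.length < 2) {
            return null;
        }
        return parts[1];
    }
}
```

A test using a guard like this could report "marker missing from
thread dump" directly instead of failing with an unrelated exception.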
I'm not sure about the "WaitForEvent failed". It could be unrelated.
I can probably make these all go away by having LingeredApp.main()
spawn a helper thread to do the above loop in. That would keep the
main thread stable (blocked on a Thread.join). However, it also would
hide some issues (like the "illegal bci" failure).
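
A rough sketch of the helper thread idea, assuming nothing about how
LingeredApp would really be restructured. The class name, method name,
and thread name are made up for the example, and the lock file touch
uses Files.setLastModifiedTime in place of the app's own
setLastModified helper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

class HelperThreadSketch {
    // Polls the lock file on a helper thread while the caller blocks in join().
    // Returns true if the helper finished and the lock file no longer exists.
    static boolean waitForLockDeletion(Path lock, long spinDelayMillis) {
        Thread helper = new Thread(() -> {
            try {
                while (Files.exists(lock)) {
                    // Touch the lock to indicate readiness, then sleep.
                    Files.setLastModifiedTime(
                            lock, FileTime.fromMillis(System.currentTimeMillis()));
                    Thread.sleep(spinDelayMillis);
                }
            } catch (IOException | InterruptedException e) {
                // The file was deleted between the check and the touch,
                // or the helper was interrupted; either way, stop polling.
            }
        }, "lock-poller");
        helper.start();
        try {
            helper.join(); // the calling (main) thread stays blocked here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !Files.exists(lock);
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("lingered", ".lck");
        // Simulate the test harness deleting the lock file after a short delay.
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(50);
                Files.deleteIfExists(lock);
            } catch (IOException | InterruptedException ignored) {
                // Nothing to clean up in this sketch.
            }
        });
        deleter.start();
        System.out.println("lock gone: " + waitForLockDeletion(lock, 1L));
    }
}
```

The main thread then spends its whole lifetime blocked in join(), so a
thread dump should find it in a stable state. The helper thread still
alternates between running and sleeping, which is where the hidden
issues would remain.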
The main reason for the email is to ask what the expectations are for
SA's ability to dump a thread's stack trace. Is it expected that
sometimes the thread will be in a state that prevents dumping the
stack? I know for example that the reason we sometimes don't see a
stack is because thread.getLastJavaVFrameDbg() is returning null.
Basically, SA throws up its hands and says "I can't do it". Is that
acceptable in some cases?
thanks,
Chris
[1] https://bugs.openjdk.java.net/browse/JDK-8242411
[2] https://bugs.openjdk.java.net/browse/JDK-8231634
[3] https://bugs.openjdk.java.net/browse/JDK-8240781
[4] https://bugs.openjdk.java.net/browse/JDK-8211923
[5] https://bugs.openjdk.java.net/browse/JDK-8204994