On 6/18/20 1:43 AM, David Holmes wrote:
On 18/06/2020 4:49 pm, Chris Plummer wrote:
On 6/17/20 10:29 PM, David Holmes wrote:
On 18/06/2020 3:13 pm, Chris Plummer wrote:
On 6/17/20 10:09 PM, David Holmes wrote:
On 18/06/2020 2:33 pm, Chris Plummer wrote:
On 6/17/20 7:43 PM, David Holmes wrote:
Hi Chris,
On 18/06/2020 6:34 am, Chris Plummer wrote:
Hello,
Please help review the following:
https://bugs.openjdk.java.net/browse/JDK-8247533
http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
The CR contains all the needed details. Here's a summary of
changes in each file:
The problem sounds to me like a variation of the more general
problem of not ensuring a thread is kept alive whilst acting
upon it. I don't know how the SA finds these references to the
threads it is going to stackwalk, but is it possible to fix this
via appropriate uses of ThreadsListHandle/Iterator?
It fetches ThreadsSMRSupport::_java_thread_list.
Keep in mind that once SA attaches, nothing in the VM changes.
For example, SA can't create a wrapper to a JavaThread, only to
have the JavaThread be freed later on. It's just not possible.
Then how does it obtain a reference to a JavaThread for which the
native OS thread id is invalid? Any thread found in
_java_thread_list is either live or still to be started. In the
latter case the JavaThread->osThread does not have its thread_id
set yet.
My assumption was that the JavaThread is in the process of being
destroyed, and it has freed its OS thread but is itself still in
the thread list. I did notice that the OS thread id being used
looked to be in the range of thread id #'s you would expect for the
running app, so that to me indicated it was once valid, but is no
more.
Keep in mind that although hotspot may have synchronization code
that prevents you from pulling a JavaThread off the thread list
when it is in the process of being destroyed (I'm guessing it
does), SA has no such protections.
But you stated that once the SA has attached, the target VM can't
change. If the SA gets its set of thread from one attach then tries
to make queries about those threads in a separate attach, then
obviously it could be providing garbage thread information. So you
would need to re-validate the JavaThread in the target VM before
trying to do anything with it.
That's not what is going on here. It's attaching and doing a stack
trace, which involves getting the thread list and iterating through
all threads without detaching.
Okay so I restate my original comment - all the JavaThreads must be
alive or not yet started, so how are you encountering an invalid
thread id? Any thread you find via the ThreadsList can't have
destroyed its osThread. In any case the logic should be checking
thread->osThread() for NULL, and then osThread()->get_state() to
ensure it is >= INITIALIZED before using the thread_id().
Hi David,
I chatted with Dan about this, and he said since the JavaThread is
responsible for removing itself from the ThreadList, it is impossible to
have a JavaThread still on the ThreadList, but without and underlying OS
Thread. So I'm a bit perplexed as to how I can find a JavaThread on the
ThreadList, but that results in ESRCH when trying to access the thread
with ptrace. My only conclusion is that this failure is somehow
spurious, and maybe the issue it just that the thread is in some
temporary state that prevents its access. If so, I still think the
approach I'm taking is the correct one, but the comments should be updated.
I had one other finding. When this issue first turned up, it prevented
the thread from getting a stack trace due to the exception being thrown.
What I hadn't realize is that after fixing it to not throw an exception,
which resulted in the stack walking code getting all nulls for register
values, I actually started to see a stack trace printed:
"JLine terminal non blocking reader thread" #26 daemon prio=5
tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
java.lang.Thread.State: RUNNABLE
JavaThread state: _thread_in_native
WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089)
CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp
= 0x00007f125f0f47c0
- java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run()
@bci=108, line=216 (Interpreted frame)
-
jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run()
@bci=4 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
The "CurrentFrameGuess" output is some debug tracing I had enabled, and
it indicates that the stack walking code is using the "last java frame"
setting, which it will do if current registers values don't indicate a
valid frame (as would be the case if sp was null). I had previously
assumed that without an underling valid LWP, there would be no stack
trace. Given that there is one, there must be a valid LWP. Otherwise I
don't see how the stack could have been walked. That's another
indication that the ptrace failure is spurious in nature.
thanks,
Chris
Cheers,
David
-----
Also, even if you are using something like clhsdb to issue commands
on addresses, if the address is no longer valid for the command you
are executing, then you would get the appropriate error when there is
an attempt to create a wrapper for it. I don't know of any command
that operates directly on a JavaThread, but I think there are for
InstanceKlass. So if you remembered the address of an InstanceKlass,
and then reattached and tried a command that takes an InstanceKlass
address, you would get an exception when SA tries to create the
wrapper for the InsanceKlass if it were no longer a valid address for
one.
Chris
David
-----
Chris
David
-----
Chris
Cheers,
David
src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
-Instead of throwing an exception when the OS ThreadID is
invalid, print a warning.
src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
-Improve a print_debug message
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
-Deal with the array of registers read in being null due to the
OS ThreadID not being valid.
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
-Fix issue with "sun.jvm.hotspot.debugger.DebuggerException"
appearing twice when printing the exception.
thanks,
Chris