Re: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp

Chris Plummer Fri, 19 Jun 2020 13:53:15 -0700

Hello,

I've updated the webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed, since the JavaThread removes itself from the ThreadList and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR:


https://bugs.openjdk.java.net/browse/JDK-8247533
http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html

thanks,

Chris

On 6/19/20 12:24 AM, David Holmes wrote:

Hi Chris,

On 19/06/2020 8:55 am, Chris Plummer wrote:
On 6/18/20 1:43 AM, David Holmes wrote:
On 18/06/2020 4:49 pm, Chris Plummer wrote:
On 6/17/20 10:29 PM, David Holmes wrote:
On 18/06/2020 3:13 pm, Chris Plummer wrote:
On 6/17/20 10:09 PM, David Holmes wrote:
On 18/06/2020 2:33 pm, Chris Plummer wrote:
On 6/17/20 7:43 PM, David Holmes wrote:
Hi Chris,

On 18/06/2020 6:34 am, Chris Plummer wrote:
Hello,

Please help review the following:

https://bugs.openjdk.java.net/browse/JDK-8247533
http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
The CR contains all the needed details. Here's a summary of changes in each file:
The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator?
It fetches ThreadsSMRSupport::_java_thread_list.
Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible.
Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet.
My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more.
Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections.
But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of threads from one attach and then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it.
That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching.
Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id().
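As a rough illustration of the check described above, here is a minimal Java sketch. The types OsThreadModel and JavaThreadModel and the three-value State enum are hypothetical stand-ins, not the HotSpot or SA classes; only the order of the checks (null first, then state >= INITIALIZED, then the thread id) is taken from the paragraph above.

```java
import java.util.OptionalInt;

// Hypothetical stand-ins for illustration only; not the HotSpot or SA types.
final class ThreadIdGuard {
    // Simplified lifecycle; the >= comparison relies on declaration order.
    enum State { ALLOCATED, INITIALIZED, RUNNABLE }

    static final class OsThreadModel {
        final State state;
        final int threadId;
        OsThreadModel(State state, int threadId) {
            this.state = state;
            this.threadId = threadId;
        }
    }

    static final class JavaThreadModel {
        final OsThreadModel osThread; // null models a missing osThread
        JavaThreadModel(OsThreadModel osThread) {
            this.osThread = osThread;
        }
    }

    // Returns the thread id only if the OS thread exists and is at least INITIALIZED.
    static OptionalInt usableThreadId(JavaThreadModel thread) {
        OsThreadModel os = thread.osThread;
        if (os == null) {
            return OptionalInt.empty();
        }
        if (os.state.compareTo(State.INITIALIZED) < 0) {
            return OptionalInt.empty(); // thread_id not set yet
        }
        return OptionalInt.of(os.threadId);
    }

    public static void main(String[] args) {
        JavaThreadModel started = new JavaThreadModel(new OsThreadModel(State.RUNNABLE, 8089));
        JavaThreadModel notStarted = new JavaThreadModel(new OsThreadModel(State.ALLOCATED, 0));
        System.out.println(usableThreadId(started).isPresent());
        System.out.println(usableThreadId(notStarted).isPresent());
    }
}
```

Returning an empty OptionalInt rather than a sentinel id is a choice made for this sketch; it forces the caller to handle the "no usable id" case explicitly.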
Hi David,
I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without an underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue is just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated.
ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case.
       ESRCH  The specified process does not exist, or is not
              currently being traced by the caller, or is not stopped
              (for requests that require a stopped tracee).
I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk.
Cheers,
David
-----
I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realized is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed:
"JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
    java.lang.Thread.State: RUNNABLE
    JavaThread state: _thread_in_native
WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp(8089)
CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
  - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
  - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
  - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame)
  - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame)
  - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if the current register values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underlying valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature.
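The fallback described above can be modeled with a small Java sketch. FrameGuessSketch, Frame, and guess are hypothetical names made up for this illustration, and the real CurrentFrameGuess code presumably validates far more than a zero sp; the sketch only shows the decision to use the saved last Java frame when the live registers are missing or unusable.

```java
// Hypothetical model of the fallback; not the SA CurrentFrameGuess implementation.
final class FrameGuessSketch {
    static final class Frame {
        final long sp;
        final long fp;
        final boolean fromLastJavaFrame;
        Frame(long sp, long fp, boolean fromLastJavaFrame) {
            this.sp = sp;
            this.fp = fp;
            this.fromLastJavaFrame = fromLastJavaFrame;
        }
    }

    // regs holds {sp, fp} read from the OS thread, or is null when the read failed.
    static Frame guess(long[] regs, long lastJavaSp, long lastJavaFp) {
        boolean regsUsable = regs != null && regs.length >= 2 && regs[0] != 0;
        if (regsUsable) {
            return new Frame(regs[0], regs[1], false);
        }
        // Registers are unavailable or sp is null: use the thread's saved anchor.
        return new Frame(lastJavaSp, lastJavaFp, true);
    }

    public static void main(String[] args) {
        // sp and fp values taken from the trace quoted in the message above.
        Frame f = guess(null, 0x00007f125f0f4770L, 0x00007f125f0f47c0L);
        System.out.println("fromLastJavaFrame=" + f.fromLastJavaFrame
                + " sp=0x" + Long.toHexString(f.sp)
                + " fp=0x" + Long.toHexString(f.fp));
    }
}
```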
thanks,

Chris
Cheers,
David
-----
Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InstanceKlass if it were no longer a valid address for one.
Chris
David
-----
Chris
David
-----
Chris
Cheers,
David
src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
-Instead of throwing an exception when the OS ThreadID is invalid, print a warning.
src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
-Improve a print_debug message
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
-Deal with the array of registers read in being null due to the OS ThreadID not being valid.
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
-Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception.
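To illustrate the null-register handling mentioned in the summary above, here is a minimal Java sketch. RegisterContextSketch, toContext, and the register count of 4 are hypothetical choices for this example and do not reflect the actual BsdThread, LinuxThread, or WindbgAMD64Thread code; the sketch assumes that a failed register read surfaces as a null array and that the caller wants a zero-filled context plus a warning instead of an exception.

```java
// Hypothetical sketch; names and the register count are assumptions, not SA code.
final class RegisterContextSketch {
    static final int REGISTER_COUNT = 4; // made-up size for illustration

    // Copies raw register values into a fixed-size context.
    // A null input (failed read) yields an all-zero context and a warning.
    static long[] toContext(long[] rawRegs) {
        long[] context = new long[REGISTER_COUNT];
        if (rawRegs == null) {
            System.err.println("WARNING: registers unavailable; using zeroed context");
            return context;
        }
        int n = Math.min(rawRegs.length, REGISTER_COUNT);
        System.arraycopy(rawRegs, 0, context, 0, n);
        return context;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toContext(null)));
        System.out.println(java.util.Arrays.toString(toContext(new long[] {1L, 2L})));
    }
}
```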
thanks,

Chris
