Re: RFR(S): JDK-8231635: SA Stackwalking code stuck in BasicTypeDataBase.findDynamicTypeForAddress()

[email protected] Fri, 08 Nov 2019 16:02:02 -0800

Hi Chris,

This seems to be a good fix to have in any case.
This check and bail out is the right thing to do and should not break anything.
I understand that this also fixes the test failures.

I only had some experience a long time ago with the support of pstack and the DTrace jstack action implementation, which also does this kind of SP recovery because ebp can be used by the JIT compiler as a general-purpose register. There is no such problem on sparc.


Thanks,
Serguei


On 11/7/19 14:01, Chris Plummer wrote:

Hi,

Please review the following fix for JDK-8231635:

https://bugs.openjdk.java.net/browse/JDK-8231635
http://cr.openjdk.java.net/~cjplummer/8231635/webrev.00/
I've tried to explain below to the best of my ability what is going on, but keep in mind that I basically had no background in this area before looking into this CR, so this is all new to me. Please feel free to chime in with corrections to my explanation, or any additional insight that might help to further understanding of this code.
When doing a thread stack dump, SA has to figure out the SP for the current frame when it may not in fact be stored anywhere. So it goes through a series of guesses, starting with the current value of SP. See AMD64CurrentFrameGuess.run():
    Address sp  = context.getRegisterAsAddress(AMD64ThreadContext.RSP);
There are a number of checks done to see if this is the SP for the actual current frame, one of the checks being (and kind of a last resort) to follow the frame links and see if they eventually lead to the first entry frame:
            while (frame != null) {
              if (frame.isEntryFrame() && frame.entryFrameIsFirst()) {
                 ...
                 return true;
              }
              frame = frame.sender(map);
            }

If this fails, there is an outer loop to try the next address:

        for (long offset = 0;
             offset < regionInBytesToSearch;
             offset += vm.getAddressSize()) {
Note that offset is added to the initial SP value that was fetched from RSP. This approach is fraught with danger, because SP could be incorrect, and you can easily follow a bad frame link to an invalid address. So the body of this loop is in a try block that catches all Exceptions, and simply retries with the next offset if one is caught. Exceptions could be ones like UnalignedAddressException or UnmappedAddressException.
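The guess-and-retry pattern described above can be sketched roughly as follows. This is not the SA code: SpGuessSketch, guessSp and the looksLikeCurrentFrame predicate are hypothetical stand-ins, addresses are plain longs, and catching RuntimeException is an assumption standing in for the "catches all Exceptions" behavior.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch of "try each candidate SP, swallow failures, move on".
final class SpGuessSketch {
    /**
     * Returns the first candidate SP accepted by the validity check,
     * or -1 if no candidate in the search region is accepted.
     */
    static long guessSp(long initialSp,
                        long regionInBytesToSearch,
                        long addressSize,
                        LongPredicate looksLikeCurrentFrame) {
        for (long offset = 0; offset < regionInBytesToSearch; offset += addressSize) {
            long candidate = initialSp + offset;
            try {
                if (looksLikeCurrentFrame.test(candidate)) {
                    return candidate;
                }
            } catch (RuntimeException e) {
                // A bad guess can lead to an unmapped or unaligned read;
                // treat that as "not this one" and try the next offset.
            }
        }
        return -1;
    }
}
```

Note that a try/catch like this only protects against guesses that fail loudly. A guess that makes the check loop forever never throws, so the loop itself has to guarantee termination.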
The bug in question turns up with the following harmless looking line:

              frame = frame.sender(map);
This is fine if you know that "frame" is valid, but what if it is not (which is very commonly the case)? The frame values (SP, FP, and PC) in the returned frame could be just about anything, including being the same as the previous frame. This is what will happen if the SP stored in "frame" is the same as the SP that was used to initialize "frame" in the first place. This can certainly happen when SP is not valid to start with, and is indeed what caused this bug. The end result is that the inner while loop gets stuck in an infinite loop traversing the same frame. So the fix is to add a check for this and break out of the while loop if it happens. Initially I did this with an Address.equal() call, and that seemed to fix the problem, but then I realized it would be possible to traverse through one or more sender frames and eventually end up returning to a previously visited frame, thus still an infinite loop. So I decided on checking for Address.lessThanOrEqual() instead, since the sender frame's SP should always be greater than the current frame's (referred to as oldFrame) SP. As long as we always move in one direction (towards a higher frame address), you can't have an infinite loop in this code.
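To illustrate why the "less than or equal" check is enough where an equality check is not, here is a minimal sketch under simplifying assumptions. It is not the webrev change: SenderWalkSketch and reachesEntryFrame are hypothetical names, a frame is reduced to its SP as a long, and the senderOf map stands in for frame.sender(map).

```java
import java.util.Map;

// Hypothetical model: each frame is identified only by its SP, and
// senderOf maps a frame's SP to its sender's SP (no entry = no sender).
final class SenderWalkSketch {
    /** Returns true if the walk reaches entrySp, false if it ends or stops making progress. */
    static boolean reachesEntryFrame(long startSp, long entrySp, Map<Long, Long> senderOf) {
        long sp = startSp;
        while (true) {
            if (sp == entrySp) {
                return true;
            }
            Long senderSp = senderOf.get(sp);
            if (senderSp == null) {
                return false; // ran out of frames
            }
            // The guard: a sender must live at a strictly higher address.
            // Addresses are compared as unsigned values.
            if (Long.compareUnsigned(senderSp, sp) <= 0) {
                return false; // same frame again, or a link pointing backwards
            }
            sp = senderSp;
        }
    }
}
```

Because sp strictly increases on every iteration, no frame can be visited twice, which rules out both the self-link case and longer cycles such as A -> B -> A.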
I applied this fix to x86. Although not tested, it is built (all platform support is always built with SA). The x86 and amd64 versions are identical except for x86/amd64 references, so I thought it best to go ahead and do the update to x86. I did not touch ppc, but would be willing to update if someone passes along a fix that is tested.
One final bit of clarification. The bug synopsis mentions getting stuck in BasicTypeDataBase.findDynamicTypeForAddress(). This turns out to not actually be the case, but every stack trace I initially looked at when I filed this CR was showing the thread being in this frame and at the same line number. This appears to be the next available safepoint where the thread can be suspended for stack dumping. When debugging this some more and adding a lot of println() calls in a lot of different locations, I started to see different frames in the stack trace, presumably because the println() calls were adding additional safepoints.
thanks,

Chris
