Re: RFR(S): JDK-8231635: SA Stackwalking code stuck in BasicTypeDataBase.findDynamicTypeForAddress()

Chris Plummer Wed, 13 Nov 2019 10:58:47 -0800

Thanks Daniil!

Chris


On 11/12/19 9:08 PM, Daniil Titov wrote:

Hi Chris,

The change looks good to me.

Thanks!
--Daniil

On 11/12/19, 11:06 AM, "serviceability-dev on behalf of Chris Plummer" 
<[email protected] on behalf of [email protected]> wrote:

     Thanks Serguei!

Can I get one more review please?thanks,ChrisOn 11/8/19 4:00 PM, [email protected] wrote:

     > Hi Chris,
     >
     > This seems to be a good fix to have in any case.
     > This check and bail out is right thing to do and should not break
     > anything.
     > I understand, this also fixes the test failures.
     >
     > I only had some experience a long time ago with the support of pstack
     > and DTrace jstack action implementation which also does such SP
     > recovering because the ebp can be used by JIT compiler as a general
     > purpose register. There is no such a problem on sparc.
     >
     > Thanks,
     > Serguei
     >
     >
     > On 11/7/19 14:01, Chris Plummer wrote:
     >> Hi,
     >>
     >> Please review the following fix for JDK-8231635:
     >>
     >> https://bugs.openjdk.java.net/browse/JDK-8231635
     >> http://cr.openjdk.java.net/~cjplummer/8231635/webrev.00/
     >>
     >> I've tried to explain below to the best of my ability what's is going
     >> on, but keep in mind that I basically had no background in this area
     >> before looking into this CR, so this is all new to me. Please feel
     >> free to chime in with corrections to my explanation, or any
     >> additional insight that might help to further understanding of this
     >> code.
     >>
     >> When doing a thread stack dump, SA has to figure out the SP for the
     >> current frame when it may not in fact be stored anywhere. So it goes
     >> through a series of guesses, starting with the current value of SP.
     >> See AMD64CurrentFrameGuess.run():
     >>
     >>     Address sp  = context.getRegisterAsAddress(AMD64ThreadContext.RSP);
     >>
     >> There are a number of checks done to see if this is the SP for the
     >> actual current frame, one of the checks being (and kind of a last
     >> resort) to follow the frame links and see if they eventually lead to
     >> the first entry frame:
     >>
     >>             while (frame != null) {
     >>               if (frame.isEntryFrame() && frame.entryFrameIsFirst()) {
     >>                  ...
     >>                  return true;
     >>               }
     >>               frame = frame.sender(map);
     >>             }
     >>
     >> If this fails, there is an outer loop to try the next address:
     >>
     >>         for (long offset = 0;
     >>              offset < regionInBytesToSearch;
     >>              offset += vm.getAddressSize()) {
     >>
     >> Note that offset is added to the initial SP value that was fetched
     >> from RSP. This approach is fraught with danger, because SP could be
     >> incorrect, and you can easily follow a bad frame link to an invalid
     >> address. So the body of this loop is in a try block that catches all
     >> Exceptions, and simply retries with the next offset if one is caught.
     >> Exceptions could be ones like UnalignedAddressException or
     >> UnmappedAddressException.
     >>
     >> The bug in question turns up with the following harmless looking line:
     >>
     >>               frame = frame.sender(map);
     >>
     >> This is fine if you know that "frame" is valid, but what if it is not
     >> (which is very commonly the case). The frame values (SP, FP, and PC)
     >> in the returned frame could be just about anything, including being
     >> the same as the previous frame. This is what will happen if the SP
     >> stored in "frame" is the same as the SP that was used to initialize
     >> "frame" in the first place. This can certainly happen when SP is not
     >> valid to start with, and is indeed what caused this bug. The end
     >> result is the inner while loop gets stuck in an infinite loop
     >> traversing the same frame. So the fix is to add a check for this to
     >> make sure to break out of the while loop if this happens. Initially I
     >> did this with an Address.equal() call, and that seemed to fix the
     >> problem, but then I realized it would be possible to traverse through
     >> one or more sender frames and eventually end up returning to a
     >> previously visited frame, thus still an infinite loop. So I decided
     >> on checking for Address.lessThanOrEqual() instead since the send
     >> frame's SP should always be greater than the current frame's
     >> (referred to as oldFrame) SP. As long as we always move in one
     >> direction (towards a higher frame address), you can't have an
     >> infinite loop in this code.
     >>
     >> I applied this fix to x86. Although not tested, it is built (all
     >> platform support is always built with SA). The x86 and amd64 versions
     >> are identical except for x86/amd64 references, so I thought it best
     >> to go ahead and do the update to x86. I did not touch ppc, but would
     >> be willing to update if someone passes along a fix that is tested.
     >>
     >> One final bit of clarification. The bug synopsis mentions getting
     >> stuck in BasicTypeDataBase.findDynamicTypeForAddress(). This turns
     >> out to not actually be the case, but every stack trace I initially
     >> looked when I filed this CR was showing the thread being in this
     >> frame and at the same line number. This appears to be the next
     >> available safepoint where the thread can be suspended for stack
     >> dumping. When debugging this some more and adding a lot of println()
     >> calls in a lot of different locations, I started to see different
     >> frames in the stacktrace, presumably because the println() calls
     >> where adding additional safepoints.
     >>
     >> thanks,
     >>
     >> Chris
     >>
     >

Re: RFR(S): JDK-8231635: SA Stackwalking code stuck in BasicTypeDataBase.findDynamicTypeForAddress()

Reply via email to