Hi Chris,

The change looks good to me.

Thanks!
--Daniil

On 11/12/19, 11:06 AM, "serviceability-dev on behalf of Chris Plummer" 
<serviceability-dev-boun...@openjdk.java.net on behalf of 
chris.plum...@oracle.com> wrote:

    Thanks Serguei!
    
    Can I get one more review please?
    
    thanks,
    
    Chris
    
    On 11/8/19 4:00 PM, serguei.spit...@oracle.com wrote:
    > Hi Chris,
    >
    > This seems to be a good fix to have in any case.
    > This check and bail out is right thing to do and should not break 
    > anything.
    > I understand, this also fixes the test failures.
    >
    > I only had some experience a long time ago with the support of pstack 
    > and DTrace jstack action implementation which also does such SP 
    > recovering because the ebp can be used by JIT compiler as a general 
    > purpose register. There is no such a problem on sparc.
    >
    > Thanks,
    > Serguei
    >
    >
    > On 11/7/19 14:01, Chris Plummer wrote:
    >> Hi,
    >>
    >> Please review the following fix for JDK-8231635:
    >>
    >> https://bugs.openjdk.java.net/browse/JDK-8231635
    >> http://cr.openjdk.java.net/~cjplummer/8231635/webrev.00/
    >>
    >> I've tried to explain below to the best of my ability what's is going 
    >> on, but keep in mind that I basically had no background in this area 
    >> before looking into this CR, so this is all new to me. Please feel 
    >> free to chime in with corrections to my explanation, or any 
    >> additional insight that might help to further understanding of this 
    >> code.
    >>
    >> When doing a thread stack dump, SA has to figure out the SP for the 
    >> current frame when it may not in fact be stored anywhere. So it goes 
    >> through a series of guesses, starting with the current value of SP. 
    >> See AMD64CurrentFrameGuess.run():
    >>
    >>     Address sp  = context.getRegisterAsAddress(AMD64ThreadContext.RSP);
    >>
    >> There are a number of checks done to see if this is the SP for the 
    >> actual current frame, one of the checks being (and kind of a last 
    >> resort) to follow the frame links and see if they eventually lead to 
    >> the first entry frame:
    >>
    >>             while (frame != null) {
    >>               if (frame.isEntryFrame() && frame.entryFrameIsFirst()) {
    >>                  ...
    >>                  return true;
    >>               }
    >>               frame = frame.sender(map);
    >>             }
    >>
    >> If this fails, there is an outer loop to try the next address:
    >>
    >>         for (long offset = 0;
    >>              offset < regionInBytesToSearch;
    >>              offset += vm.getAddressSize()) {
    >>
    >> Note that offset is added to the initial SP value that was fetched 
    >> from RSP. This approach is fraught with danger, because SP could be 
    >> incorrect, and you can easily follow a bad frame link to an invalid 
    >> address. So the body of this loop is in a try block that catches all 
    >> Exceptions, and simply retries with the next offset if one is caught. 
    >> Exceptions could be ones like UnalignedAddressException or 
    >> UnmappedAddressException.
    >>
    >> The bug in question turns up with the following harmless looking line:
    >>
    >>               frame = frame.sender(map);
    >>
    >> This is fine if you know that "frame" is valid, but what if it is not 
    >> (which is very commonly the case). The frame values (SP, FP, and PC) 
    >> in the returned frame could be just about anything, including being 
    >> the same as the previous frame. This is what will happen if the SP 
    >> stored in "frame" is the same as the SP that was used to initialize 
    >> "frame" in the first place. This can certainly happen when SP is not 
    >> valid to start with, and is indeed what caused this bug. The end 
    >> result is the inner while loop gets stuck in an infinite loop 
    >> traversing the same frame. So the fix is to add a check for this to 
    >> make sure to break out of the while loop if this happens. Initially I 
    >> did this with an Address.equal() call, and that seemed to fix the 
    >> problem, but then I realized it would be possible to traverse through 
    >> one or more sender frames and eventually end up returning to a 
    >> previously visited frame, thus still an infinite loop. So I decided 
    >> on checking for Address.lessThanOrEqual() instead since the send 
    >> frame's SP should always be greater than the current frame's 
    >> (referred to as oldFrame) SP. As long as we always move in one 
    >> direction (towards a higher frame address), you can't have an 
    >> infinite loop in this code.
    >>
    >> I applied this fix to x86. Although not tested, it is built (all 
    >> platform support is always built with SA). The x86 and amd64 versions 
    >> are identical except for x86/amd64 references, so I thought it best 
    >> to go ahead and do the update to x86. I did not touch ppc, but would 
    >> be willing to update if someone passes along a fix that is tested.
    >>
    >> One final bit of clarification. The bug synopsis mentions getting 
    >> stuck in BasicTypeDataBase.findDynamicTypeForAddress(). This turns 
    >> out to not actually be the case, but every stack trace I initially 
    >> looked when I filed this CR was showing the thread being in this 
    >> frame and at the same line number. This appears to be the next 
    >> available safepoint where the thread can be suspended for stack 
    >> dumping. When debugging this some more and adding a lot of println() 
    >> calls in a lot of different locations, I started to see different 
    >> frames in the stacktrace, presumably because the println() calls 
    >> where adding additional safepoints.
    >>
    >> thanks,
    >>
    >> Chris
    >>
    >
    
    
    
    


Reply via email to