On Fri, 30 Oct 2020 08:49:08 GMT, Dean Long <dl...@openjdk.org> wrote:

>> The imasm::remove_activation() call does not deal with safepoints very well. 
>> However, when the MethodExit JVMTI event is being called, we call into the 
>> runtime in the middle of remove_activation(). If the value being returned is 
>> an object type, then the top-of-stack contains the oop. However, the GC does 
>> not traverse said oop in any oop map, because it is simply not expected that 
>> we safepoint in the middle of remove_activation().
>> 
>> The JvmtiExport::post_method_exit() function we end up calling, reads the 
>> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
>> that eventually call Java and a bunch of stuff that safepoints. So after the 
>> JVMTI callback, we can expect the top-of-stack oop to be broken. 
>> Unfortunately, when we continue, we therefore end up returning a broken oop.
>> 
>> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
>> is wrong, as we can safepoint on the way back to Java, which will break the 
>> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, 
>> moving the transition to VM and back, into a block of code that is protected 
>> against GC. Before the JRT_BLOCK is called, we stash away the return oop, 
>> and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path 
>> when InterpreterRuntime::post_method_exit is called when throwing an 
>> exception, we don't have the same problem of retaining an oop result, and 
>> hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the 
>> logic is the same as before for this path. 
>> 
>> This is a JVMTI bug that has probably been around for a long time. It 
>> crashes with all GCs, but was discovered recently after concurrent stack 
>> processing, as StefanK has been running better GC stressing code in JVMTI, 
>> and the bug reproduced more easily with concurrent stack processing, as the 
>> timings were a bit different. The following reproducer failed pretty much 
>> 100% of the time:
>> while true; do make test JTREG="RETAIN=all" 
>> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
>> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
>> 
>> With my fix I can run this repeatedly without any more failures. I have also 
>> sanity checked the patch by running tier 1-5, so that it does not introduces 
>> any new issues on its own. I have also used Stefan's nice external GC 
>> stressing with jcmd technique that was used to trigger crashes with other 
>> GCs, to make sure said crashes no longer reproduce either.
>
> Marked as reviewed by dlong (Reviewer).

> I think you've discovered JDK-6449023. And you fix looks like the workaround 
> I tried:
> https://bugs.openjdk.java.net/browse/JDK-6449023?focusedCommentId=14206078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14206078

Oh wow. I had no idea people have been having issues with this since 2009! 
Thanks for the pointer. Well, let's hope we can finally close it now after 
marinating the bug for 11 years.

-------------

PR: https://git.openjdk.java.net/jdk/pull/930

Reply via email to