On Thu, 29 Oct 2020 12:44:58 GMT, Erik Österlund <eosterl...@openjdk.org> wrote:

> The imasm::remove_activation() call does not deal with safepoints very well. 
> However, when the MethodExit JVMTI event is being called, we call into the 
> runtime in the middle of remove_activation(). If the value being returned is 
> an object type, then the top-of-stack contains the oop. However, the GC does 
> not traverse said oop in any oop map, because it is simply not expected that 
> we safepoint in the middle of remove_activation().
> 
> The JvmtiExport::post_method_exit() function we end up calling, reads the 
> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
> that eventually call Java and a bunch of stuff that safepoints. So after the 
> JVMTI callback, we can expect the top-of-stack oop to be broken. 
> Unfortunately, when we continue, we therefore end up returning a broken oop.
> 
> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
> is wrong, as we can safepoint on the way back to Java, which will break the 
> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving 
> the transition to VM and back, into a block of code that is protected against 
> GC. Before the JRT_BLOCK is called, we stash away the return oop, and after 
> the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
> InterpreterRuntime::post_method_exit is called when throwing an exception, we 
> don't have the same problem of retaining an oop result, and hence the 
> JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is 
> the same as before for this path. 
> 
> This is a JVMTI bug that has probably been around for a long time. It crashes 
> with all GCs, but was discovered recently after concurrent stack processing, 
> as StefanK has been running better GC stressing code in JVMTI, and the bug 
> reproduced more easily with concurrent stack processing, as the timings were 
> a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" 
> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
> 
> With my fix I can run this repeatedly without any more failures. I have also 
> sanity checked the patch by running tier 1-5, so that it does not introduces 
> any new issues on its own. I have also used Stefan's nice external GC 
> stressing with jcmd technique that was used to trigger crashes with other 
> GCs, to make sure said crashes no longer reproduce either.

Hi Erik,

Nice discovery! Indeed, this is a long standing issue. It looks good in general.
I agree with Coleen, it would be nice if there is an elegant way to move the 
oop_result saving/restoring code to InterpreterRuntime::post_method_exit. 
Otherwise, I'm okay with what you have now.
It is also nice discovery of the issue with clearing the expression stack. I 
think, it was my mistake in the initial implementation of the ForceEarlyReturn 
when I followed the PopFrame implementation pattern. It is good to separate it 
from the current fix.

Thanks,
Serguei

-------------

PR: https://git.openjdk.java.net/jdk/pull/930

Reply via email to