On Mon, 2 Nov 2020 11:14:10 GMT, Erik Österlund <eosterl...@openjdk.org> wrote:
>> The imasm::remove_activation() call does not deal with safepoints very well. >> However, when the MethodExit JVMTI event is being called, we call into the >> runtime in the middle of remove_activation(). If the value being returned is >> an object type, then the top-of-stack contains the oop. However, the GC does >> not traverse said oop in any oop map, because it is simply not expected that >> we safepoint in the middle of remove_activation(). >> >> The JvmtiExport::post_method_exit() function we end up calling, reads the >> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, >> that eventually call Java and a bunch of stuff that safepoints. So after the >> JVMTI callback, we can expect the top-of-stack oop to be broken. >> Unfortunately, when we continue, we therefore end up returning a broken oop. >> >> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, >> is wrong, as we can safepoint on the way back to Java, which will break the >> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, >> moving the transition to VM and back, into a block of code that is protected >> against GC. Before the JRT_BLOCK is called, we stash away the return oop, >> and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path >> when InterpreterRuntime::post_method_exit is called when throwing an >> exception, we don't have the same problem of retaining an oop result, and >> hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the >> logic is the same as before for this path. >> >> This is a JVMTI bug that has probably been around for a long time. It >> crashes with all GCs, but was discovered recently after concurrent stack >> processing, as StefanK has been running better GC stressing code in JVMTI, >> and the bug reproduced more easily with concurrent stack processing, as the >> timings were a bit different. The following reproducer failed pretty much >> 100% of the time: >> while true; do make test JTREG="RETAIN=all" >> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java >> TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 >> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done >> >> With my fix I can run this repeatedly without any more failures. I have also >> sanity checked the patch by running tier 1-5, so that it does not introduces >> any new issues on its own. I have also used Stefan's nice external GC >> stressing with jcmd technique that was used to trigger crashes with other >> GCs, to make sure said crashes no longer reproduce either. > > Erik Österlund has updated the pull request incrementally with one additional > commit since the last revision: > > Coleen CR1: Refactoring This looks better. Just to have the JRT_BLOCK be unconditional is an improvement. src/hotspot/share/prims/jvmtiExport.cpp line 1570: > 1568: // return a flag when a method terminates by throwing an exception > 1569: // i.e. if an exception is thrown and it's not caught by the current > method > 1570: bool exception_exit = state->is_exception_detected() && > !state->is_exception_caught(); So this only applies to the case where the post_method_exit comes from remove_activation? Good to have it only on this path in this case. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/930