Hi Serguei, > The implementation looks good to me.
Thanks. > But I do not understand what the test is doing with all this counters and > recursions. The @comment gives an explanation: the target thread builds a stack as large as possible to prolong the unsafe stackwalk. This is done by means of recursion. > For instance, these fragments: > 86 recursions = 0; > 87 try { > 88 recursiveMethod(1<<20); > 89 } catch (StackOverflowError e) { > 90 msg("Caught StackOverflowError as expected"); > 91 } > 92 int target_depth = recursions-100; // spaces are missed > around the '-' sigh Here we calibrate the test for maximum stack depth. We measure how large the stack can grow by calling recursiveMethod() until a StackOverflowError is raised. We use recursions - 100 as target_depth to avoid the StackOverflowError during the actual test. > It is not obvious that the 'recursion' is updated in the recursiveMethod. > I would suggestto make it more explicit: > recursiveMethod(M); > int target_depth = M - 100; > Then the variable 'recursions' can be removed or become local. > This method will be: > 47 private static final int M = 1 << 20; > ... > 121 public long recursiveMethod(int depth) { > 123 if (depth == 0) { > 124 notifyAgentToGetLocalAndWaitShortly(M - 100, > waitTimeInNativeAfterNotify); > 126 } else { > 127 recursiveMethod(--depth); > 128 } > 129 } I don't think this would work. A StackOverflowError would be raised before notifyAgentToGetLocalAndWaitShortly() is called. > At least, he test is missing the comments explaining all these. The arguments to the msg() method serve as documentation too. I would not want to repeat the msg strings in comments. Thanks, Richard. ------------------------------------------------------------- From: serguei.spit...@oracle.com <serguei.spit...@oracle.com> Sent: Montag, 10. August 2020 19:22 To: David Holmes <david.hol...@oracle.com>; Reingruber, Richard <richard.reingru...@sap.com>; serviceability-dev@openjdk.java.net Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() Hi Richard and David, The implementation looks good to me. But I do not understand what the test is doing with all this counters and recursions. For instance, these fragments: 86 recursions = 0; 87 try { 88 recursiveMethod(1<<20); 89 } catch (StackOverflowError e) { 90 msg("Caught StackOverflowError as expected"); 91 } 92 int target_depth = recursions-100; // spaces are missed around the '-' sigh It is not obvious that the 'recursion' is updated in the recursiveMethod. I would suggestto make it more explicit: recursiveMethod(M); int target_depth = M - 100; Then the variable 'recursions' can be removed or become local. This method will be: 47 private static final int M = 1 << 20; ... 121 public long recursiveMethod(int depth) { 123 if (depth == 0) { 124 notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify); 126 } else { 127 recursiveMethod(--depth); 128 } 129 } At least, he test is missing the comments explaining all these. Thanks, Serguei On 8/9/20 22:35, David Holmes wrote: Hi Richard, On 31/07/2020 5:28 pm, Reingruber, Richard wrote: Hi, I rebase the fix after JDK-8250042. New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/ The general fix for this seems good. A minor nit: 588 if (!is_assignable(signature, ob_k, Thread::current())) { You know that the current thread is the VMThread so can use VMThread::vm_thread(). Similarly for this existing code: 694 Thread* current_thread = Thread::current(); --- Looking at the test code ... I'm less clear on exactly what is happening and the use of spin-waits raises some red-flags for me in terms of test reliability on different platforms. The "while (--waitCycles > 0)" loop in particular offers no certainty that the agent thread is executing anything in particular. And the use of the spin_count as a guide to future waiting time seems somewhat arbitrary. In all seriousness I got a headache trying to work out how the test was expecting to operate. Some parts could be simplified using raw monitors, I think. But there's no sure way to know the agent thread is in the midst of the stackwalk when the target thread wants to leave the native code. So I understand what you are trying to achieve here, I'm just not sure how reliably it will actually achieve it. test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp 32 static volatile jlong spinn_count = 0; Using a 64-bit counter seems like it will be a problem on 32-bit systems. Should be spin_count not spinn_count. 36 // Agent thread waits for value != 0, then performas the JVMTI call to get local variable. typo: performas Thanks, David ----- Thanks, Richard. -----Original Message----- From: serviceability-dev mailto:serviceability-dev-r...@openjdk.java.net On Behalf Of Reingruber, Richard Sent: Montag, 27. Juli 2020 09:45 To: mailto:serguei.spit...@oracle.com; mailto:serviceability-dev@openjdk.java.net Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() Hi Serguei, > I tested it on Linux and Windows but not yet on MacOS. The test succeeded now on all platforms. Thanks, Richard. -----Original Message----- From: Reingruber, Richard Sent: Freitag, 24. Juli 2020 15:04 To: mailto:serguei.spit...@oracle.com; mailto:serviceability-dev@openjdk.java.net Subject: RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() Hi Serguei, The fix itself looks good to me. thanks for looking at the fix. I still need another look at new test. Could you, please, convert the agent of new test to C++? It will make it a little bit simpler. Sure, here is the new webrev.1 with a C++ version of the test agent: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/ I tested it on Linux and Windows but not yet on MacOS. Thanks, Richard. -----Original Message----- From: mailto:serguei.spit...@oracle.com mailto:serguei.spit...@oracle.com Sent: Freitag, 24. Juli 2020 00:00 To: Reingruber, Richard mailto:richard.reingru...@sap.com; mailto:serviceability-dev@openjdk.java.net Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() Hi Richard, Thank you for filing the CR and taking care about it! The fix itself looks good to me. I still need another look at new test. Could you, please, convert the agent of new test to C++? It will make it a little bit simpler. Thanks, Serguei On 7/20/20 01:15, Reingruber, Richard wrote: Hi, please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk from the vm operation prologue before the safepoint into the doit() method executed at the safepoint. Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html Bug: https://bugs.openjdk.java.net/browse/JDK-8249293 According to the JVMTI spec on local variable access it is not required to suspend the target thread T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is executing bytecodes. It will succeed though if T is blocked because of synchronization or executing some native code. The issue is that in the latter case the stack walk in VM_GetOrSetLocal::doit_prologue() to prepare the access to the local variable is unsafe, because it is done before the safepoint and it races with T returning to execute bytecodes making its stack not walkable. The included test shows that this can crash the VM if T wins the race. Manual testing: - new test test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java - test/hotspot/jtreg/vmTestbase/nsk/jvmti - test/hotspot/jtreg/serviceability/jvmti Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific tests with fastdebug and release builds on all platforms Thanks, Richard. [1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local