Thanks David, have a good time! Richard. -----Original Message----- From: David Holmes <david.hol...@oracle.com> Sent: Dienstag, 18. August 2020 07:20 To: Reingruber, Richard <richard.reingru...@sap.com>; serguei.spit...@oracle.com; serviceability-dev@openjdk.java.net Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Hi Richard, The test seems a lot clearer to me now. I'll leave it to you are Serguei to iron out any last wrinkles as I am disappearing on vacation for a week after today. But you have my Review. Thanks, David On 15/08/2020 12:06 am, Reingruber, Richard wrote: > Hi Serguei, > > thanks for the feedback. I have implemented your suggestions and created a new > webrev: > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/ > > Please find my replies to your comments below. > > Best regards, > Richard. > >> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html > >> 33 * the stack walk. The target thread's stack is walkable >> while in native. After sending the notification it >> ... > >> 54 * @param depth Depth of target frame for GetLocalObject() call. >> Should be large value to prolong the unsafe stack walk. >> 55 * @param waitTimeInNativeAfterNotify Time to wait after notify >> with walkable stack before returning an becoming unsafe again. >> ... > >> 71 * Wait time in native, i.e. with walkable stack, after notifying >> agent thread to do GetLocalObject() call. >> ... > >> 89 msg((now -start) + " ms Iteration : " + iterations + " >> waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify); > >> Could you, please, re-balance the lines above to make them shorter? > > Ok, done. > > >> 90 int newTargetDepth = recursiveMethod(0, targetDepth); >> 91 if (newTargetDepth < targetDepth) { >> 92 msg("StackOverflowError during test."); >> 93 msg("Old target depth: " + targetDepth); >> 94 msg("Retry with new target depth: " + newTargetDepth); >> 95 targetDepth = newTargetDepth; >> 96 } >> A comment is needed to explain why a StackOverflowError is not desired. >> At least, it is not obvious initially. >> 73 public int waitTimeInNativeAfterNotify; > >> This name is unreasonably long which makes the code less readable. >> I'd suggest to reduce it to waitTime. > > Ok, done. > >> 103 notifyAgentToGetLocalAndWaitShortly(-1, 1); >> ... >> 119 notifyAgentToGetLocalAndWaitShortly(depth - 100, >> waitTimeInNativeAfterNotify); > >> It is better to provide a short comment before each call explaining what it >> is doing. >> For instance, it is not clear why the call at the line 103 is needed. >> Why do we need to notify the agent to GetLocal for the second time? > > The test is repeated TEST_ITERATIONS times. In each iteration the agent calls > GetLocal racing the target thread returning from the native call. The last > call > in line 103 ist the shutdown signal. > >> Can it be refactored into a separate native method? > > I've made the shutdown process more explicit with the new native method > shutDown() which sets thest_state to ShutDown. > >> Then the the function name can be reduced to 'notifyAgentToGetLocal'. >> This long name does not give enough context anyway. > > Ok, done. > >> 85 long iterations = 0; >> 87 do { >> ... >> 97 iterations++; >> ... >> 102 } while (iterations < TEST_ITERATIONS); > >> Why a more explicit 'for' or 'while' loop is not used here? : >> for (long iter = 0; iter < TEST_ITERATIONS; iter++) { > > I have converted the loop into a for loop. > >> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html > >> The indent in this file varies. It is better to keep it the same: 4 or 2. > > Yes, I noticed this. I have not corrected it yet, because I didn't want to > pullute the incremental webrev with that change. Would you like me to fix the > indentation now to 2 spaces or do it as a last step? > >> 60 AgentCallingGetLocalObject // The target thread waits for the agent >> to call >> I'd suggest to rename the constant to 'AgentInGetLocal'. > > Ok, done. > >> 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, >> void * arg) { > >> It is better rename the function to TestThreadLoop. > > Would AgentThreadLoop be ok too? > >> You can add a comment before to explain some basic about what it is doing. > > Ok, done. > >> 167 printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread >> started. Polling thread '%s' for local variables\n", >> It is better to get rid of leading stars in all messages. > > Ok, done. > >> 176 // the native method >> Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly >> The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function >> name. > > Ok, done. > > --- > From: serguei.spit...@oracle.com <serguei.spit...@oracle.com> > Sent: Freitag, 14. August 2020 10:11 > To: Reingruber, Richard <richard.reingru...@sap.com>; David Holmes > <david.hol...@oracle.com>; serviceability-dev@openjdk.java.net > Subject: Re: RFR(S) 8249293: Unsafe stackwalk in > VM_GetOrSetLocal::doit_prologue() > > Hi Richard, > > http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html > > 33 * the stack walk. The target thread's stack is walkable while > in native. After sending the notification it > ... > > 54 * @param depth Depth of target frame for GetLocalObject() call. > Should be large value to prolong the unsafe stack walk. > 55 * @param waitTimeInNativeAfterNotify Time to wait after notify > with walkable stack before returning an becoming unsafe again. > ... > > 71 * Wait time in native, i.e. with walkable stack, after notifying > agent thread to do GetLocalObject() call. > ... > > 89 msg((now -start) + " ms Iteration : " + iterations + " > waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify); > > Could you, please, re-balance the lines above to make them shorter? > > > 90 int newTargetDepth = recursiveMethod(0, targetDepth); > 91 if (newTargetDepth < targetDepth) { > 92 msg("StackOverflowError during test."); > 93 msg("Old target depth: " + targetDepth); > 94 msg("Retry with new target depth: " + newTargetDepth); > 95 targetDepth = newTargetDepth; > 96 } > A comment is needed to explain why a StackOverflowError is not desired. > At least, it is not obvious initially. > 73 public int waitTimeInNativeAfterNotify; > > This name is unreasonably long which makes the code less readable. > I'd suggest to reduce it to waitTime. > > 103 notifyAgentToGetLocalAndWaitShortly(-1, 1); > ... > 119 notifyAgentToGetLocalAndWaitShortly(depth - 100, > waitTimeInNativeAfterNotify); > > It is better to provide a short comment before each call explaining what it > is doing. > For instance, it is not clear why the call at the line 103 is needed. > Why do we need to notify the agent to GetLocal for the second time? > Can it be refactored into a separate native method? > Then the the function name can be reduced to 'notifyAgentToGetLocal'. > This long name does not give enough context anyway. > 85 long iterations = 0; > 87 do { > ... > 97 iterations++; > ... > 102 } while (iterations < TEST_ITERATIONS); > > Why a more explicit 'for' or 'while' loop is not used here? : > for (long iter = 0; iter < TEST_ITERATIONS; iter++) { > > > http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html > > The indent in this file varies. It is better to keep it the same: 4 or 2. > > 60 AgentCallingGetLocalObject // The target thread waits for the agent > to call > I'd suggest to rename the constant to 'AgentInGetLocal'. > 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, > void * arg) { > > It is better rename the function to TestThreadLoop. > You can add a comment before to explain some basic about what it is doing. > 167 printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread > started. Polling thread '%s' for local variables\n", > It is better to get rid of leading stars in all messages. > 176 // the native method > Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly > The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function > name. > > > I'm still reviewing the test native agent code. > > > Thanks, > Serguei > > > On 8/11/20 03:02, Reingruber, Richard wrote: > Hi David and Serguei, > > On 11/08/2020 3:21 am, mailto:serguei.spit...@oracle.com wrote: > Hi Richard and David, > > The implementation looks good to me. > > But I do not understand what the test is doing with all this counters > and recursions. > > For instance, these fragments: > > 86 recursions = 0; > 87 try { > 88 recursiveMethod(1<<20); > 89 } catch (StackOverflowError e) { > 90 msg("Caught StackOverflowError as expected"); > 91 } > 92 int target_depth = recursions-100; // spaces are missed > around the '-' sigh > > It is not obvious that the 'recursion' is updated in the recursiveMethod. > I would suggestto make it more explicit: > recursiveMethod(M); > int target_depth = M - 100; > > Then the variable 'recursions' can be removed or become local. > > The recursiveMethod takes in the maximum recursions to try and updates > the recursions variable to record how many recursions were possible - so: > > target_depth = <actual recursions> - 100; > > Possibly recursiveMethod could return the actual recursions instead of > using the global variables? > > I've eliminated the static 'recursions' variable. recursiveMethod() now > returns > the depth at which the recursion was ended. I hesitated doing this, because I > had to handle the StackOverflowError with all those frames still on stack. But > the handler is empty, so it should not cause problems. > > This is the new webrev (as posted previously): > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/ > > Thanks, Richard. > > -----Original Message----- > From: David Holmes mailto:david.hol...@oracle.com > Sent: Dienstag, 11. August 2020 04:00 > To: mailto:serguei.spit...@oracle.com; Reingruber, Richard > mailto:richard.reingru...@sap.com; mailto:serviceability-dev@openjdk.java.net > Subject: Re: RFR(S) 8249293: Unsafe stackwalk in > VM_GetOrSetLocal::doit_prologue() > > Hi Serguei, > > On 11/08/2020 3:21 am, mailto:serguei.spit...@oracle.com wrote: > Hi Richard and David, > > The implementation looks good to me. > > But I do not understand what the test is doing with all this counters > and recursions. > > For instance, these fragments: > > 86 recursions = 0; > 87 try { > 88 recursiveMethod(1<<20); > 89 } catch (StackOverflowError e) { > 90 msg("Caught StackOverflowError as expected"); > 91 } > 92 int target_depth = recursions-100; // spaces are missed > around the '-' sigh > > It is not obvious that the 'recursion' is updated in the recursiveMethod. > I would suggestto make it more explicit: > recursiveMethod(M); > int target_depth = M - 100; > > Then the variable 'recursions' can be removed or become local. > > The recursiveMethod takes in the maximum recursions to try and updates > the recursions variable to record how many recursions were possible - so: > > target_depth = <actual recursions> - 100; > > Possibly recursiveMethod could return the actual recursions instead of > using the global variables? > > David > ----- > > This method will be: > > 47 private static final int M = 1 << 20; > ... > 121 public long recursiveMethod(int depth) { > 123 if (depth == 0) { > 124 notifyAgentToGetLocalAndWaitShortly(M - 100, > waitTimeInNativeAfterNotify); > 126 } else { > 127 recursiveMethod(--depth); > 128 } > 129 } > > > At least, he test is missing the comments explaining all these. > > Thanks, > Serguei > > > > On 8/9/20 22:35, David Holmes wrote: > Hi Richard, > > On 31/07/2020 5:28 pm, Reingruber, Richard wrote: > Hi, > > I rebase the fix after JDK-8250042. > > New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/ > > The general fix for this seems good. A minor nit: > > 588 if (!is_assignable(signature, ob_k, Thread::current())) { > > You know that the current thread is the VMThread so can use > VMThread::vm_thread(). > > Similarly for this existing code: > > 694 Thread* current_thread = Thread::current(); > > --- > > Looking at the test code ... I'm less clear on exactly what is > happening and the use of spin-waits raises some red-flags for me in > terms of test reliability on different platforms. The "while > (--waitCycles > 0)" loop in particular offers no certainty that the > agent thread is executing anything in particular. And the use of the > spin_count as a guide to future waiting time seems somewhat arbitrary. > In all seriousness I got a headache trying to work out how the test > was expecting to operate. Some parts could be simplified using raw > monitors, I think. But there's no sure way to know the agent thread is > in the midst of the stackwalk when the target thread wants to leave > the native code. So I understand what you are trying to achieve here, > I'm just not sure how reliably it will actually achieve it. > > test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp > > > 32 static volatile jlong spinn_count = 0; > > Using a 64-bit counter seems like it will be a problem on 32-bit systems. > > Should be spin_count not spinn_count. > > 36 // Agent thread waits for value != 0, then performas the JVMTI > call to get local variable. > > typo: performas > > Thanks, > David > ----- > > Thanks, Richard. > > -----Original Message----- > From: serviceability-dev mailto:serviceability-dev-r...@openjdk.java.net > On Behalf Of Reingruber, Richard > Sent: Montag, 27. Juli 2020 09:45 > To: mailto:serguei.spit...@oracle.com; > mailto:serviceability-dev@openjdk.java.net > Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in > VM_GetOrSetLocal::doit_prologue() > > Hi Serguei, > > > I tested it on Linux and Windows but not yet on MacOS. > > The test succeeded now on all platforms. > > Thanks, Richard. > > -----Original Message----- > From: Reingruber, Richard > Sent: Freitag, 24. Juli 2020 15:04 > To: mailto:serguei.spit...@oracle.com; > mailto:serviceability-dev@openjdk.java.net > Subject: RE: RFR(S) 8249293: Unsafe stackwalk in > VM_GetOrSetLocal::doit_prologue() > > Hi Serguei, > > The fix itself looks good to me. > > thanks for looking at the fix. > > I still need another look at new test. > Could you, please, convert the agent of new test to C++? > It will make it a little bit simpler. > > Sure, here is the new webrev.1 with a C++ version of the test agent: > > http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/ > > I tested it on Linux and Windows but not yet on MacOS. > > Thanks, > Richard. > > -----Original Message----- > From: mailto:serguei.spit...@oracle.com mailto:serguei.spit...@oracle.com > Sent: Freitag, 24. Juli 2020 00:00 > To: Reingruber, Richard mailto:richard.reingru...@sap.com; > mailto:serviceability-dev@openjdk.java.net > Subject: Re: RFR(S) 8249293: Unsafe stackwalk in > VM_GetOrSetLocal::doit_prologue() > > Hi Richard, > > Thank you for filing the CR and taking care about it! > The fix itself looks good to me. > I still need another look at new test. > Could you, please, convert the agent of new test to C++? > It will make it a little bit simpler. > > Thanks, > Serguei > > > On 7/20/20 01:15, Reingruber, Richard wrote: > Hi, > > please help review this fix for VM_GetOrSetLocal. It moves the > unsafe stackwalk from the vm > operation prologue before the safepoint into the doit() method > executed at the safepoint. > > Webrev: > http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html > Bug: https://bugs.openjdk.java.net/browse/JDK-8249293 > > According to the JVMTI spec on local variable access it is not > required to suspend the target thread > T [1]. The operation will simply fail with > JVMTI_ERROR_NO_MORE_FRAMES if T is executing > bytecodes. It will succeed though if T is blocked because of > synchronization or executing some native > code. > > The issue is that in the latter case the stack walk in > VM_GetOrSetLocal::doit_prologue() to prepare > the access to the local variable is unsafe, because it is done > before the safepoint and it races > with T returning to execute bytecodes making its stack not walkable. > The included test shows that > this can crash the VM if T wins the race. > > Manual testing: > > - new test > test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java > - test/hotspot/jtreg/vmTestbase/nsk/jvmti > - test/hotspot/jtreg/serviceability/jvmti > > Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, > SPECjvm2008, SPECjbb2015, > Renaissance Suite, SAP specific tests with fastdebug and release > builds on all platforms > > Thanks, Richard. > > [1] > https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local > > >