RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reingruber, Richard Tue, 11 Aug 2020 02:54:10 -0700

Hi Serguei,

> The implementation looks good to me.

Thanks.

> But I do not understand what the test is doing with all this counters and 
> recursions.

The @comment gives an explanation: the target thread builds a stack as large as
possible to prolong the unsafe stackwalk. This is done by means of recursion.

> For instance, these fragments:
>   86         recursions = 0;
>   87         try {
>   88             recursiveMethod(1<<20);
>   89         } catch (StackOverflowError e) {
>   90             msg("Caught StackOverflowError as expected");
>   91         }
>   92         int target_depth = recursions-100;    // spaces are missed 
> around the '-' sigh

Here we calibrate the test for maximum stack depth. We measure how large the
stack can grow by calling recursiveMethod() until a StackOverflowError is
raised. We use recursions - 100 as target_depth to avoid the StackOverflowError
during the actual test.

> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
>    recursiveMethod(M);
>    int target_depth = M - 100;

> Then the variable 'recursions' can be removed or become local.

> This method will be:
>   47     private static final int M = 1 << 20;
>  ...
>  121     public long recursiveMethod(int depth) {
>  123         if (depth == 0) {
>  124             notifyAgentToGetLocalAndWaitShortly(M - 100, 
> waitTimeInNativeAfterNotify);
>  126         } else {
>  127             recursiveMethod(--depth);
>  128         }
>  129     }

I don't think this would work. A StackOverflowError would be raised
before notifyAgentToGetLocalAndWaitShortly() is called.

> At least, he test is missing the comments explaining all these.

The arguments to the msg() method serve as documentation too. I would not want
to repeat the msg strings in comments.

Thanks, Richard.

-------------------------------------------------------------
From: [email protected] <[email protected]> 
Sent: Montag, 10. August 2020 19:22
To: David Holmes <[email protected]>; Reingruber, Richard 
<[email protected]>; [email protected]
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in 
VM_GetOrSetLocal::doit_prologue()

Hi Richard and David,

The implementation looks good to me.

But I do not understand what the test is doing with all this counters and 
recursions.

For instance, these fragments:
  86         recursions = 0;
  87         try {
  88             recursiveMethod(1<<20);
  89         } catch (StackOverflowError e) {
  90             msg("Caught StackOverflowError as expected");
  91         }
  92         int target_depth = recursions-100;    // spaces are missed around 
the '-' sigh

It is not obvious that the 'recursion' is updated in the recursiveMethod.
I would suggestto make it more explicit:
   recursiveMethod(M);
   int target_depth = M - 100;

Then the variable 'recursions' can be removed or become local.

This method will be:
  47     private static final int M = 1 << 20;
 ...
 121     public long recursiveMethod(int depth) {
 123         if (depth == 0) {
 124             notifyAgentToGetLocalAndWaitShortly(M - 100, 
waitTimeInNativeAfterNotify);
 126         } else {
 127             recursiveMethod(--depth);
 128         }
 129     }

At least, he test is missing the comments explaining all these.

Thanks,
Serguei

On 8/9/20 22:35, David Holmes wrote:
Hi Richard, 

On 31/07/2020 5:28 pm, Reingruber, Richard wrote: 

Hi, 

I rebase the fix after JDK-8250042. 

New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/ 

The general fix for this seems good. A minor nit: 

 588     if (!is_assignable(signature, ob_k, Thread::current())) { 

You know that the current thread is the VMThread so can use 
VMThread::vm_thread(). 

Similarly for this existing code: 

 694     Thread* current_thread = Thread::current(); 

--- 

Looking at the test code ... I'm less clear on exactly what is happening and 
the use of spin-waits raises some red-flags for me in terms of test reliability 
on different platforms. The "while (--waitCycles > 0)" loop in particular 
offers no certainty that the agent thread is executing anything in particular. 
And the use of the spin_count as a guide to future waiting time seems somewhat 
arbitrary. In all seriousness I got a headache trying to work out how the test 
was expecting to operate. Some parts could be simplified using raw monitors, I 
think. But there's no sure way to know the agent thread is in the midst of the 
stackwalk when the target thread wants to leave the native code. So I 
understand what you are trying to achieve here, I'm just not sure how reliably 
it will actually achieve it. 

test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp

 32 static volatile jlong spinn_count     = 0; 

Using a 64-bit counter seems like it will be a problem on 32-bit systems. 

Should be spin_count not spinn_count. 

 36 // Agent thread waits for value != 0, then performas the JVMTI call to get 
local variable. 

typo: performas 

Thanks, 
David 
----- 

Thanks, Richard. 

-----Original Message----- 
From: serviceability-dev mailto:[email protected] On 
Behalf Of Reingruber, Richard 
Sent: Montag, 27. Juli 2020 09:45 
To: mailto:[email protected]; 
mailto:[email protected] 
Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in 
VM_GetOrSetLocal::doit_prologue() 

Hi Serguei, 

   > I tested it on Linux and Windows but not yet on MacOS. 

The test succeeded now on all platforms. 

Thanks, Richard. 

-----Original Message----- 
From: Reingruber, Richard 
Sent: Freitag, 24. Juli 2020 15:04 
To: mailto:[email protected]; 
mailto:[email protected] 
Subject: RE: RFR(S) 8249293: Unsafe stackwalk in 
VM_GetOrSetLocal::doit_prologue() 

Hi Serguei, 

The fix itself looks good to me. 

thanks for looking at the fix. 

I still need another look at new test. 
Could you, please, convert the agent of new test to C++? 
It will make it a little bit simpler. 

Sure, here is the new webrev.1 with a C++ version of the test agent: 

http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/ 

I tested it on Linux and Windows but not yet on MacOS. 

Thanks, 
Richard. 

-----Original Message----- 
From: mailto:[email protected] mailto:[email protected] 
Sent: Freitag, 24. Juli 2020 00:00 
To: Reingruber, Richard mailto:[email protected]; 
mailto:[email protected] 
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in 
VM_GetOrSetLocal::doit_prologue() 

Hi Richard, 

Thank you for filing the CR and taking care about it! 
The fix itself looks good to me. 
I still need another look at new test. 
Could you, please, convert the agent of new test to C++? 
It will make it a little bit simpler. 

Thanks, 
Serguei 

On 7/20/20 01:15, Reingruber, Richard wrote: 

Hi, 

please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk 
from the vm 
operation prologue before the safepoint into the doit() method executed at the 
safepoint. 

Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html 
Bug:    https://bugs.openjdk.java.net/browse/JDK-8249293 

According to the JVMTI spec on local variable access it is not required to 
suspend the target thread 
T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is 
executing 
bytecodes. It will succeed though if T is blocked because of synchronization or 
executing some native 
code. 

The issue is that in the latter case the stack walk in 
VM_GetOrSetLocal::doit_prologue() to prepare 
the access to the local variable is unsafe, because it is done before the 
safepoint and it races 
with T returning to execute bytecodes making its stack not walkable. The 
included test shows that 
this can crash the VM if T wins the race. 

Manual testing: 

    - new test 
test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
    - test/hotspot/jtreg/vmTestbase/nsk/jvmti 
    - test/hotspot/jtreg/serviceability/jvmti 

Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, 
SPECjbb2015, 
Renaissance Suite, SAP specific tests with fastdebug and release builds on all 
platforms 

Thanks, Richard. 

[1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local

RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reply via email to