Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

serguei.spit...@oracle.com Mon, 10 Aug 2020 10:25:23 -0700

Hi Richard and David,

The implementation looks good to me.

But I do not understand what the test is doing with all this counters and recursions.

For instance, these fragments:

  86         recursions = 0;
  87         try {
  88             recursiveMethod(1<<20);
  89         } catch (StackOverflowError e) {
  90             msg("Caught StackOverflowError as expected");
  91         }
  92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh

It is not obvious that the 'recursion' is updated in the recursiveMethod.
I would suggestto make it more explicit:
recursiveMethod(M);
int target_depth = M - 100;

Then the variable 'recursions' can be removed or become local.

This method will be:

  47     private static final int M = 1 << 20;
 ...
 121     public long recursiveMethod(int depth) {
 123         if (depth == 0) {
 124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
 126         } else {
 127             recursiveMethod(--depth);
 128         }
 129     }

At least, he test is missing the comments explaining all these.

Thanks,
Serguei

On 8/9/20 22:35, David Holmes wrote:

Hi Richard,

On 31/07/2020 5:28 pm, Reingruber, Richard wrote:

Hi,

I rebase the fix after JDK-8250042.

New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/

The general fix for this seems good. A minor nit:

588     if (!is_assignable(signature, ob_k, Thread::current())) {

You know that the current thread is the VMThread so can use VMThread::vm_thread().

Similarly for this existing code:

694     Thread* current_thread = Thread::current();

---

Looking at the test code ... I'm less clear on exactly what is happening and the use of spin-waits raises some red-flags for me in terms of test reliability on different platforms. The "while (--waitCycles > 0)" loop in particular offers no certainty that the agent thread is executing anything in particular. And the use of the spin_count as a guide to future waiting time seems somewhat arbitrary. In all seriousness I got a headache trying to work out how the test was expecting to operate. Some parts could be simplified using raw monitors, I think. But there's no sure way to know the agent thread is in the midst of the stackwalk when the target thread wants to leave the native code. So I understand what you are trying to achieve here, I'm just not sure how reliably it will actually achieve it.

test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp

32 static volatile jlong spinn_count     = 0;

Using a 64-bit counter seems like it will be a problem on 32-bit systems.

Should be spin_count not spinn_count.

36 // Agent thread waits for value != 0, then performas the JVMTI call to get local variable.

typo: performas

Thanks,
David
-----

Thanks, Richard.

-----Original Message-----
From: serviceability-dev <serviceability-dev-r...@openjdk.java.net> On Behalf Of Reingruber, Richard
Sent: Montag, 27. Juli 2020 09:45
To: serguei.spit...@oracle.com; serviceability-dev@openjdk.java.net
Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Serguei,

   > I tested it on Linux and Windows but not yet on MacOS.

The test succeeded now on all platforms.

Thanks, Richard.

-----Original Message-----
From: Reingruber, Richard
Sent: Freitag, 24. Juli 2020 15:04
To: serguei.spit...@oracle.com; serviceability-dev@openjdk.java.net
Subject: RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Serguei,

The fix itself looks good to me.

thanks for looking at the fix.

I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.

Sure, here is the new webrev.1 with a C++ version of the test agent:

http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/

I tested it on Linux and Windows but not yet on MacOS.

Thanks,
Richard.

-----Original Message-----
From: serguei.spit...@oracle.com <serguei.spit...@oracle.com>
Sent: Freitag, 24. Juli 2020 00:00
To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability-dev@openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Richard,

Thank you for filing the CR and taking care about it!
The fix itself looks good to me.
I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.

Thanks,
Serguei

On 7/20/20 01:15, Reingruber, Richard wrote:

Hi,

please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk from the vm
operation prologue before the safepoint into the doit() method executed at the safepoint.

Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
Bug:    https://bugs.openjdk.java.net/browse/JDK-8249293

According to the JVMTI spec on local variable access it is not required to suspend the target thread
T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is executing
bytecodes. It will succeed though if T is blocked because of synchronization or executing some native
code.

The issue is that in the latter case the stack walk in VM_GetOrSetLocal::doit_prologue() to prepare
the access to the local variable is unsafe, because it is done before the safepoint and it races
with T returning to execute bytecodes making its stack not walkable. The included test shows that
this can crash the VM if T wins the race.

Manual testing:

    - new test test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
    - test/hotspot/jtreg/vmTestbase/nsk/jvmti
    - test/hotspot/jtreg/serviceability/jvmti

Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
Renaissance Suite, SAP specific tests with fastdebug and release builds on all platforms

Thanks, Richard.

[1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local

Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reply via email to