Re: RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Daniel D. Daugherty Wed, 02 Sep 2015 10:20:55 -0700

Thanks!

Dan



On 9/2/15 11:06 AM, Tom Benson wrote:

Looks good to me!
Tnx,
Tom

On 9/2/2015 12:40 PM, Daniel D. Daugherty wrote:

Just for the record, here are the comment context diffs:

$ diff -c src/share/vm/runtime/perfMemory.cpp{.cr2,}***src/share/vm/runtime/perfMemory.cpp.cr2 Tue Sep 1 19:39:45 2015

--- src/share/vm/runtime/perfMemory.cpp Wed Sep  2 09:35:48 2015
***************
*** 70,76 ****
    // objects that are currently being used by running JavaThreads
    // or the StatSampler. This method is invoked while we are not at
    // a safepoint during a VM abort so leaving the PerfData objects
!   // around may also help diagnose the failure.
    //

if (SafepointSynchronize::is_at_safepoint() &&!StatSampler::is_active()) {

      PerfDataManager::destroy();
--- 70,78 ----
    // objects that are currently being used by running JavaThreads
    // or the StatSampler. This method is invoked while we are not at
    // a safepoint during a VM abort so leaving the PerfData objects
!   // around may also help diagnose the failure. In rare cases,
!   // PerfData objects are used in parallel with a safepoint. See
!   // the work around in PerfDataManager::destroy().
    //

if (SafepointSynchronize::is_at_safepoint() &&!StatSampler::is_active()) {

      PerfDataManager::destroy();


$ diff -c src/share/vm/runtime/objectMonitor.cpp{.cr2,}
*** src/share/vm/runtime/objectMonitor.cpp.cr2  Tue Sep  1 19:23:35 2015
--- src/share/vm/runtime/objectMonitor.cpp      Wed Sep  2 09:37:08 2015
***************
*** 572,577 ****
--- 572,579 ----

// That is by design - we trade "lossy" counters which areexposed to

      // races during updates for a lower probe effect.
      TEVENT(Inflated enter - Futile wakeup);

+ // This PerfData object can be used in a parallel with asafepoint.

+     // See the work around in PerfDataManager::destroy().
      OM_PERFDATA_OP(FutileWakeups, inc());
      ++nWakeups;

***************
*** 744,749 ****
--- 746,753 ----
      // *must* retry  _owner before parking.
      OrderAccess::fence();

+ // This PerfData object can be used in a parallel with asafepoint.

+     // See the work around in PerfDataManager::destroy().
      OM_PERFDATA_OP(FutileWakeups, inc());
    }


Dan


On 9/2/15 10:03 AM, Daniel D. Daugherty wrote:

On 9/2/15 9:40 AM, Tom Benson wrote:

Hi Dan,
OK. I didn't review what was added in round 1 once you said it wasall removed for round 2. 8^)


Not "all", but I did remove "most" of the round 1 changes :-)
The changes I kept are called in the list below.

It would be great if what you have in your first paragraph belowwas added to the comments. I think the existing comment inperfMemory_exit implies we're safe to remove the PerfData objectswithout fear of them being in use because we're at a safepoint.


I think I'll add this sentence to the comment in perfMemory_exit():

    // In rare cases, PerfData objects are used in parallel with a
    // safepoint. See the work around in PerfDataManager::destroy().

Perhaps better to have it (the new comment) inPerfDataManager::destroy(), because it seems so weird to have asleep in the VM thread during a safepoint, even in a shutdown path.


I think the PerfDataManager::destroy() comment is clear about
the race we're trying avoid. Again, if you have specific wording
changes to suggest to make it more clear... I'll take them. :-)

I think I'll also add this comment:

    // This PerfData object can be used in a parallel with a safepoint.
    // See the work around in PerfDataManager::destroy().

above these lines in src/share/vm/runtime/objectMonitor.cpp:

575     OM_PERFDATA_OP(FutileWakeups, inc());
747     OM_PERFDATA_OP(FutileWakeups, inc());

Any interest in asserting that you're at a safepoint inPerfDataManager::destroy? Just a thought.


I'd rather not add an assert() at this time.

Are you good with the above comment additions? Do you need to
see another webrev when I make those changes?

Dan

Tom

On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote:

On 9/2/15 8:49 AM, Tom Benson wrote:

Hi,
I'm a bit confused on one point... Since you now only callPerfDataManager::destroy at a safepoint, why do you still havethe comment about 'the race' and the sleep?


Because the two "futile wakeup" counter updates in the monitor
subsystem can execute in parallel with a safepoint. The JavaThread
state is "blocked" so the safepoint subsystem will see the JavaThread
as "at a safepoint" when it is actually executing the code to
increment the counter.

That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
all about in the round 1 code review. However, David convinced me
that all that logic didn't guarantee we wouldn't hit the race so
I ripped it all out in the round 2 code review (this one).

Does this help your confusion?

Dan

Tom

On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:

David,

Thanks for the very fast re-review!

Enjoy your vacation!

Dan


On 9/2/15 2:54 AM, David Holmes wrote:

Hi Dan,

On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:

Greetings,

I've updated the "fix" for this bug based on code review comments
received in round 1. I've dropped most of the changes fromround 1
with a couple of exceptions.

I have no further comments - it all looks good to me. If otherswant the pendulum to swing back a little from this positionthen ... nothing that has been suggested is functionally wrong. :)


Thanks,
David
-----

PS. When you get back from vacation I'll be gone for a month.That gives you a large window to push other things through withless stress ;-)

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL:http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/


The easiest way to re-review is to download the two patch files

(round 0 and round 2) and view them in your favorite filemerge tool:

http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch

http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch



Testing: Aurora Adhoc RT-SVC nightly batch (in process)
          Aurora Adhoc vm.tmtools batch (in process)
          Kim's repro sequence for JDK-8049304
          Kim's repro sequence for JDK-8129978
          JPRT -testset hotspot

Changes between round 0 and round 2:

- clarify a few comments
- init _has_PerfData flag with '0' (instead of false)
- drop unnecessary use OrderAccess::release_store() to set
   _has_PerfData to '1' (we're in a Mutex)

- change perfMemory_exit() to only callPerfDataManager::destroy()

   when called at a safepoint and when the StatSampler is not
   running; this means when the VM is aborting, we no longer have
   a race between the original crash report and this code path.

Changes between round 1 and round 2:

- clarify a few comments
- drop is_safe parameter to OM_PERFDATA_OP macro
- init _has_PerfData flag with '0' (instead of false)

- drop OrderAccess::fence() call beforeos::naked_short_sleep() call

- drop PerfDataManager::has_PerfData_with_acquire()
- drop unnecessary use OrderAccess::release_store() to set
   _has_PerfData to '1' (we're in a Mutex)

I believe that I've addressed all comments from round 0 and
from round 1.

Thanks, in advance, for any comments, questions or suggestions.

Dan


On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:

Greetings,

I've updated the "fix" for this bug based on code reviewcomments

received in round 0.

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL:
http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/

The easiest way to re-review is to download the two patch files
and view them in your favorite file merge tool:

http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch

http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch



Testing: Aurora Adhoc RT-SVC nightly batch (in process)
         Aurora Adhoc vm.tmtools batch (in process)
         Kim's repro sequence for JDK-8049304
         Kim's repro sequence for JDK-8129978
         JPRT -testset hotspot

Changes between round 0 and round 1:

- add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
  safepoint-safe callers can access _has_PerfData flag directly;
  non-safepoint-safe callers use a load-acquire to fetch the
  current _has_PerfData flag value
- change PerfDataManager::destroy() to simply set _has_PerfData
  to zero (field is volatile) and then use a fence() to prevent
  any reordering of operations in any direction; it's only done
  once during VM shutdown so...

- change perfMemory_exit() to only callPerfDataManager::destroy()

  when called at a safepoint and when the StatSample is not
  running; this means when the VM is aborting, we no longer have
  a race between the original crash report and this code path.

I believe that I've addressed all comments from round 0.

Thanks, in advance, for any comments, questions or suggestions.

Dan


On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:

Greetings,
I have a "fix" for a long standing race between JVM shutdownand the
JVM statistics subsystem:

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL:
http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/

Testing: Aurora Adhoc RT-SVC nightly batch
         Aurora Adhoc vm.tmtools batch
         Kim's repro sequence for JDK-8049304
         Kim's repro sequence for JDK-8129978
         JPRT -testset hotspot

This "fix":
- adds a volatile flag to record whether PerfDataManager isholding
  data (PerfData objects)
- adds PerfDataManager::has_PerfData() to return the flag
- changes the Java monitor subsystem's use of PerfData to
  check both allocation of the monitor subsystem specific
  PerfData object and the new PerfDataManager::has_PerfData()
  return value
If the global 'UsePerfData' option is false, the systemworks asit did before. If 'UsePerfData' is true (the default onnon-embedded
systems), the Java monitor subsystem will allocate a number of
PerfData objects to record information. The objects will record
information about Java monitor subsystem until the JVM shutsdown.
When the JVM starts to shutdown, the new PerfDataManagerflag willchange to false and the Java monitor subsystem will stopusing thePerfData objects. This is the new behavior. As noted in thecommentsI added to the code, the race is still present; I'm justchanging
the order and the timing to reduce the likelihood of the crash.

Thanks, in advance, for any comments, questions or suggestions.

Dan

Re: RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Reply via email to