Re: RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

David Holmes Mon, 31 Aug 2015 23:15:07 -0700

Hi Dan,

Hope you had that coffee :)


On 1/09/2015 8:51 AM, Daniel D. Daugherty wrote:

Greetings,

I've updated the "fix" for this bug based on code review comments
received in round 0.

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL: http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/

The easiest way to re-review is to download the two patch files
and view them in your favorite file merge tool:

http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch

http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch


Testing: Aurora Adhoc RT-SVC nightly batch (in process)
          Aurora Adhoc vm.tmtools batch (in process)
          Kim's repro sequence for JDK-8049304
          Kim's repro sequence for JDK-8129978
          JPRT -testset hotspot

Changes between round 0 and round 1:

- add an 'is_safe' parameter to the OM_PERFDATA_OP macro;

The sense of safepoint-safe is a bit confusing to me. If the thread isin a safepoint-safe-state then it seems safepoint-safe==false ??

   safepoint-safe callers can access _has_PerfData flag directly;
   non-safepoint-safe callers use a load-acquire to fetch the
   current _has_PerfData flag value

load-acquire makes no difference to the "currentness" of _has_PerfDataand has no affect on the size of the "race window". So I don't see anypoint to this is-safe distinction.

- change PerfDataManager::destroy() to simply set _has_PerfData
   to zero (field is volatile) and then use a fence() to prevent
   any reordering of operations in any direction; it's only done
   once during VM shutdown so...


See previous email :)

- change perfMemory_exit() to only call PerfDataManager::destroy()
   when called at a safepoint and when the StatSample is not
   running; this means when the VM is aborting, we no longer have
   a race between the original crash report and this code path.


Sounds reasonable.

I have few specific comments/nits:

src/share/vm/runtime/perfData.cpp

 43 volatile jint   PerfDataManager::_has_PerfData = false;

Nit: jint should be initialized with 0 not false.

 287   // manipulation before we free the memory. The two alternatives
 288   // are 1) leak the PerfData memory or 2) do some form of ordered
 289   // access before every PerfData operation.

No amount of ordered access will fix the race - only synchronization ofsome form.


 315 void PerfDataManager::add_item(PerfData* p, bool sampled) {
 316
 317   MutexLocker ml(PerfDataManager_lock);
 318
 319   if (_all == NULL) {
 320     _all = new PerfDataList(100);
 321     OrderAccess::release_store(&_has_PerfData, 1);
 322   }

I can't see any purpose for a release_store here. You have one-timeinitialization code guarded with a mutex. A release_store adds nothinghere - other than future head-scratching from maintainers of this code.


Thanks,
David

I believe that I've addressed all comments from round 0.

Thanks, in advance, for any comments, questions or suggestions.

Dan


On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:

Greetings,

I have a "fix" for a long standing race between JVM shutdown and the
JVM statistics subsystem:

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL:
http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/

Testing: Aurora Adhoc RT-SVC nightly batch
         Aurora Adhoc vm.tmtools batch
         Kim's repro sequence for JDK-8049304
         Kim's repro sequence for JDK-8129978
         JPRT -testset hotspot

This "fix":

- adds a volatile flag to record whether PerfDataManager is holding
  data (PerfData objects)
- adds PerfDataManager::has_PerfData() to return the flag
- changes the Java monitor subsystem's use of PerfData to
  check both allocation of the monitor subsystem specific
  PerfData object and the new PerfDataManager::has_PerfData()
  return value

If the global 'UsePerfData' option is false, the system works as
it did before. If 'UsePerfData' is true (the default on non-embedded
systems), the Java monitor subsystem will allocate a number of
PerfData objects to record information. The objects will record
information about Java monitor subsystem until the JVM shuts down.

When the JVM starts to shutdown, the new PerfDataManager flag will
change to false and the Java monitor subsystem will stop using the
PerfData objects. This is the new behavior. As noted in the comments
I added to the code, the race is still present; I'm just changing
the order and the timing to reduce the likelihood of the crash.

Thanks, in advance, for any comments, questions or suggestions.

Dan

Re: RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Reply via email to