Hi Dan,

On 26/08/2015 7:08 AM, Daniel D. Daugherty wrote:
Greetings,

I have a "fix" for a long standing race between JVM shutdown and the
JVM statistics subsystem:

JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
https://bugs.openjdk.java.net/browse/JDK-8049304

Webrev URL: http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/

Testing: Aurora Adhoc RT-SVC nightly batch
          Aurora Adhoc vm.tmtools batch
          Kim's repro sequence for JDK-8049304
          Kim's repro sequence for JDK-8129978
          JPRT -testset hotspot

This "fix":

- adds a volatile flag to record whether PerfDataManager is holding
   data (PerfData objects)
- adds PerfDataManager::has_PerfData() to return the flag
- changes the Java monitor subsystem's use of PerfData to
   check both allocation of the monitor subsystem specific
   PerfData object and the new PerfDataManager::has_PerfData()
   return value

If the global 'UsePerfData' option is false, the system works as
it did before. If 'UsePerfData' is true (the default on non-embedded
systems), the Java monitor subsystem will allocate a number of
PerfData objects to record information. The objects will record
information about Java monitor subsystem until the JVM shuts down.

When the JVM starts to shutdown, the new PerfDataManager flag will
change to false and the Java monitor subsystem will stop using the
PerfData objects. This is the new behavior. As noted in the comments
I added to the code, the race is still present; I'm just changing
the order and the timing to reduce the likelihood of the crash.

Right. To sum up: the basic problem is that the PerfData objects are deallocated at the safepoint established for VM termination, but those objects can actually be used by threads that are in a safepoint-safe state: in particular within the low-level synchronization code.

As you say this fix narrows the window where a crash can occur, but can not close it. If a thread is descheduled after the check of hasPerfData it can still access the PerfData object when it resumes, which may be after the object was deallocated. There's no true fix here without introducing synchronization (which would have to be even lower-level to avoid reentrant use of the same code we're fixing!) and the overhead of that would be prohibitive for these perf counters.

In response to Kim's concern about other code that uses PerfData objects I think you would have to examine those uses to see which, if any, can occur from either a non-JavaThread, or from within the code where a thread is considered safepoint-safe. I'm inclined to agree that given we have not seen issues with such code, either it does not exist or is extremely unlikely to hit this issue. Given the "fix" is itself only narrowing the window it doesn't seem necessary to address code that already has a narrower window.

That all said "leaking" the PerfData objects seems no less unpleasant a "fix". There are so many obstacles in the way of being able to unload and re-load the JVM that I do not think this makes the position measurably worse. In fact I can imagine that if we were to allow for such behaviour we would need to be able to terminate threads and reclaim all their resources (like Monitor instances), at which point it would also become easy to deallocate shared memory like PerfData objects.

I'll leave it up to you which way to go. As it stands this is Reviewed.

Thanks,
David

Thanks, in advance, for any comments, questions or suggestions.

Dan

Reply via email to