Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8080999
http://cr.openjdk.java.net/~thartmann/8080999/webrev.00/

Problem:
The compiler test [1] sets a maximum usage threshold for a code heap memory 
pool, allocates enough memory to hit the threshold and checks if the threshold 
counter was incremented (see 'CodeCacheUtils.hitUsageThreshold' [2]). On 
success, the allocated memory is freed and we start over again. The counter 
should only be incremented the first time the threshold is crossed. Subsequent 
threshold hits are only counted if the usage fell below the threshold value. 
The test times out waiting for the threshold counter to be incremented.

The problem is that there is a race between 
SensorInfo::set_gauge_sensor_level() which checks if the sensor should be 
triggered and SensorInfo::clear() which resets the sensor once the usage falls 
below the threshold value. The race occurs if 
SensorInfo::process_pending_requests() invokes SensorInfo::clear() because 
there is a pending request to clear the sensor. In the meantime 
SensorInfo::set_gauge_sensor_level() may be called because the sensor is 
triggered again. We now have to remove the pending clear request because the 
sensor should be activated (see [3]). However, since SensorInfo::clear() does 
not check the value of '_pending_clear_count' again, it still clears the 
sensor. The sensor may then be triggered again although the threshold value was 
only crossed once. As a result, the output of getUsageThresholdCount() is 
higher than expected.

Here is the detailed trace of events (the info in brackets is the output of 
SensorInfo::print):

...

[on count = 1 pending_triggers = 0 pending_clears = 0]

 WB.freeCodeBlob()
   LowMemoryDetector::detect_low_memory()
     SensorInfo::set_gauge_sensor_level()
       is_below_low ->
         _pending_clear_count++;

[on count = 1 pending_triggers = 0 pending_clears = 1]

 WB.allocateCodeBlob()
   LowMemoryDetector::detect_low_memory()
     SensorInfo::set_gauge_sensor_level()
       is_over_high -> 
         _pending_trigger_count++;
         _pending_clear_count = 0;  <-- remove pending clear request because 
the sensor is triggered again

[on count = 1 pending_triggers = 1 pending_clears = 0]
  
 LowMemoryDetector::process_sensor_changes()
   SensorInfo::process_pending_requests()  <-- still sees the old 
_pending_clear_count == 1 because there is a race
     SensorInfo::clear()

[off count = 2 pending_triggers = 0 pending_clears = 0]

 WB.fullGC()
   LowMemoryDetector::detect_low_memory()
     SensorInfo::set_gauge_sensor_level()
       is_over_high -> _pending_trigger_count++;

[off count = 2 pending_triggers = 1 pending_clears = 0]

At this point the threshold count should be 2 because the threshold was only 
hit once. However, we have a pending trigger request. Now there are tow 
possibilities:
1) We check the counter before the pending trigger is processed: We continue 
because as expected the count is 2 but we fail later because the overall count 
does not match the expected value (RuntimeException: Unexpected threshold usage 
count (assert failed: 11 == 10)).
2) We check after the pending trigger is processed: The counter is 3 and we 
time out waiting for it to be 2.

Solution:
I added asserts to the code to make sure that such race conditions are 
detected. I changed the implementation of SensorInfo::clear() to acquire the 
Service_lock and bail out if _pending_clear_count was reset in the meantime 
(i.e., if we lost a race to SensorInfo::set_gauge_sensor_level()). I also added 
a missing 

 335     _sensor_count += count;

to SensorInfo::clear() because it may trigger counter increments as well.

Testing:
- 15k runs of failing testcase
- JPRT

Thanks,
Tobias


[1] 
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java
[2] 
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50
[3] 
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222

Reply via email to