[CC'ing runtime for wider audience]

On 03.09.2015 11:33, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch.
> 
> https://bugs.openjdk.java.net/browse/JDK-8080999
> http://cr.openjdk.java.net/~thartmann/8080999/webrev.00/
> 
> Problem:
> The compiler test [1] sets a maximum usage threshold for a code heap memory 
> pool, allocates enough memory to hit the threshold and checks if the 
> threshold counter was incremented (see 'CodeCacheUtils.hitUsageThreshold' 
> [2]). On success, the allocated memory is freed and we start over again. The 
> counter should only be incremented the first time the threshold is crossed. 
> Subsequent threshold hits are only counted if the usage fell below the 
> threshold value. The test times out waiting for the threshold counter to be 
> incremented.
> 
> The problem is that there is a race between 
> SensorInfo::set_gauge_sensor_level() which checks if the sensor should be 
> triggered and SensorInfo::clear() which resets the sensor once the usage 
> falls below the threshold value. The race occurs if 
> SensorInfo::process_pending_requests() invokes SensorInfo::clear() because 
> there is a pending request to clear the sensor. In the meantime 
> SensorInfo::set_gauge_sensor_level() may be called because the sensor is 
> triggered again. We now have to remove the pending clear request because the 
> sensor should be activated (see [3]). However, since SensorInfo::clear() does 
> not check the value of '_pending_clear_count' again, it still clears the 
> sensor. The sensor may then be triggered again although the threshold value 
> was only crossed once. As a result, the output of getUsageThresholdCount() is 
> higher than expected.
> 
> Here is the detailed trace of events (the info in brackets is the output of 
> SensorInfo::print):
> 
> ...
> 
> [on count = 1 pending_triggers = 0 pending_clears = 0]
> 
>  WB.freeCodeBlob()
>    LowMemoryDetector::detect_low_memory()
>      SensorInfo::set_gauge_sensor_level()
>        is_below_low ->
>          _pending_clear_count++;
> 
> [on count = 1 pending_triggers = 0 pending_clears = 1]
> 
>  WB.allocateCodeBlob()
>    LowMemoryDetector::detect_low_memory()
>      SensorInfo::set_gauge_sensor_level()
>        is_over_high -> 
>          _pending_trigger_count++;
>          _pending_clear_count = 0;  <-- remove pending clear request because 
> the sensor is triggered again
> 
> [on count = 1 pending_triggers = 1 pending_clears = 0]
>   
>  LowMemoryDetector::process_sensor_changes()
>    SensorInfo::process_pending_requests()  <-- still sees the old 
> _pending_clear_count == 1 because there is a race
>      SensorInfo::clear()
> 
> [off count = 2 pending_triggers = 0 pending_clears = 0]
> 
>  WB.fullGC()
>    LowMemoryDetector::detect_low_memory()
>      SensorInfo::set_gauge_sensor_level()
>        is_over_high -> _pending_trigger_count++;
> 
> [off count = 2 pending_triggers = 1 pending_clears = 0]
> 
> At this point the threshold count should be 2 because the threshold was only 
> hit once. However, we have a pending trigger request. Now there are tow 
> possibilities:
> 1) We check the counter before the pending trigger is processed: We continue 
> because as expected the count is 2 but we fail later because the overall 
> count does not match the expected value (RuntimeException: Unexpected 
> threshold usage count (assert failed: 11 == 10)).
> 2) We check after the pending trigger is processed: The counter is 3 and we 
> time out waiting for it to be 2.
> 
> Solution:
> I added asserts to the code to make sure that such race conditions are 
> detected. I changed the implementation of SensorInfo::clear() to acquire the 
> Service_lock and bail out if _pending_clear_count was reset in the meantime 
> (i.e., if we lost a race to SensorInfo::set_gauge_sensor_level()). I also 
> added a missing 
> 
>  335     _sensor_count += count;
> 
> to SensorInfo::clear() because it may trigger counter increments as well.
> 
> Testing:
> - 15k runs of failing testcase
> - JPRT
> 
> Thanks,
> Tobias
> 
> 
> [1] 
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java
> [2] 
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50
> [3] 
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222
> 

Reply via email to