[CC'ing runtime for wider audience]
On 03.09.2015 11:33, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8080999 > http://cr.openjdk.java.net/~thartmann/8080999/webrev.00/ > > Problem: > The compiler test [1] sets a maximum usage threshold for a code heap memory > pool, allocates enough memory to hit the threshold and checks if the > threshold counter was incremented (see 'CodeCacheUtils.hitUsageThreshold' > [2]). On success, the allocated memory is freed and we start over again. The > counter should only be incremented the first time the threshold is crossed. > Subsequent threshold hits are only counted if the usage fell below the > threshold value. The test times out waiting for the threshold counter to be > incremented. > > The problem is that there is a race between > SensorInfo::set_gauge_sensor_level() which checks if the sensor should be > triggered and SensorInfo::clear() which resets the sensor once the usage > falls below the threshold value. The race occurs if > SensorInfo::process_pending_requests() invokes SensorInfo::clear() because > there is a pending request to clear the sensor. In the meantime > SensorInfo::set_gauge_sensor_level() may be called because the sensor is > triggered again. We now have to remove the pending clear request because the > sensor should be activated (see [3]). However, since SensorInfo::clear() does > not check the value of '_pending_clear_count' again, it still clears the > sensor. The sensor may then be triggered again although the threshold value > was only crossed once. As a result, the output of getUsageThresholdCount() is > higher than expected. > > Here is the detailed trace of events (the info in brackets is the output of > SensorInfo::print): > > ... > > [on count = 1 pending_triggers = 0 pending_clears = 0] > > WB.freeCodeBlob() > LowMemoryDetector::detect_low_memory() > SensorInfo::set_gauge_sensor_level() > is_below_low -> > _pending_clear_count++; > > [on count = 1 pending_triggers = 0 pending_clears = 1] > > WB.allocateCodeBlob() > LowMemoryDetector::detect_low_memory() > SensorInfo::set_gauge_sensor_level() > is_over_high -> > _pending_trigger_count++; > _pending_clear_count = 0; <-- remove pending clear request because > the sensor is triggered again > > [on count = 1 pending_triggers = 1 pending_clears = 0] > > LowMemoryDetector::process_sensor_changes() > SensorInfo::process_pending_requests() <-- still sees the old > _pending_clear_count == 1 because there is a race > SensorInfo::clear() > > [off count = 2 pending_triggers = 0 pending_clears = 0] > > WB.fullGC() > LowMemoryDetector::detect_low_memory() > SensorInfo::set_gauge_sensor_level() > is_over_high -> _pending_trigger_count++; > > [off count = 2 pending_triggers = 1 pending_clears = 0] > > At this point the threshold count should be 2 because the threshold was only > hit once. However, we have a pending trigger request. Now there are tow > possibilities: > 1) We check the counter before the pending trigger is processed: We continue > because as expected the count is 2 but we fail later because the overall > count does not match the expected value (RuntimeException: Unexpected > threshold usage count (assert failed: 11 == 10)). > 2) We check after the pending trigger is processed: The counter is 3 and we > time out waiting for it to be 2. > > Solution: > I added asserts to the code to make sure that such race conditions are > detected. I changed the implementation of SensorInfo::clear() to acquire the > Service_lock and bail out if _pending_clear_count was reset in the meantime > (i.e., if we lost a race to SensorInfo::set_gauge_sensor_level()). I also > added a missing > > 335 _sensor_count += count; > > to SensorInfo::clear() because it may trigger counter increments as well. > > Testing: > - 15k runs of failing testcase > - JPRT > > Thanks, > Tobias > > > [1] > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java > [2] > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50 > [3] > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222 >