[9] RFR(S): 8080999: MemoryPoolMXBean.getUsageThresholdCount() returns incorrect value

Tobias Hartmann tobias.hartmann at oracle.com
Thu Sep 3 09:33:49 UTC 2015


please review the following patch.


The compiler test [1] sets a maximum usage threshold for a code heap memory pool, allocates enough memory to hit the threshold and checks if the threshold counter was incremented (see 'CodeCacheUtils.hitUsageThreshold' [2]). On success, the allocated memory is freed and we start over again. The counter should only be incremented the first time the threshold is crossed. Subsequent threshold hits are only counted if the usage fell below the threshold value. The test times out waiting for the threshold counter to be incremented.

The problem is that there is a race between SensorInfo::set_gauge_sensor_level() which checks if the sensor should be triggered and SensorInfo::clear() which resets the sensor once the usage falls below the threshold value. The race occurs if SensorInfo::process_pending_requests() invokes SensorInfo::clear() because there is a pending request to clear the sensor. In the meantime SensorInfo::set_gauge_sensor_level() may be called because the sensor is triggered again. We now have to remove the pending clear request because the sensor should be activated (see [3]). However, since SensorInfo::clear() does not check the value of '_pending_clear_count' again, it still clears the sensor. The sensor may then be triggered again although the threshold value was only crossed once. As a result, the output of getUsageThresholdCount() is higher than expected.

Here is the detailed trace of events (the info in brackets is the output of SensorInfo::print):


[on count = 1 pending_triggers = 0 pending_clears = 0]

       is_below_low ->

[on count = 1 pending_triggers = 0 pending_clears = 1]

       is_over_high -> 
         _pending_clear_count = 0;  <-- remove pending clear request because the sensor is triggered again

[on count = 1 pending_triggers = 1 pending_clears = 0]
   SensorInfo::process_pending_requests()  <-- still sees the old _pending_clear_count == 1 because there is a race

[off count = 2 pending_triggers = 0 pending_clears = 0]

       is_over_high -> _pending_trigger_count++;

[off count = 2 pending_triggers = 1 pending_clears = 0]

At this point the threshold count should be 2 because the threshold was only hit once. However, we have a pending trigger request. Now there are tow possibilities:
1) We check the counter before the pending trigger is processed: We continue because as expected the count is 2 but we fail later because the overall count does not match the expected value (RuntimeException: Unexpected threshold usage count (assert failed: 11 == 10)).
2) We check after the pending trigger is processed: The counter is 3 and we time out waiting for it to be 2.

I added asserts to the code to make sure that such race conditions are detected. I changed the implementation of SensorInfo::clear() to acquire the Service_lock and bail out if _pending_clear_count was reset in the meantime (i.e., if we lost a race to SensorInfo::set_gauge_sensor_level()). I also added a missing 

 335     _sensor_count += count;

to SensorInfo::clear() because it may trigger counter increments as well.

- 15k runs of failing testcase


[1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/UsageThresholdExceededTest.java
[2] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/test/compiler/codecache/jmx/CodeCacheUtils.java#l50
[3] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/da1c9ea76ce5/src/share/vm/services/lowMemoryDetector.cpp#l222

More information about the hotspot-dev mailing list