Code review request: three native memory tracking related bugs
zhengyu.gu at oracle.com
Tue Jul 17 06:40:44 PDT 2012
> I don't understand your memory-management strategy in the MemSnapshot
> code. In the constructor you don't check if any of the allocations are
> successful. You check for null and bypass the action in some places (
> eg merge() wrt _staging_area), but assert that it is not null in
> others (eg. promote()), and don't check at all in others (eg
> print_snapshot_stats()). It is far from clear if those latter
> functions will never be reached if some of those values are null. Also
> if _lock is null, you will simply not lock! - which doesn't seem right.
As a requirement, NMT is a secondary functionality, which means that it
should not cause VM to exit when it encounters error.
In MemTracker::start() method, it uses 'nothrow new' to instantiate
MemSnapshot and performs out of memory checking, which checks
MemSnapshot object and all allocations inside constructor (snapshot ==
NULL || snapshot->out_of_memory()), if OOM is encountered, it shuts down
NMT. In short, when NMT state >= started, snapshot lock != NULL.
merge() merges recorders into _staging_area and _staging_area is
expandable, so OOM checking is needed. promote() promotes _staging_area
into snapshot, can only be reached when staging phase is successfully
completed, and _staging_area is read-only during promotion, that's why
OOM checking on _staging_area is not necessary, while any insertion
failures to snapshot indicates OOM condition, that also results shutting
Again, if _lock == null, NMT can never transition to started state, so
locking can never happen.
> It is difficult to gauge the full implications of using
> ThreadCritical. I would not be surprised is there is a potential
> problem with assertion failures while holding the ThreadCritical. But
> this is little different to other debug option combinations that can
> cause secondary failures/hangs during error reporting.
ThreadCritical is also used to block none JavaThread, if it is the case,
we have to rethink what lock is proper.
>> On 7/13/2012 3:43 PM, Zhengyu Gu wrote:
>>> 7181989: NMT ON: Assertion failure when NMT checks thread's native
>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181989
>>> Webrev: http://cr.openjdk.java.net/~zgu/7181989/webrev.00/
>>> We try to assert Thread's stack base to ensure that
>>> Thread::record_stack_base_and_size() to record native stack to NMT,
>>> but there is scenario that the thread fails to start, which no native
>>> stack is created and should not be asserted.
>>> 7181986: NMT ON: Assertion failure when running jdi
>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181986
>>> Webrev: http://cr.openjdk.java.net/~zgu/7181986/webrev.00/
>>> This is a racing condition when C/C++ runtime exit handler is ran
>>> before NMT worker thread exits. The exit handler calls _query_lock's
>>> destructor while NMT worker thread is still holding it. The fix is to
>>> make _query_lock a heap object, instead of static object, but the
>>> drawback is that, it does not seem that _query_lock can be safely
>>> Also, I reassigned MemSnaphot lock and query lock, so they participate
>>> in the deadlock detection logic.
>>> 7182543: NMT ON: Aggregate a few NMT related bugs
>>> Webrev: http://cr.openjdk.java.net/~zgu/7182543/webrev.00/
>>> - Fixed generations_in_used calculation
>>> - Wait MemRecorder instance count to drop to zero before completely
>>> shutdown NMT
>>> - Added assertion for JavaThread in _thread_blocked state.
More information about the hotspot-dev