Code review request: three native memory tracking related bugs
david.holmes at oracle.com
Tue Jul 17 16:30:11 PDT 2012
On 17/07/2012 11:40 PM, Zhengyu Gu wrote:
>> I don't understand your memory-management strategy in the MemSnapshot
>> code. In the constructor you don't check if any of the allocations are
>> successful. You check for null and bypass the action in some places (
>> eg merge() wrt _staging_area), but assert that it is not null in
>> others (eg. promote()), and don't check at all in others (eg
>> print_snapshot_stats()). It is far from clear if those latter
>> functions will never be reached if some of those values are null. Also
>> if _lock is null, you will simply not lock! - which doesn't seem right.
> As a requirement, NMT is a secondary functionality, which means that it
> should not cause VM to exit when it encounters error.
> In MemTracker::start() method, it uses 'nothrow new' to instantiate
> MemSnapshot and performs out of memory checking, which checks
> MemSnapshot object and all allocations inside constructor (snapshot ==
> NULL || snapshot->out_of_memory()), if OOM is encountered, it shuts down
> NMT. In short, when NMT state >= started, snapshot lock != NULL.
Thanks for clarifying that.
> merge() merges recorders into _staging_area and _staging_area is
> expandable, so OOM checking is needed. promote() promotes _staging_area
> into snapshot, can only be reached when staging phase is successfully
> completed, and _staging_area is read-only during promotion, that's why
> OOM checking on _staging_area is not necessary, while any insertion
> failures to snapshot indicates OOM condition, that also results shutting
> down NMT.
> Again, if _lock == null, NMT can never transition to started state, so
> locking can never happen.
>> It is difficult to gauge the full implications of using
>> ThreadCritical. I would not be surprised is there is a potential
>> problem with assertion failures while holding the ThreadCritical. But
>> this is little different to other debug option combinations that can
>> cause secondary failures/hangs during error reporting.
> ThreadCritical is also used to block none JavaThread, if it is the case,
> we have to rethink what lock is proper.
>>> On 7/13/2012 3:43 PM, Zhengyu Gu wrote:
>>>> 7181989: NMT ON: Assertion failure when NMT checks thread's native
>>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181989
>>>> Webrev: http://cr.openjdk.java.net/~zgu/7181989/webrev.00/
>>>> We try to assert Thread's stack base to ensure that
>>>> Thread::record_stack_base_and_size() to record native stack to NMT,
>>>> but there is scenario that the thread fails to start, which no native
>>>> stack is created and should not be asserted.
>>>> 7181986: NMT ON: Assertion failure when running jdi
>>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181986
>>>> Webrev: http://cr.openjdk.java.net/~zgu/7181986/webrev.00/
>>>> This is a racing condition when C/C++ runtime exit handler is ran
>>>> before NMT worker thread exits. The exit handler calls _query_lock's
>>>> destructor while NMT worker thread is still holding it. The fix is to
>>>> make _query_lock a heap object, instead of static object, but the
>>>> drawback is that, it does not seem that _query_lock can be safely
>>>> Also, I reassigned MemSnaphot lock and query lock, so they participate
>>>> in the deadlock detection logic.
>>>> 7182543: NMT ON: Aggregate a few NMT related bugs
>>>> Webrev: http://cr.openjdk.java.net/~zgu/7182543/webrev.00/
>>>> - Fixed generations_in_used calculation
>>>> - Wait MemRecorder instance count to drop to zero before completely
>>>> shutdown NMT
>>>> - Added assertion for JavaThread in _thread_blocked state.
More information about the hotspot-dev