[9] RFR (S): 8173151: Code heap corruption due to incorrect inclusion test

dean.long at oracle.com
Mon Feb 6 23:14:08 UTC 2017

When do we allocate a CodeBlob with a code size of 0?  Is it really 
useful?  Would having a minimum code size of 1 fix the problem?


On 2/6/17 7:29 AM, Zoltán Majó wrote:
> Hi,
> please review the fix for 8173151.
> https://bugs.openjdk.java.net/browse/JDK-8173151
> http://cr.openjdk.java.net/~zmajo/8173151/webrev.00/
> The crash reported in the bug is caused by the corruption of the 
> 'non-profiled nmethods' code heap. CodeHeap::_freelist for that code 
> heap points to an address one segment before the heap's address space. 
> The sweeper starts iterating through the code heap from the beginning 
> of the heap's address space. Thus, the sweeper assumes that the first 
> item in the code heap is a HeapBlock/FreeBlock (with the appropriate 
> length and usage information). However, that is not the case, as the 
> first item in the heap is actually *before* the heap. So the sweeper 
> crashes.
> This is a hard-to-reproduce problem (I managed to reproduce it only 
> once in 350 iterations, each iteration taking ~25 minutes). So the fix 
> I propose is based on core file debugging and source code 
> investigation. But I managed to write a regression test that triggers 
> a problem similar to the original problem.
> I think the problem happens because a CodeBlob allocated in one code 
> heap (A) is returned to a different code heap (B). When the CodeBlob 
> is returned to B, it is added to B's freelist. However, as the CodeBlob 
> was allocated in A, the freelist of B now points into A (i.e., B is 
> corrupted).
> The code cache uses CodeCache::get_code_heap(const CodeBlob* cb) to 
> determine the code heap to which 'cb' is supposed to be returned. 
> Since 8171008 (AOT) [1], the check is:
>  CodeHeap* CodeCache::get_code_heap(const CodeBlob* cb) {
>    assert(cb != NULL, "CodeBlob is null");
>    FOR_ALL_HEAPS(heap) {
> -    if ((*heap)->contains(cb)) {
> +    if ((*heap)->contains(cb->code_begin())) {
>        return *heap;
>      }
>    }
> The blob 'cb' can be returned to the wrong heap if, for example:
> - 'cb' is at the end of code heap A and
> - the size of the code contained in 'cb' is 0 (therefore code_begin() 
> returns the address after 'cb', i.e., the first address in code heap B).
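The two conditions above can be illustrated with a minimal sketch. It is not HotSpot code: `Heap`, `Blob`, `heap_by_blob` and `heap_by_code_begin` are hypothetical stand-ins, addresses are modeled as plain integers, and each heap is assumed to cover a half-open range [low, high) with heap B starting exactly where heap A ends.

```cpp
#include <cassert>
#include <cstddef>

// Addresses are modeled as plain integers (an assumption made for
// illustration; this is not how HotSpot represents them).
typedef std::size_t Address;

// Hypothetical stand-in for a code heap covering [low, high).
struct Heap {
  Address low;   // first address inside the heap
  Address high;  // first address past the end of the heap

  // Half-open inclusion test: low <= p < high.
  bool contains(Address p) const { return low <= p && p < high; }
};

// Hypothetical stand-in for a blob: a header followed by its code.
struct Blob {
  Address     start;        // address of the blob header
  std::size_t header_size;  // bytes occupied by the header
  std::size_t code_size;    // bytes of code after the header (may be 0)

  // The code starts right after the header. With code_size == 0 this is
  // also the first address after the whole blob.
  Address code_begin() const { return start + header_size; }
};

// Lookup keyed on the address of the blob itself.
const Heap* heap_by_blob(const Heap* heaps, std::size_t n, const Blob& b) {
  for (std::size_t i = 0; i < n; ++i) {
    if (heaps[i].contains(b.start)) return &heaps[i];
  }
  return nullptr;
}

// Lookup keyed on code_begin(), the variant that can pick the wrong heap.
const Heap* heap_by_code_begin(const Heap* heaps, std::size_t n, const Blob& b) {
  for (std::size_t i = 0; i < n; ++i) {
    if (heaps[i].contains(b.code_begin())) return &heaps[i];
  }
  return nullptr;
}
```

With heaps A = [0, 128) and B = [128, 256), a blob with start 96, header_size 32 and code_size 0 lies entirely in A, yet code_begin() is 128, so `heap_by_code_begin` selects B while `heap_by_blob` selects A. In this model, any blob that fits in A and has code_size of at least 1 has code_begin() below A's upper bound, so the two lookups agree for it.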
> The fix proposes to restore CodeCache::get_code_heap() to its pre-AOT 
> state (as I'm not aware of the reasons why AOT changed that check). I 
> also propose to add some guarantees after allocation/deallocation in 
> the code heap to make it easier to catch this or related problems in 
> the future.
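The kind of check such guarantees could perform can be sketched as follows. This is only an assumption about their general shape; the actual guarantees are in the webrev and are not reproduced in this message. `ToyHeap` and its members are hypothetical names, addresses are plain integers, and a failed check returns false here instead of aborting the VM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap covering [low, high) with a list of free blocks.
struct ToyHeap {
  std::size_t low;                    // first address inside the heap
  std::size_t high;                   // first address past the heap
  std::vector<std::size_t> freelist;  // start addresses of free blocks

  bool contains(std::size_t addr) const { return low <= addr && addr < high; }

  // Unchecked return of a block: a foreign address silently ends up on
  // the freelist, which is the corruption described above.
  void deallocate_unchecked(std::size_t addr) { freelist.push_back(addr); }

  // Checked return of a block: a foreign address is rejected and the
  // freelist is left untouched.
  bool deallocate_checked(std::size_t addr) {
    if (!contains(addr)) return false;
    freelist.push_back(addr);
    return true;
  }

  // Invariant to verify after allocation/deallocation: every free block
  // must start inside this heap.
  bool freelist_is_sane() const {
    for (std::size_t addr : freelist) {
      if (!contains(addr)) return false;
    }
    return true;
  }
};
```

For a heap B = [128, 256), returning a block at address 96 (which belongs to a neighbouring heap) through `deallocate_unchecked` makes `freelist_is_sane()` return false, whereas `deallocate_checked` rejects the block at the point of the mistake, much closer to the cause than a later crash in the sweeper.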
> The regression test I propose achieves the above condition and results 
> in a crash. The regression test works only with product builds, 
> because in a product build a BufferBlob fits into one segment whereas 
> in a fastdebug build it does not.
> The test needs to set the CodeCacheMinBlockLength flag to 1. The flag 
> is currently develop and we would need to make it product for the test 
> to work. (Other flags controlling the code cache, e.g., 
> CodeCacheExpansionSize, are also product.) I could also experiment 
> with reproducing the problem with different block lengths/segment 
> sizes, but that would most likely make the test more fragile (and 
> CodeCacheSegmentSize is anyway develop as well).
> I tested the fix with JPRT; RBT testing is in progress.
> Thank you!
> Best regards,
> Zoltan
> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/777aaa19c4b1#l116.71

More information about the hotspot-compiler-dev mailing list