Crash on super large heap size using CMS and its fix

M豪(Hal) mo_jianhao at
Wed Sep 12 01:00:15 UTC 2012

Hi all,

This is Hal Mo <kungu.mjh at> from Alibaba Group (we have signed the OCA).

Our Hadoop NameNode crashed when we set the heap size to 135G with the CMS GC.
Please find the crash log (hs_err_pid.log) attached.

I can reliably reproduce the crash on a test machine with 190G of physical memory using a simple command:
$ java -Xmx135g -XX:+UseConcMarkSweepGC

I then built a debug JVM and used gdb to investigate the problem.

Call stack:

C  []  memset+0x40
V  []  BlockOffsetArray::set_remainder_to_point_to_start_incl(unsigned long, unsigned long, bool)+0xce
V  []  BlockOffsetArray::set_remainder_to_point_to_start(HeapWord*, HeapWord*, bool)+0x71
V  []  BlockOffsetArray::BlockOffsetArray(BlockOffsetSharedArray*, MemRegion, bool)+0x9f
V  []  BlockOffsetArrayNonContigSpace::BlockOffsetArrayNonContigSpace(BlockOffsetSharedArray*, MemRegion)+0x37
V  []  CompactibleFreeListSpace::CompactibleFreeListSpace(BlockOffsetSharedArray*, MemRegion, bool, FreeBlockDictionary::DictionaryChoice)+0x9b
V  []  ConcurrentMarkSweepGeneration::ConcurrentMarkSweepGeneration(ReservedSpace, unsigned long, int, CardTableRS*, bool, FreeBlockDictionary::DictionaryChoice)+0x1df
V  []  GenerationSpec::init(ReservedSpace, int, GenRemSet*)+0x37c
V  []  GenCollectedHeap::initialize()+0x510
V  []  Universe::initialize_heap()+0x31d
V  []  universe_init()+0xa6
V  []  init_globals()+0x34
V  []  Threads::create_vm(JavaVMInitArgs*, bool*)+0x23a
V  []  JNI_CreateJavaVM+0x7a

In BlockOffsetArray::set_remainder_to_point_to_start_incl(), inside the for loop:

    size_t reach = start_card - 1 + (power_to_cards_back(i+1) - 1);

When i = 7, reach evaluated to 0, so the loop never hit its break condition, and

    _array->set_offset_array(start_card_for_region, reach, offset, reducing);

accessed an invalid address and crashed.

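The root cause is this helper in blockOffsetTable.hpp:

    static size_t power_to_cards_back(uint i) {
        return (size_t)(1 << (LogBase * i));
    }

The literal 1 is a 32-bit int, so 1 << 32 overflows: shifting an int by 32 or more bits is undefined behavior in C++, and the cast to size_t is applied only after the shift has already gone wrong.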

Here is my fix (tested); it is also in the attached file cms_large_heap_crash.patch:

+++ b/src/share/vm/memory/blockOffsetTable.hpp
@@ -289,7 +289,7 @@
   static size_t power_to_cards_back(uint i) {
-    return (size_t)(1 << (LogBase * i));
+    return (size_t)1 << (LogBase * i);
   }
   static size_t power_to_words_back(uint i) {
     return power_to_cards_back(i) * N_words;

Contributed-by: Hal Mo <kungu.mjh at> 

I found a similar pattern in G1, but there the sizes are megabyte-based (2^20), so an overflow would require a size of 2^(32+20), which is far too large to occur in practice.

Krystal reminded me that this changeset covers the same code.
I have not built it with Visual Studio; could someone please help review its compatibility with VS?


