Solaris specific bug in GC log rotation - extremely long pause
rainer.jung at kippdata.de
Wed Nov 11 07:38:39 UTC 2015
Am 11.11.2015 um 00:05 schrieb Poonam Bajaj Parhar:
> Hello Rainer,
> Could you please file a bug for this problem here:
I did right now.
> Please provide the problem description there and attach the reproducible
> program, and let us know the created bug id.
The bug ID is: (Review ID: JI-9026481) - One long Safepoint pause
directly after each GC log rotation
I couldn't attach anything, the bug submission forms do not give the
opportunity to attach files. I copied the Java and shell script sources
into one of the text boxes, but I couldn't fill in all of the
reproduction explanations. Please feel free to attach the tar.gz file I
had sent to the list.
Thanks for your help and advice!
> On 11/10/2015 2:15 PM, Rainer Jung wrote:
>> In case anyone is interested, I'll attach some reproduction code.
>> You can run it as a standalone Java program on Solaris. It is pretty
>> small and simple. A README is included.
>> I hope the list accepts attachments.
>> Am 10.11.2015 um 16:45 schrieb Rainer Jung:
>>> There's a bug in the Solaris specific HotSpot code which exhibits itself
>>> as extremely long pauses directly after GC log rotation.
>>> The bug can be seen in 1.7.0 patchlevel 80 and above, 1.8.0 patchlevel
>>> 20 and above and 1.9.0. It was noticed using
>>> During GC log rotation the function os::print_memory_info() in
>>> src/os/solaris/vm/os_solaris.cpp is called, which itself calls
>>> Now check_addr0() reads the file /proc/self/pmap, but does this entry by
>>> entry. Productive Java processes often have 1000 and more map entries
>>> (e.g. each thread stack is one entry, often each jar etc.
>>> Reading one entry takes about 35-40ms in a zone on a T4 system. Reading
>>> e.g. 1000 entries takes 35-40 seconds during which the JVM is stopped.
>>> We experienced even stop times up until 2400 seconds in extreme cases.
>>> Note that this doesn't have to do with GC. It only happens directly
>>> after GC log file rotation.
>>> The bug is reading the file entry by entry instead of using pread() to
>>> read it in one go. Look for instance at the source code of the Solaris
>>> command "pmap" which uses pread() to read the file at once.
>>> It might also be questioned, whether this map file check must happen
>>> during a safepoint or could also be done outside, only synchronizing in
>>> case a hit was found and a warning actually needs to be written to the
>>> log file.
>>> The bug was shadowed by another bug until and including 1.7.0_76 and
>>> 1.8.0_11: the close() that closes the map file after reading the entries
>>> was erroneously put inside the read loop instead of after the read loop.
>>> Thus these versions only read the first entry, then closed the file and
>>> erred out when trying to read the next entry. So reading was quick.
>>> The responsible change was:
>>> Java 8:
>>> Java 7:
>>> Example code for reading using pread() (going back to OpenSolaris):
>>> Look e.g. at function xmapping_iter() and there look for pread().
>>> See also the discussion thread starting at
>>> The analysis was done using source code review and realtime checks with
>>> pstack and truss. It would be nice if you could instruct me how to get
>>> this fixed. It is a bit specific because it only happens on Solaris and
>>> when GC log rotation is active, but Solaris should be an important
>>> platform for Oracle and GC log rotation will be used more and more.
>>> I do *not* have a support contract.
More information about the hotspot-gc-dev