Solaris specific bug in GC log rotation - extremely long pause

Rainer Jung rainer.jung at kippdata.de
Tue Nov 10 15:45:49 UTC 2015


There's a bug in the Solaris specific HotSpot code which exhibits itself 
as extremely long pauses directly after GC log rotation.

The bug can be seen in 1.7.0 patchlevel 80 and above, 1.8.0 patchlevel 
20 and above and 1.9.0. It was noticed using 
-XX:PrintGCApplicationStoppedTime.

During GC log rotation the function os::print_memory_info() in 
src/os/solaris/vm/os_solaris.cpp is called, which itself calls 
check_addr0().

Now check_addr0() reads the file /proc/self/pmap, but does this entry by 
entry. Productive Java processes often have 1000 and more map entries 
(e.g. each thread stack is one entry, often each jar etc.

Reading one entry takes about 35-40ms in a zone on a T4 system. Reading 
e.g. 1000 entries takes 35-40 seconds during which the JVM is stopped. 
We experienced even stop times up until 2400 seconds in extreme cases.

Note that this doesn't have to do with GC. It only happens directly 
after GC log file rotation.

The bug is reading the file entry by entry instead of using pread() to 
read it in one go. Look for instance at the source code of the Solaris 
command "pmap" which uses pread() to read the file at once.

It might also be questioned, whether this map file check must happen 
during a safepoint or could also be done outside, only synchronizing in 
case a hit was found and a warning actually needs to be written to the 
log file.

The bug was shadowed by another bug until and including 1.7.0_76 and 
1.8.0_11: the close() that closes the map file after reading the entries 
was erroneously put inside the read loop instead of after the read loop. 
Thus these versions only read the first entry, then closed the file and 
erred out when trying to read the next entry. So reading was quick.

The responsible change was:

Java 8: 
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/diff/7d28f4e15b61/src/os/solaris/vm/os_solaris.cpp

Java 7: 
http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp

Example code for reading using pread() (going back to OpenSolaris):

https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/ptools/pmap/pmap.c

Look e.g. at function xmapping_iter() and there look for pread().

See also the discussion thread starting at 
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2015-November/002351.html.

The analysis was done using source code review and realtime checks with 
pstack and truss. It would be nice if you could instruct me how to get 
this fixed. It is a bit specific because it only happens on Solaris and 
when GC log rotation is active, but Solaris should be an important 
platform for Oracle and GC log rotation will be used more and more.

I do *not* have a support contract.

Regards,

Rainer


More information about the hotspot-gc-dev mailing list