Identifying concurrent mode failures caused by fragmentation

It might help to set -XX:PrintFLSStatistics to a value greater than zero.
That prints the FreeListSpace statistics, so you'd know the size of the
biggest free chunk. The GC log produced by -XX:+PrintGCDetails alone doesn't
give enough information on fragmentation.

Using -XX:PrintFLSStatistics=1 does make the GC log messier, and some GC log
parsing tools won't cope with the extra output, but you get to know how bad
the fragmentation is.

Anyway, it looks like you're using CMS in incremental mode. This mode
should be considered obsolete in JDK 6 already. Is there a good reason for
you to be using it? If not, I'd suggest turning it off, whether or not you
upgrade your JDK.
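
As a rough check on the "plenty of space" observation, here is a minimal
Python sketch that pulls the old generation figures out of the concurrent
mode failure record quoted below. The function name and the regular
expression are my own assumptions, written against that one record only;
this is not a general GC log parser, and other JDK versions or flag
combinations may format the line differently.

```python
import re

# The concurrent mode failure record from the quoted log, joined onto one line.
LOG = (
    "20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240 secs]"
    "20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029 secs] "
    "[Times: user=41.11 sys=1.03, real=29.03 secs] "
    "(concurrent mode failure): 3873922K->2801744K(6206272K), 30.7900180 secs] "
    "3886215K->2801744K(6282944K), [CMS Perm : 142884K->142834K(524288K)] "
    "icms_dc=33 , 30.8473830 secs]"
)

# Assumed layout: "(concurrent mode failure): <before>K-><after>K(<capacity>K)"
PATTERN = re.compile(r"\(concurrent mode failure\): (\d+)K->(\d+)K\((\d+)K\)")


def old_gen_occupancy(line):
    """Return (before_k, after_k, capacity_k, fraction_used_before), or None
    if the line holds no concurrent mode failure record in the assumed layout."""
    match = PATTERN.search(line)
    if match is None:
        return None
    before, after, capacity = (int(group) for group in match.groups())
    return before, after, capacity, before / capacity


result = old_gen_occupancy(LOG)
if result is not None:
    before, after, capacity, fraction = result
    print(f"old gen before failure: {before}K of {capacity}K ({fraction:.1%})")
```

On the quoted record this puts the CMS generation at roughly 62% of its
capacity just before the failure, so it wasn't simply full. That alone
doesn't prove fragmentation; the FLS statistics are what would show it.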

> We have an application running with a 6GB heap (complete parameters
> below).  Mostly it has a fairly low turnover of memory use, but on
> occasion, it will come under some pressure as it reloads a large in-memory
> data set from a database.
> Sometimes in this situation, we'll see a concurrent mode failure. Here's
> one failure:
> 20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240
> secs]20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029 secs]
> [Times: user=41.11 sys=1.03, real=29.03 secs]
>  (concurrent mode failure): 3873922K->2801744K(6206272K), 30.7900180 secs]
> 3886215K->2801744K(6282944K), [CMS Perm : 142884K->142834K(524288K)]
> icms_dc=33 , 30.8473830 secs] [Times: user=30.26 sys=0.71, real=30.85 secs]
> Total time for which application threads were stopped: 30.8484460 seconds
> (I've attached a lengthier log including the previous and subsequent CMS
> collection.)
> Am I correct in thinking that this failure can basically only be caused by
> fragmentation?  Both young and old seem to have plenty of space. There
> doesn't seem to be any sign that the tenured generation would run out of
> space before CMS completes.  Fragmentation is the only remaining cause that
> occurs to me.
> We're running with 1.6.0_11, although this will be upgraded to 1.6.0_26
> tomorrow.  I realise our current version is ancient - I'm not really
> looking for help on the problem itself, just for advice on whether the log
> line above indicates fragmentation.
> The parameters we have set are:
> -server
> -Xmx6144M
> -Xms6144M
> -XX:MaxPermSize=512m
> -XX:PermSize=512m
> -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode
> -XX:+CMSIncrementalPacing
> -XX:SoftRefLRUPolicyMSPerMB=3
> -XX:CMSIncrementalSafetyFactor=30
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCTimeStamps
> -Xloggc:/home/tbmx/log/gc_`date +%Y%m%d%H%M`.log
