Notes from a JVM Tuning Session: Analyzing GC Logs

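First, check the JVM's default settings on the server:

Edit tomcat/bin/catalina.sh and add this line at the very top: JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails -Xloggc:/usr/local/java/apache-tomcat-7.0.78/bin/gc.log"

Then open gc.log at the path given by -Xloggc. Part of its contents: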

[GC (Allocation Failure) [PSYoungGen: 15872K->2544K(18432K)] 15872K->4199K(60928K), 0.0076594 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
0.801: [GC (Allocation Failure) [PSYoungGen: 18416K->2554K(34304K)] 20071K->7458K(76800K), 0.0084303 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.102: [GC (Allocation Failure) [PSYoungGen: 34298K->2532K(34304K)] 39202K->10385K(76800K), 0.0073978 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.324: [GC (Allocation Failure) [PSYoungGen: 34276K->2540K(66048K)] 42129K->14849K(108544K), 0.0077529 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.538: [GC (Allocation Failure) [PSYoungGen: 66028K->2542K(66048K)] 78337K->25278K(108544K), 0.0107662 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.710: [GC (Allocation Failure) [PSYoungGen: 66030K->11795K(137216K)] 88766K->34540K(179712K), 0.0176510 secs] [Times: user=0.03 sys=0.01, real=0.02 secs]
2.065: [GC (Allocation Failure) [PSYoungGen: 136723K->14318K(139264K)] 159468K->48794K(181760K), 0.0246635 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
2.090: [Full GC (Ergonomics) [PSYoungGen: 14318K->2377K(139264K)] [ParOldGen: 34476K->42413K(78848K)] 48794K->44791K(218112K), [Metaspace: 13713K->13713K(1062912K)], 0.0826740 secs] [Times: user=0.15 sys=0.01, real=0.08 secs]
2.488: [GC (Allocation Failure) [PSYoungGen: 127305K->22522K(222720K)] 169719K->65057K(301568K), 0.0318258 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
3.086: [GC (Allocation Failure) [PSYoungGen: 222714K->26617K(268288K)] 265249K->87196K(347136K), 0.0409823 secs] [Times: user=0.07 sys=0.02, real=0.05 secs]
3.731: [GC (Allocation Failure) [PSYoungGen: 268281K->37864K(278016K)] 328860K->112315K(356864K), 0.0531917 secs] [Times: user=0.09 sys=0.02, real=0.05 secs]
3.785: [Full GC (Ergonomics) [PSYoungGen: 37864K->0K(278016K)] [ParOldGen: 74450K->59866K(113152K)] 112315K->59866K(391168K), [Metaspace: 13760K->13760K(1062912K)], 0.1234969 secs] [Times: user=0.24 sys=0.00, real=0.13 secs]
4.434: [GC (Allocation Failure) [PSYoungGen: 240128K->45645K(286208K)] 299994K->105511K(399360K), 0.0358239 secs] [Times: user=0.07 sys=0.00, real=0.04 secs]
4.989: [GC (Allocation Failure) [PSYoungGen: 285773K->42527K(269824K)] 345639K->109322K(382976K), 0.0452064 secs] [Times: user=0.09 sys=0.01, real=0.04 secs]
5.489: [GC (Allocation Failure) [PSYoungGen: 260127K->37422K(255488K)] 326922K->110639K(368640K), 0.0331642 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
7.060: [GC (Allocation Failure) [PSYoungGen: 255022K->11882K(278016K)] 328239K->89219K(391168K), 0.0199962 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]

7.328: [GC (Metadata GC Threshold) [PSYoungGen: 65740K->160K(276480K)] 143076K->81525K(389632K), 0.0071875 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
7.336: [Full GC (Metadata GC Threshold) [PSYoungGen: 160K->0K(276480K)] [ParOldGen: 81365K->30917K(90624K)] 81525K->30917K(367104K), [Metaspace: 23265K->23265K(1071104K)], 0.0701499 secs] [Times: user=0.12 sys=0.00, real=0.07 secs]
9.690: [GC (Allocation Failure) [PSYoungGen: 216064K->15373K(272896K)] 246981K->46298K(363520K), 0.0266823 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]

9.819: [GC (Metadata GC Threshold) [PSYoungGen: 25761K->8213K(274944K)] 56686K->39146K(365568K), 0.0153547 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
9.835: [Full GC (Metadata GC Threshold) [PSYoungGen: 8213K->0K(274944K)] [ParOldGen: 30933K->37095K(108032K)] 39146K->37095K(382976K), [Metaspace: 38475K->38475K(1083392K)], 0.0922216 secs] [Times: user=0.16 sys=0.00, real=0.09 secs]

77.664: [GC (Allocation Failure) [PSYoungGen: 217088K->56304K(242176K)] 254183K->119858K(350208K), 0.0250691 secs] [Times: user=0.04 sys=0.01, real=0.03 secs]
80.575: [GC (Allocation Failure) [PSYoungGen: 242160K->3375K(262144K)] 305714K->122388K(387072K), 0.0230121 secs] [Times: user=0.02 sys=0.02, real=0.03 secs]
80.599: [Full GC (Ergonomics) [PSYoungGen: 3375K->0K(262144K)] [ParOldGen: 119013K->108779K(213504K)] 122388K->108779K(475648K), [Metaspace: 54741K->54741K(1099776K)], 0.1379975 secs] [Times: user=0.24 sys=0.00, real=0.13 secs]

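Here:

-Xmx1038m: equivalent to -XX:MaxHeapSize; sets the JVM's maximum heap size to 1038 MB.

-Xms64m: equivalent to -XX:InitialHeapSize; sets the JVM's initial heap size to 64 MB.

The ParallelGC collector is in use.

There are 5 full GCs in total. Two were triggered by Metaspace (Metadata GC Threshold; the JDK 8 default MetaspaceSize is about 21.8M), and three were triggered by HotSpot's automatic, adaptive sizing policy (Ergonomics).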
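The tally above can be reproduced with a small script. This is a minimal sketch, assuming the JDK 8 ParallelGC log format shown in this post; the function name count_full_gc_causes is made up for illustration, and the sample lines are abbreviated copies of lines from the log above.

```python
import re
from collections import Counter

# Captures the cause in parentheses, e.g. "[Full GC (Ergonomics) ..."
FULL_GC = re.compile(r"\[Full GC \(([^)]+)\)")

def count_full_gc_causes(lines):
    """Count Full GC events per cause; minor GC lines are ignored."""
    causes = Counter()
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            causes[match.group(1)] += 1
    return causes

# Abbreviated lines from the log above (tails cut off for brevity).
sample = [
    "2.090: [Full GC (Ergonomics) [PSYoungGen: 14318K->2377K(139264K)]",
    "7.328: [GC (Metadata GC Threshold) [PSYoungGen: 65740K->160K(276480K)]",
    "7.336: [Full GC (Metadata GC Threshold) [PSYoungGen: 160K->0K(276480K)]",
]
causes = count_full_gc_causes(sample)
print(dict(causes))
```

To run it against a real file, pass the open file object instead of the sample list, for example count_full_gc_causes(open("gc.log")).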

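Improvement: -XX:MetaspaceSize=64m raises the initial Metaspace size

Add JAVA_OPTS="$JAVA_OPTS -server -showversion -Xms512m -Xmx512m -XX:MetaspaceSize=64m"

If -Xmx is set too large, there is not enough memory left for the other Tomcat instances: There is insufficient memory for the Java Runtime Environment to continue. After some experimentation, 512m was chosen here.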

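The resulting log looks like this (its CommandLine flags line shows that this excerpt was captured with a 2 GB heap, InitialHeapSize=MaxHeapSize=2147483648):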

Java HotSpot(TM) 64-Bit Server VM (25.144-b01) for linux-amd64 JRE (1.8.0_144-b01), built on Jul 21 2017 21:57:33 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 4056480k(1077372k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=2147483648 -XX:MaxHeapSize=2147483648 -XX:MetaspaceSize=67108864 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
2.563: [GC (Allocation Failure) [PSYoungGen: 524800K->85308K(611840K)] 524800K->85388K(2010112K), 0.1003043 secs] [Times: user=0.17 sys=0.02, real=0.10 secs]
3.991: [GC (Allocation Failure) [PSYoungGen: 610108K->87037K(611840K)] 610188K->103584K(2010112K), 0.1100503 secs] [Times: user=0.17 sys=0.04, real=0.11 secs]
5.245: [GC (Allocation Failure) [PSYoungGen: 611837K->87018K(611840K)] 628384K->140886K(2010112K), 0.1087186 secs] [Times: user=0.12 sys=0.03, real=0.11 secs]
8.794: [GC (Allocation Failure) [PSYoungGen: 611818K->62918K(611840K)] 665686K->116794K(2010112K), 0.0452909 secs] [Times: user=0.08 sys=0.00, real=0.04 secs]
38.881: [GC (Allocation Failure) [PSYoungGen: 587718K->87036K(611840K)] 641594K->186752K(2010112K), 0.1298147 secs] [Times: user=0.09 sys=0.12, real=0.13 secs]

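Up to the 38-second mark there is not a single full GC, which is 4 fewer than in the earlier log over the same period.

Watch closely how much is promoted from the young generation to the old generation on each minor GC, to avoid promotion failed. The drop in young-generation usage (what the minor GC moved out of the young generation) minus the drop in total heap usage (what was actually reclaimed) equals the growth of the old generation.

Example: 5.245: [GC (Allocation Failure) [PSYoungGen: 611837K->87018K(611840K)] 628384K->140886K(2010112K), 0.1087186 secs] [Times: user=0.12 sys=0.03, real=0.11 secs]

old = (611837K - 87018K) - (628384K - 140886K) = 37321K, so the old generation grew by about 37M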
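The same arithmetic can be applied to every minor GC line automatically. This is a rough sketch, assuming the PSYoungGen and ParNew line layouts that appear in this post; the helper name old_gen_growth_kb is hypothetical.

```python
import re

# Young-gen "before->after(capacity)" followed by whole-heap "before->after(capacity)".
# ParNew lines carry an extra ", N secs" inside the first bracket, hence the optional group.
MINOR_GC = re.compile(
    r"\[(?:PSYoungGen|ParNew): (\d+)K->(\d+)K\(\d+K\)(?:, [\d.]+ secs)?\] "
    r"(\d+)K->(\d+)K\(\d+K\)"
)

def old_gen_growth_kb(line):
    """Return old-gen growth in KB for one minor GC line, or None if it does not match."""
    match = MINOR_GC.search(line)
    if match is None:
        return None
    young_before, young_after, heap_before, heap_after = map(int, match.groups())
    # (space vacated in the young gen) - (space actually reclaimed) = promoted to old gen
    return (young_before - young_after) - (heap_before - heap_after)

parallel_line = (
    "5.245: [GC (Allocation Failure) [PSYoungGen: 611837K->87018K(611840K)] "
    "628384K->140886K(2010112K), 0.1087186 secs]"
)
print(old_gen_growth_kb(parallel_line))
```

Consistently large values here mean objects are leaving the young generation quickly, which is the early warning sign for promotion failed.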

Next improvement: replace ParallelGC with the CMS + ParNew combination: // the machine has 4G of physical memory, with about 1G free

JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails -Xloggc:/usr/local/java/apache-tomcat-robot/apache-tomcat-web/bin/gc.log -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/usr/local/java/apache-tomcat-robot/apache-tomcat-web/bin/java.hprof -XX:+PrintHeapAtGC"
JAVA_OPTS="$JAVA_OPTS -XX:+ExplicitGCInvokesConcurrent"
JAVA_OPTS="$JAVA_OPTS -server -showversion -Xms2g -Xmx2g -Xmn256m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:MaxGCPauseMillis=200
-XX:+UseCompressedOops
-XX:+UseCMSCompactAtFullCollection
-XX:+CMSClassUnloadingEnabled
-XX:MetaspaceSize=128m
-XX:MaxMetaspaceSize=512m
-XX:MaxTenuringThreshold=7
-XX:TargetSurvivorRatio=90
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+ExplicitGCInvokesConcurrent
-XX:LargePageSizeInBytes=128m
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70"

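Tips:

-XX:TargetSurvivorRatio=90: allows up to 90% of a Survivor space to be occupied (the JVM default is 50%), which raises Survivor space utilization

-XX:+ExplicitGCInvokesConcurrent // System.gc() runs a concurrent CMS collection

-XX:+DisableExplicitGC // disables System.gc()

-XX:HeapDumpPath=/home/admin/logs/java.hprof // location of the heap dump file written on OOM

-XX:SurvivorRatio=10
Ratio of Eden to one Survivor space. For example, 10 means that one Survivor space is 1/10 the size of Eden, i.e. 1/12 of the young generation.
The young generation has 2 Survivor spaces, S0 and S1, so together they take 2/12 of the young generation, and Eden takes 10/12.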
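The ratio arithmetic can be checked with a few lines of code. This is only a sketch of the proportions described above; the function name young_gen_layout is made up, and it ignores the alignment rounding that HotSpot applies to the real regions.

```python
def young_gen_layout(young_kb, survivor_ratio):
    """Split a young generation into (eden_kb, one_survivor_kb).

    Eden : S0 : S1 = survivor_ratio : 1 : 1, so one Survivor space is
    1 / (survivor_ratio + 2) of the young generation.
    """
    survivor_kb = young_kb // (survivor_ratio + 2)
    eden_kb = young_kb - 2 * survivor_kb
    return eden_kb, survivor_kb

# A 12 MB young generation with -XX:SurvivorRatio=10
eden_kb, survivor_kb = young_gen_layout(12 * 1024, 10)
print(eden_kb, survivor_kb)
```

Expect real logs to differ slightly from this idealized split, because the JVM rounds each region to its own alignment.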

-XX:MaxTenuringThreshold=6

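-XX:+UseCMSCompactAtFullCollection // makes CMS compact the old generation during a full collection. CMS concurrent cycles do not compact the heap, so this option helps keep fragmentation from causing further full GCs
-XX:+UseCompressedOops // enables compressed pointers: 32-bit object references on a 64-bit HotSpot. Without it, a 64-bit JVM uses roughly 1.5 times the memory of a 32-bit one

-XX:+PrintHeapAtGC // prints heap information before and after every GC.

-XX:+CMSClassUnloadingEnabled // lets CMS collect the method area (class unloading)

-XX:PretenureSizeThreshold=65535 // size threshold for large objects. The value is in bytes, so 65535 is about 64 KB, not 65M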

A normal log fragment produced with these settings:

...
{Heap before GC invocations=22 (full 0):
 par new generation   total 235968K, used 235968K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K, 100% used [0x0000000080000000, 0x000000008cce0000, 0x000000008cce0000)
  from space 26176K, 100% used [0x000000008cce0000, 0x000000008e670000, 0x000000008e670000)
  to   space 26176K,   0% used [0x000000008e670000, 0x000000008e670000, 0x0000000090000000)
 concurrent mark-sweep generation total 1835008K, used 1123030K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 64924K, capacity 66906K, committed 67072K, reserved 1107968K
  class space    used 7769K, capacity 8333K, committed 8448K, reserved 1048576K
2223.576: [GC (Allocation Failure) 2223.576: [ParNew: 235968K->26176K(235968K), 0.0583532 secs] 1358998K->1225544K(2070976K), 0.0585896 secs] [Times: user=0.08 sys=0.02, real=0.06 secs] 
Heap after GC invocations=23 (full 0):
 par new generation   total 235968K, used 26176K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K,   0% used [0x0000000080000000, 0x0000000080000000, 0x000000008cce0000)
  from space 26176K, 100% used [0x000000008e670000, 0x0000000090000000, 0x0000000090000000)
  to   space 26176K,   0% used [0x000000008cce0000, 0x000000008cce0000, 0x000000008e670000)
 concurrent mark-sweep generation total 1835008K, used 1199368K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 64924K, capacity 66906K, committed 67072K, reserved 1107968K
  class space    used 7769K, capacity 8333K, committed 8448K, reserved 1048576K
}
...

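With heap output before and after each GC enabled:

Old generation before/after the GC (highlighted in yellow in the original post):

before: old generation total size 1835008K, used 1123030K

after: old generation total size 1835008K, used 1199368K

Young-generation GC change (highlighted in orange in the original post): (235968 - 26176) - (1358998 - 1225544) = 76338K, which equals the old-generation difference (1199368 - 1123030) = 76338K

Roughly 45 minutes after the server started (timestamp 2699 s), the first full GC, a CMS cycle, occurs: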

{Heap before GC invocations=24 (full 0):
 par new generation   total 235968K, used 235967K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K, 100% used [0x0000000080000000, 0x000000008cce0000, 0x000000008cce0000)
  from space 26176K,  99% used [0x000000008cce0000, 0x000000008e66fff8, 0x000000008e670000)
  to   space 26176K,   0% used [0x000000008e670000, 0x000000008e670000, 0x0000000090000000)
 concurrent mark-sweep generation total 1835008K, used 1275665K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 65433K, capacity 67440K, committed 67584K, reserved 1107968K
  class space    used 7770K, capacity 8336K, committed 8448K, reserved 1048576K
2699.930: [GC (Allocation Failure) 2699.930: [ParNew: 235967K->26176K(235968K), 0.0610195 secs] 1511633K->1386131K(2070976K), 0.0612115 secs] [Times: user=0.09 sys=0.02, real=0.06 secs] 
Heap after GC invocations=25 (full 0):
 par new generation   total 235968K, used 26176K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K,   0% used [0x0000000080000000, 0x0000000080000000, 0x000000008cce0000)
  from space 26176K, 100% used [0x000000008e670000, 0x0000000090000000, 0x0000000090000000)
  to   space 26176K,   0% used [0x000000008cce0000, 0x000000008cce0000, 0x000000008e670000)
 concurrent mark-sweep generation total 1835008K, used 1359955K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 65433K, capacity 67440K, committed 67584K, reserved 1107968K
  class space    used 7770K, capacity 8336K, committed 8448K, reserved 1048576K
}
2699.992: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1359955K(1835008K)] 1386239K(2070976K), 0.0043239 secs] [Times: user=0.01 sys=0.02, real=0.00 secs] 
2699.997: [CMS-concurrent-mark-start]
2700.036: [CMS-concurrent-mark: 0.039/0.039 secs] [Times: user=0.08 sys=0.00, real=0.04 secs] 
2700.036: [CMS-concurrent-preclean-start]
2700.040: [CMS-concurrent-preclean: 0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
2700.040: [GC (CMS Final Remark) [YG occupancy: 26284 K (235968 K)]2700.040: [Rescan (parallel) , 0.0018878 secs]2700.042: [weak refs processing, 0.0027371 secs]2700.045: [class unloading, 0.0212439 secs]2700.066: [scrub symbol table, 0.0075598 secs]2700.074: [scrub string table, 0.0012169 secs][1 CMS-remark: 1359955K(1835008K)] 1386239K(2070976K), 0.0456603 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] 
2700.086: [CMS-concurrent-sweep-start]
2700.170: [CMS-concurrent-sweep: 0.082/0.083 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 
2700.170: [CMS-concurrent-reset-start]
2700.175: [CMS-concurrent-reset: 0.005/0.005 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
{Heap before GC invocations=25 (full 1):
 par new generation   total 235968K, used 235968K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K, 100% used [0x0000000080000000, 0x000000008cce0000, 0x000000008cce0000)
  from space 26176K, 100% used [0x000000008e670000, 0x0000000090000000, 0x0000000090000000)
  to   space 26176K,   0% used [0x000000008cce0000, 0x000000008cce0000, 0x000000008e670000)
 concurrent mark-sweep generation total 1835008K, used 431812K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 65340K, capacity 67220K, committed 67584K, reserved 1107968K
  class space    used 7748K, capacity 8294K, committed 8448K, reserved 1048576K
2938.667: [GC (Allocation Failure) 2938.667: [ParNew: 235968K->26176K(235968K), 0.0448878 secs] 667780K->541410K(2070976K), 0.0450784 secs] [Times: user=0.09 sys=0.00, real=0.05 secs] 
Heap after GC invocations=26 (full 1):
 par new generation   total 235968K, used 26176K [0x0000000080000000, 0x0000000090000000, 0x0000000090000000)
  eden space 209792K,   0% used [0x0000000080000000, 0x0000000080000000, 0x000000008cce0000)
  from space 26176K, 100% used [0x000000008cce0000, 0x000000008e670000, 0x000000008e670000)
  to   space 26176K,   0% used [0x000000008e670000, 0x000000008e670000, 0x0000000090000000)
 concurrent mark-sweep generation total 1835008K, used 515234K [0x0000000090000000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 65340K, capacity 67220K, committed 67584K, reserved 1107968K
  class space    used 7748K, capacity 8294K, committed 8448K, reserved 1048576K
}

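As the log shows, once old-generation occupancy exceeds the threshold (1359955K of 1835008K is about 74%, above CMSInitiatingOccupancyFraction=70), CMS is triggered and clears roughly 1G of garbage (old generation usage drops from 1359955K to 431812K)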
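The threshold check can be replayed on the numbers from the two heap dumps above. This is a simplified sketch: the function name cms_should_start is hypothetical, and it only models the plain occupancy comparison implied by -XX:+UseCMSInitiatingOccupancyOnly with -XX:CMSInitiatingOccupancyFraction=70, not the JVM's actual internal logic.

```python
def cms_should_start(old_used_kb, old_capacity_kb, initiating_fraction=70):
    """Return (occupancy_percent, triggered) for a simple occupancy threshold."""
    occupancy = 100.0 * old_used_kb / old_capacity_kb
    return occupancy, occupancy >= initiating_fraction

# Old generation before the minor GC at 2699.930: still under the threshold
before_pct, before_hit = cms_should_start(1275665, 1835008)
# Old generation after that minor GC: over the threshold, CMS initial mark follows
after_pct, after_hit = cms_should_start(1359955, 1835008)
print(round(before_pct, 1), before_hit, round(after_pct, 1), after_hit)
```

This matches the log: the CMS Initial Mark line appears right after the minor GC that pushed the old generation past 70%.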

For more GC log analysis examples, see http://hot66hot.iteye.com/blog/2075819

Young generation (ParNew YGC) promotion failed log

2014-02-27T21:19:42.460+0800: 210095.040: [GC 210095.040: [ParNew (promotion failed): 1887487K->1887488K(1887488K), 0.4818790 secs]210095.522: [CMS: 13706434K->7942818K(23068672K), 9.7152990 secs] 15358303K->7942818K(24956160K), [CMS Perm : 27424K->27373K(98304K)], 10.1974110 secs] [Times: user=12.06 sys=0.01, real=10.20 secs]

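promotion failed generally happens when objects are being promoted from the young generation to the old generation, and it triggers a Full GC. // CMS: 13706434K->7942818K(23068672K) shows the old generation before and after

Approach: the objects promoted during YGC are too large or too old (mid/long-lived objects).

Adjust -XX:PretenureSizeThreshold=65535. // The JVM provides the -XX:PretenureSizeThreshold parameter so that objects larger than this value are allocated directly in the old generation. The goal is to avoid large amounts of memory copying between Eden and the two Survivor spaces. Note: PretenureSizeThreshold only takes effect with the Serial and ParNew collectors

-XX:MaxTenuringThreshold=6 lowers the tenuring age so that young-generation space is preserved

Old generation (CMS old GC) analysis: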

2014-02-28T23:58:42.314+0800: 25789.661: [GC [1 CMS-initial-mark: 17303356K(23068672K)] 18642315K(24903680K), 1.0400410 secs] [Times: user=1.04 sys=0.00, real=1.04 secs]
2014-02-28T23:58:43.354+0800: 25790.701: [CMS-concurrent-mark-start]
2014-02-28T23:58:43.717+0800: 25791.064: [CMS-concurrent-mark: 0.315/0.363 secs] [Times: user=1.64 sys=0.02, real=0.37 secs]
2014-02-28T23:58:43.717+0800: 25791.064: [CMS-concurrent-preclean-start]
2014-02-28T23:58:43.907+0800: 25791.254: [CMS-concurrent-preclean: 0.181/0.190 secs] [Times: user=0.20 sys=0.01, real=0.19 secs]
2014-02-28T23:58:43.907+0800: 25791.254: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2014-02-28T23:58:49.082+0800: 25796.429: [CMS-concurrent-abortable-preclean: 5.165/5.174 secs] [Times: user=5.40 sys=0.04, real=5.17 secs]
2014-02-28T23:58:49.083+0800: 25796.430: [GC[YG occupancy: 1365142 K (1835008 K)]2014-02-28T23:58:49.083+0800: 25796.430: [Rescan (parallel) , 0.9690640 secs]2014-02-28T23:58:50.052+0800: 25797.399: [weak refs processing, 0.0006190 secs]2014-02-28T23:58:50.053+0800: 25797.400: [scrub string table, 0.0006290 secs] [1 CMS-remark: 17355150K(23068672K)] 18720292K(24903680K), 0.9706650 secs] [Times: user=16.49 sys=0.06, real=0.97 secs]
2014-02-28T23:58:50.054+0800: 25797.401: [CMS-concurrent-sweep-start]
2014-02-28T23:58:51.940+0800: 25799.287: [CMS-concurrent-sweep: 1.875/1.887 secs] [Times: user=2.03 sys=0.03, real=1.89 secs]
2014-02-28T23:58:51.941+0800: 25799.288: [CMS-concurrent-reset-start]
2014-02-28T23:58:52.067+0800: 25799.414: [CMS-concurrent-reset: 0.127/0.127 secs] [Times: user=0.13 sys=0.00, real=0.13 secs]
2014-03-01T00:00:36.293+0800: 25903.640: [GC2014-03-01T00:00:36.293+0800: 25903.640: [ParNew: 1805234K->226801K(1835008K), 0.1020510 secs] 10902912K->9434796K(24903680K), 0.1023150 secs] [Times: user=1.35 sys=0.02, real=0.10 secs]
2014-03-01T00:07:13.559+0800: 26300.906: [GC2014-03-01T00:07:13.559+0800: 26300.906: [ParNew: 1799665K->248991K(1835008K), 0.0876870 secs] 14086673K->12612462K(24903680K), 0.0879620 secs] [Times: user=1.24 sys=0.01, real=0.09 secs]

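A CMS GC log is made up of the following phases. Pay particular attention to initial-mark and remark, because these two phases stop the application.

Initial mark (init mark): collects the root references. This is a stop-the-world phase.

Concurrent mark: this phase runs concurrently with the application. It traverses the object graph of the old generation and marks the live objects.

Concurrent preclean: this is also a concurrent phase. It is mainly another marking pass, covering references that changed after the earlier marking. Its purpose is to shorten the stop-the-world time of the remark phase.

Remark: this is a stop-the-world operation. It pauses the application and accounts for the marks that changed in the meantime.

Concurrent sweep: concurrently scans the whole old generation and reclaims the space held by objects that are unreachable in the object graph.

Concurrent reset: resets certain data structures in preparation for the next collection cycle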
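Since only initial-mark and remark pause the application, it can be useful to total just those two phases. This is a minimal sketch, assuming the "[1 CMS-initial-mark: ...]" and "[1 CMS-remark: ...]" layouts seen in the logs in this post; the name cms_stw_pauses is made up, and the second sample string is only the tail of the long remark line.

```python
import re

# Matches the old-gen and heap occupancy followed by the phase duration, e.g.
# "[1 CMS-remark: 1359955K(1835008K)] 1386239K(2070976K), 0.0456603 secs]"
STW_PHASE = re.compile(
    r"\[1 CMS-(initial-mark|remark): \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]"
)

def cms_stw_pauses(lines):
    """Sum the stop-the-world seconds per CMS phase; concurrent phases are skipped."""
    pauses = {"initial-mark": 0.0, "remark": 0.0}
    for line in lines:
        for phase, secs in STW_PHASE.findall(line):
            pauses[phase] += float(secs)
    return pauses

sample = [
    "2699.992: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1359955K(1835008K)] "
    "1386239K(2070976K), 0.0043239 secs]",
    "2699.997: [CMS-concurrent-mark-start]",
    "[1 CMS-remark: 1359955K(1835008K)] 1386239K(2070976K), 0.0456603 secs]",
]
pauses = cms_stw_pauses(sample)
print(pauses)
```

In the 2014 example above, these two phases take about 1.04 s and 0.97 s, which is where tuning effort should go first.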

Old generation (CMS old GC) concurrent mode failure log

2014-03-03T09:38:26.457+0800: 233373.804: [GC [1 CMS-initial-mark: 17319615K(23068672K)] 17351070K(24903680K), 0.0419440 secs]
[Times: user=0.04 sys=0.00, real=0.04 secs]
2014-03-03T09:38:26.499+0800: 233373.846: [CMS-concurrent-mark-start]
2014-03-03T09:38:28.175+0800: 233375.522: [GC2014-03-03T09:38:28.175+0800: 233375.522: [CMS2014-03-03T09:38:28.887+0800: 233376.234:
[CMS-concurrent-mark: 1.989/2.388 secs] [Times: user=14.37 sys=0.24, real=2.39 secs]
(concurrent mode failure): 17473174K->8394653K(23068672K), 19.3309170 secs] 18319691K->8394653K(24903680K),
[CMS Perm : 23157K->23154K(98304K)], 19.3311700 secs] [Times: user=22.18 sys=0.00, real=19.33 secs]

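concurrent mode failure generally happens while a CMS GC is running: new large objects, or objects that have reached the tenuring threshold, are promoted to the old generation, the old generation runs out of space, and an MSC (Full GC) is triggered

The log above shows that CMS ran out of space while still in the CMS-concurrent-mark phase, producing a concurrent failure (17319615K of 23068672K, about 75% occupied).

Approach: reduce the YGC frequency and start CMS GC earlier by lowering CMSInitiatingOccupancyFraction appropriately (that is, leave more reserved space in the old generation).

Old generation (CMS old GC) concurrent mode failure log triggered by the young generation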

197.976: [GC 197.976: [ParNew: 260872K->260872K(261952K), 0.0000688 secs]

197.976: [CMS197.981: [CMS-concurrent-sweep: 0.516/0.531 secs] (concurrent mode failure): 402978K->248977K(786432K), 2.3728734 secs] 663850K->248977K(1048384K), 2.3733725 secs]

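This example shows that the collector entered the concurrent mode failure state: too much fragmentation in the old generation left it short of space, so a Serial Old GC (which blocks all other threads) had to be invoked to clean up the heap

Difference between concurrent mode failure and promotion failed:

promotion failed happens during a Minor GC, when the survivor space cannot hold the objects, they can only go to the old generation, and the old generation cannot hold them either;

concurrent mode failure happens when objects need to be placed in the old generation while a CMS GC is running, and the old generation does not have enough space at that moment.

When concurrent mode failure or promotion failed occurs, a full GC follows.

So avoid occupying too much memory at once. When reading a file, for example, split the work up (read it into a buffer in batches). Alternatively, give the heap more memory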
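The batch-reading idea is language independent. The sketch below illustrates it in Python rather than in the application's own code; the function names and the 64 KB default chunk size are arbitrary choices for illustration, and an in-memory stream stands in for a real file.

```python
import io

def read_in_chunks(stream, chunk_size=64 * 1024):
    """Yield successive chunks from a binary stream instead of loading it whole."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

def count_bytes(stream, chunk_size=64 * 1024):
    """Process a stream chunk by chunk; only one chunk is alive at a time."""
    total = 0
    for chunk in read_in_chunks(stream, chunk_size):
        total += len(chunk)
    return total

# 10 bytes read 4 at a time arrive as chunks of 4, 4 and 2 bytes.
sizes = [len(c) for c in read_in_chunks(io.BytesIO(b"x" * 10), chunk_size=4)]
print(sizes)
```

Because each chunk becomes garbage as soon as it is processed, the data stays short-lived and is far less likely to be promoted to the old generation in bulk.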