记录java程序一次CPU占用90%问题排查过程

线上问题出现有个服务多台机器CPU飙升到90%

1、查看系统CPU占用

top -H -p pid

top - 11:51:57 up 110 days, 14:01, 0 users, load average: 7.08, 7.64, 8.15
Threads: 586 total, 11 running, 575 sleeping, 0 stopped, 0 zombie
%Cpu(s): 82.5 us, 7.0 sy, 0.0 ni, 8.0 id, 1.5 wa, 0.0 hi, 1.0 si, 0.0 st
KiB Mem : 8174716 total, 157652 free, 6103332 used, 1913732 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1741096 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31810 * 20 0 8485932 4.591g 0 R 53.9 58.9 117:41.64 java
32411 * 20 0 8485932 4.591g 0 R 44.7 58.9 85:29.14 java
31626 * 20 0 8485932 4.591g 0 S 21.7 58.9 0:53.47 java
31716 * 20 0 8485932 4.591g 0 R 19.1 58.9 461:14.88 java
6757 * 20 0 8485932 4.591g 0 R 15.5 58.9 27:29.25 java
7217 * 20 0 8485932 4.591g 0 R 14.8 58.9 27:17.37 java
7216 * 20 0 8485932 4.591g 0 R 12.8 58.9 27:21.51 java
5295 * 20 0 8485932 4.591g 0 R 11.8 58.9 28:12.26 java
31627 * 20 0 8485932 4.591g 0 S 11.5 58.9 15:56.04 java

....

2、查看对应的具体线程占用资源情况,获取16进制线程id

printf “%x
” pid

printf '%x ' 32426
7eaa

3、导出jstack日志

"logback-6" #528 daemon prio=5 os_prio=0 tid=0x00007fda40005000 nid=0x7eab waiting on condition [0x00007fda3c3df000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"logback-5" #527 daemon prio=5 os_prio=0 tid=0x00007fda40004800 nid=0x7eaa waiting on condition [0x00007fda1db8d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

4、使用第2步的id查找jstack日志,logback-5即线程名

"logback-5" #527 daemon prio=5 os_prio=0 tid=0x00007fda40004800 nid=0x7eaa waiting on condition [0x00007fda1db8d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

根据线程名查找代码

5、由于该服务日志量很大,且打印的都是trace日志,而且打印的是异步日志

ch.qos.logback.core.AsyncAppenderBase#put

private void put(E eventObject) {
        if (neverBlock) {
            blockingQueue.offer(eventObject);
        } else {
            putUninterruptibly(eventObject);
        }
    }

综上所述排查顺序,则缩小打印日志量应该就可以减少日志打印的cpu占用情况,而且应该也能减少CMS 老年代垃圾回收时间。准备关闭trace日志,只使用公司的全链路日志,优化效果应该非常明显。

原文地址:https://www.cnblogs.com/beaconSky/p/14024361.html