cpu 突增问题排查

cpu 突增问题排查

问题背景

临清明节前,收到监控报警,信息如下:

[告警名称:cpu.idle]
[告警指标:cpu.idle]
最近5个点值:[0,2.348,0,0,0]
触发规则:
[过去5个点内,所有点的值 < 10 ]

数据时间:2021-04-01 11:40:00
备注: cpu使用率高
告警时间:2021-04-01 11:40:19
持续时长:just now

查看 cpu 空闲使用率如下:

 

cpu.idle
cpu.idle

 

问题定位

1. 查看占用 CPU 的线程

使用 top -H 查看机器上每个线程对 cpu 的占用情况如下

 

cp占用
cp占用

 

2. 打印 java 进程的线程信息

使用 jstack -p pid > stack.txt 导出 java 进程的线程栈

通过 linux 的10进制的线程号计算出 java 的 16 进制线程号,查找对应的 Java 线程,对应的线程栈如下

"serviceAppender" #34 daemon prio=5 os_prio=0 tid=0x00007efc1a700800 nid=0x350 runnable [0x00007efc60f33000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.substitute(StrSubstitutor.java:899)
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.substitute(StrSubstitutor.java:872)
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.replace(StrSubstitutor.java:427)
        at org.apache.logging.log4j.core.pattern.MessagePatternConverter.format(MessagePatternConverter.java:125)
        at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)


"AsyncAppender" #24 daemon prio=5 os_prio=0 tid=0x00007efc1a665000 nid=0x346 runnable [0x00007efc6143e000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.substitute(StrSubstitutor.java:1014)
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.substitute(StrSubstitutor.java:872)
        at org.apache.logging.log4j.core.lookup.StrSubstitutor.replace(StrSubstitutor.java:427)
        at org.apache.logging.log4j.core.pattern.MessagePatternConverter.format(MessagePatternConverter.java:125)
        at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
        at org.apache.logging.log4j.scribe.layout.LcLayout.toSerializable(LcLayout.java:111)
        at org.apache.logging.log4j.scribe.layout.LcLayout.toSerializable(LcLayout.java:32)
        at org.apache.logging.log4j.core.layout.AbstractStringLayout.toByteArray(AbstractStringLayout.java:258)
        at org.apache.logging.log4j.scribe.appender.ScribeAppender.append(ScribeAppender.java:160)
        at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
        at org.apache.logging.log4j.core.appender.AsyncAppender$AsyncThread.callAppenders(AsyncAppender.java:451)
        at org.apache.logging.log4j.core.appender.AsyncAppender$AsyncThread.run(AsyncAppender.java:404) 

问题解决

从上面可以看出来是写日志导致,查看最近上线的代码,发现一位小同学在 3.30 上线的代码中加了两行 info 级别的日志。而这两行日志每天执行量在千万级别以上。

将两行代码去掉后重新上线,观察 cpu 的使用情况,恢复正常。

 

cpu.idle
cpu.idle

 

原文地址:https://www.cnblogs.com/feshfans/p/14620163.html