服务器cpu负载过高问题排查

https://blog.csdn.net/MrZhangXL/article/details/77711996

第一步 :执行top命令,查出当前机器线程情况

top - 09:14:36 up 146 days, 20:24,  1 user,  load average: 0.31, 0.37, 0.45
Tasks: 338 total,   1 running, 337 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4%us,  0.4%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16331456k total, 14017124k used,  2314332k free,   316724k buffers
Swap: 16506876k total,    36200k used, 16470676k free,  5151404k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                              
 6107 root      20   0 9724m 1.3g  15m S  8.0  8.4 624:22.34 java                                                                                                  
26678 root      20   0 9643m 498m  17m S  8.0  3.1  61:34.58 java                                                                                                  
10168 root      20   0 12.9g 4.7g  12m S  1.7 30.0 329:11.58 java                                                                                                  
11291 root      20   0 9654m 1.2g  15m S  0.7  7.7  35:17.04 java                                                                                                  
28662 root      20   0 15168 1436  940 R  0.7  0.0   0:00.12 top                                                                                                   
   61 root      20   0     0    0    0 S  0.3  0.0   4:10.55 ksoftirqd/14                                                                                          
   68 root      20   0     0    0    0 S  0.3  0.0 119:58.11 events/1                                                                                              
 2987 root      20   0     0    0    0 S  0.3  0.0  31:22.27 flush-253:2                                                                                           
    1 root      20   0 19344 1204  980 S  0.0  0.0   0:03.71 init             
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16

第二步:通过jstack命令dump当前内存中线程信息 
jstack 26678

[root@172_16_1_177 logs]# jstack 26678
2017-08-30 10:34:29
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f96dc001000 nid=0x70d5 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"DubboServerHandler-172.16.1.177:28082-thread-50" daemon prio=10 tid=0x00007f967c001800 nid=0x70a1 waiting on condition [0x00007f9543040000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ca928680> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.177:28082-thread-49" daemon prio=10 tid=0x00007f95f820c000 nid=0x708e waiting on condition [0x00007f9543081000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ca928680> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.177:28082-thread-48" daemon prio=10 tid=0x00007f9674001800 nid=0x7085 waiting on condition [0x00007f95430c2000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ca928680> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.177:28082-thread-47" daemon prio=10 tid=0x00007f95f820a000 nid=0x7084 waiting on condition [0x00007f9543103000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ca928680> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.177:28082-thread-46" daemon prio=10 tid=0x00007f95f8207800 nid=0x7082 waiting on condition [0x00007f9543144000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ca928680> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72

4.1、使用方案 
cpu飙高,load高,响应很慢 
方案: 
* 一个请求过程中多次dump 
* 对比多次dump文件的runnable线程,如果执行的方法有比较大变化,说明比较正常。如果在执行同一个方法,就有一些问题了。 
查找占用cpu最多的线程信息 
方案: 
* 使用命令: top -H -p pid(pid为被测系统的进程号),找到导致cpu高的线程id。 
上述Top命令找到的线程id,对应着dump thread信息中线程的nid,只不过一个是十进制,一个是十六进制。 
* 在thread dump中,根据top命令查找的线程id,查找对应的线程堆栈信息。 
cpu使用率不高但是响应很慢 
方案: 
* 进行dump,查看是否有很多thread struck在了i/o、数据库等地方,定位瓶颈原因。 
请求无法响应 
方案: 
* 多次dump,对比是否所有的runnable线程都一直在执行相同的方法,如果是的,恭喜你,锁住了!

参考资料:http://blog.csdn.net/rachel_luo/article/details/8920596

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/MrZhangXL/article/details/77711996
原文地址:https://www.cnblogs.com/linus-tan/p/9334138.html