Email alerts when the CPU is overloaded

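First, a quick look at how to use the top command. top provides a dynamic, real-time view of a running system: it displays system summary information as well as a list of the threads or processes currently running.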

$ top -h
  procps-ng 3.3.12
Usage:
  top -hv | -bcHiOSs -d secs -n max -u|U user -p pid(s) -o field -w [cols]

-hv   Help/Version: either option prints the version and usage help, then exits

The following command-line options change the defaults:

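-b    Batch-mode: non-interactive (non-window) output, useful for sending top's output to another program or a file
-c    Command-line/Program-name: show each process's full command line instead of only the program name
-H    Threads-mode: show individual threads. Without this option, top shows the sum of all threads in each process. In interactive mode this can be toggled with 'H'
-i    Idle-process: when this toggle is Off, tasks that have not used any CPU since the last update are not displayed
-O    Output-field-names: print the available field names (for use with -o), one per line, then exit
-S    Cumulative-time: cumulative mode (each process's CPU time includes that of its dead children)
-s    Secure-mode: secure mode

-d    Delay-time: delay between screen updates, in seconds
-n    Number of iterations (refreshes) before top exits
-w    Output width: limit the number of columns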
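A minimal Python sketch (not taken from the CPUWarning project) of composing these options into an argument list. The function name `build_top_command` and its parameters are hypothetical; since idle tasks are shown by default (`i - Idle tasks On` in the table below), passing `-i` hides them.

```python
from typing import List, Optional


def build_top_command(batch: bool = True,
                      hide_idle: bool = True,
                      full_command: bool = False,
                      delay: Optional[float] = None,
                      iterations: Optional[int] = 1) -> List[str]:
    """Assemble a hypothetical argv list for top from the options above."""
    args = ["top"]
    flags = ""
    if full_command:
        flags += "c"  # -c: show the full command line
    if batch:
        flags += "b"  # -b: batch mode, plain text on stdout
    if hide_idle:
        flags += "i"  # -i: flip the idle-task toggle (default On) so idle tasks are hidden
    if flags:
        args.append("-" + flags)
    if delay is not None:
        args += ["-d", str(delay)]  # -d: seconds between refreshes
    if iterations is not None:
        args += ["-n", str(iterations)]  # -n: refreshes before top exits
    return args
```

With the defaults this yields `["top", "-bi", "-n", "1"]`, and with `full_command=True` it yields `["top", "-cbi", "-n", "1"]`, the two commands used in this post; either list can be passed straight to `subprocess.run`.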

The defaults are as follows:

           Global-defaults
              A - Alt display      Off (full-screen)
            * d - Delay time       1.5 seconds
            * H - Threads mode     Off (summarize as tasks)
              I - Irix mode        On  (no, `solaris' smp)
            * p - PID monitoring   Off (show all processes)
            * s - Secure mode      Off (unsecured)
              B - Bold enable      On  (yes, bold globally)
           Summary-Area-defaults
              l - Load Avg/Uptime  On  (thus program name)
              t - Task/Cpu states  On  (1+1 lines, see `1')
              m - Mem/Swap usage   On  (2 lines worth)
              1 - Single Cpu       Off (thus multiple cpus)
           Task-Area-defaults
              b - Bold hilite      Off (use `reverse')
            * c - Command line     Off (name, not cmdline)
            * i - Idle tasks       On  (show all tasks)
              J - Num align right  On  (not left justify)
              j - Str align right  Off (not right justify)
              R - Reverse sort     On  (pids high-to-low)
            * S - Cumulative time  Off (no, dead children)
            * u - User filter      Off (show euid only)
            * U - User filter      Off (show any uid)
              V - Forest view      On  (show as branches)
              x - Column hilite    Off (no, sort field)
              y - Row hilite       On  (yes, running tasks)
              z - color/mono       On  (show colors)

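To monitor CPU usage, we can inspect the output of top -bi -n 1 (batch mode, idle tasks hidden, a single refresh).
Below is the output of watch top -bi -n 1, which re-runs the command every 2 seconds: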

Every 2.0s: top -bi -n 1                                                                                                                                   MyServer: Fri Oct 18 08:45:14 2019

top - 08:45:14 up 36 days,  1:50,  5 users,  load average: 0.07, 0.05, 0.01
Tasks: 146 total,   1 running, 144 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2062096 total,   350188 free,   316304 used,  1395604 buff/cache
KiB Swap:   524284 total,   523764 free,      520 used.  1550992 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
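`watch` simply re-runs the command at a fixed interval. A rough Python equivalent, assuming `top` is on the PATH; `run_top` and `poll` are hypothetical helpers, and the `runner` parameter exists only so the loop can be exercised without a real `top`:

```python
import subprocess
import time
from typing import Callable, Iterator, List


def run_top(argv: List[str]) -> str:
    """Run top once (batch mode expected in argv) and return its text output."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def poll(argv: List[str], interval: float = 2.0, count: int = 3,
         runner: Callable[[List[str]], str] = run_top) -> Iterator[str]:
    """Yield `count` snapshots, sleeping `interval` seconds between them, like watch."""
    for i in range(count):
        if i:
            time.sleep(interval)
        yield runner(argv)
```

For example, `for snap in poll(["top", "-bi", "-n", "1"]): print(snap)` prints three snapshots two seconds apart.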

When I start a thread that spins in a busy loop:

Every 2.0s: top -bi -n 1                                                                                                                                   MyServer: Fri Oct 18 08:45:55 2019

top - 08:45:55 up 36 days,  1:51,  5 users,  load average: 0.12, 0.06, 0.01
Tasks: 148 total,   1 running, 146 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2062096 total,   339368 free,   327092 used,  1395636 buff/cache
KiB Swap:   524284 total,   523764 free,      520 used.  1540204 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5595 d         20   0 3100664  33628  24520 S 100.0  1.6   0:04.71 java
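The post doesn't show the spinning program (top lists it as `java`). As a hypothetical Python stand-in: a thread that busy-waits without ever sleeping keeps one core busy, so top should report roughly 100% for that process. `spin` and `start_spinner` are illustrative names only.

```python
import threading
import time


def spin(seconds: float) -> int:
    """Busy-wait for `seconds` without sleeping; returns how many loop iterations ran."""
    end = time.monotonic() + seconds
    loops = 0
    while time.monotonic() < end:
        loops += 1
    return loops


def start_spinner(seconds: float) -> threading.Thread:
    """Run spin() in a background daemon thread, like the spinning thread in the post."""
    t = threading.Thread(target=spin, args=(seconds,), daemon=True)
    t.start()
    return t
```

Running `start_spinner(60)` in one terminal and `watch top -bi -n 1` in another should reproduce a snapshot similar to the one above.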

top -cbi -n 1 additionally shows the full command line:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4584 root      20   0 3142032 158940  27872 S   0.3  7.7   1:00.08 java -cp .:bin:SpringDependent/emcat/ref/tomcat-annotations-api-9.0.26.jar:SpringDependent/emcat/ref/tomcat-embed-core-+
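Because the COMMAND column can itself contain spaces, a plain whitespace split would break it apart. A sketch, assuming top's default column layout shown above (12 columns ending with COMMAND), that splits at most 11 times so the command line stays in one piece; `parse_task_row` is a hypothetical name:

```python
from typing import Dict, Optional

# Default task-area columns, in the order top prints them above.
FIELDS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
          "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_task_row(line: str) -> Optional[Dict[str, str]]:
    """Split one task row into named fields; COMMAND keeps its internal spaces."""
    parts = line.strip().split(None, len(FIELDS) - 1)
    # Reject the header row, blank lines and anything without a numeric PID.
    if len(parts) != len(FIELDS) or not parts[0].isdigit():
        return None
    return dict(zip(FIELDS, parts))
```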

Use a regular expression to capture the %CPU and %MEM columns:

^.*\s+(\d+\.\d+)\s+(\d+\.\d+)\s+.*$
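A Python sketch applying this regex (the post doesn't show the project's own code). The greedy leading `.*` makes the match settle on the last pair of whitespace-separated decimals on a line, which for a task row is %CPU and %MEM; the summary lines in the snapshots above contain no such pair. Summing the two columns over all rows of a snapshot is an assumption here, although the log's `Mem: 3.0999999999999996` is exactly what 1.7 + 1.4 gives in double-precision arithmetic, which hints at per-sample sums.

```python
import re
from typing import Optional, Tuple

# The regex above: the last two whitespace-separated decimals (%CPU, %MEM).
ROW_RE = re.compile(r"^.*\s+(\d+\.\d+)\s+(\d+\.\d+)\s+.*$")


def cpu_mem(line: str) -> Optional[Tuple[float, float]]:
    """Return (%CPU, %MEM) for a task row, or None for any other line."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def snapshot_totals(output: str) -> Tuple[float, float]:
    """Sum %CPU and %MEM over every matching row of one `top -bi -n 1` snapshot."""
    cpu = mem = 0.0
    for line in output.splitlines():
        pair = cpu_mem(line)
        if pair is not None:
            cpu += pair[0]
            mem += pair[1]
    return cpu, mem
```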

With that, the monitor can be implemented in code. Project repository: https://github.com/develon2015/CPUWarning
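Judging from the log that follows, the program averages a batch of samples; when the average is too high it checks the time of the last warning before emailing, and once the average drops it clears the alert. A hypothetical sketch of that decision logic; the `threshold` and `cooldown` values are placeholders, not the project's actual settings:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    threshold: float = 90.0      # hypothetical: average %CPU treated as overload
    cooldown: float = 3600.0     # hypothetical: minimum seconds between alert emails
    last_alert: Optional[float] = None
    alarming: bool = False

    def update(self, samples: List[float], now: float) -> str:
        """Return "send", "suppressed", "cleared" or "ok" for one batch of samples."""
        if not samples:
            raise ValueError("need at least one sample")
        avg = sum(samples) / len(samples)
        if avg >= self.threshold:
            self.alarming = True
            # Only email if we never alerted, or the last alert is old enough.
            if self.last_alert is None or now - self.last_alert >= self.cooldown:
                self.last_alert = now
                return "send"
            return "suppressed"
        if self.alarming:
            self.alarming = False
            return "cleared"
        return "ok"
```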

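Sample 174 CPU:100.0       Mem: 1.7
Sample 175 CPU:100.0       Mem: 1.7
Average CPU usage: 100.1840909090909 %
CPU overloaded (100.0%), checking the time of the last warning to decide whether to send an alert email
Sending email -- (Sat Oct 19 00:53:28 EDT 2019)
Alert email sent to develon@qq.com : CPU overload warning -> Server CPU severely overloaded (100.1840909090909%), administrator: please take action immediately.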
top - 00:53:27 up 36 days, 17:58,  5 users,  load average: 0.97, 0.39, 0.15
Tasks: 148 total,   1 running, 147 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2062096 total,   134600 free,   362592 used,  1564904 buff/cache
KiB Swap:   524284 total,   523508 free,      776 used.  1515476 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13211 d         20   0 3100664  34784  25480 S 100.0  1.7   2:02.85 java

FROM CPUWarning.
Sample 0   CPU:106.7       Mem: 1.7
Sample 1   CPU:106.7       Mem: 3.0999999999999996
Sample 2   CPU:93.8        Mem: 1.7
Sample 3   CPU:93.8        Mem: 1.7
Sample 4   CPU:106.7       Mem: 1.7
Sample 5   CPU:100.0       Mem: 1.7
Sample 6   CPU:100.0       Mem: 1.7
Sample 7   CPU:100.0       Mem: 1.7
Sample 8   CPU:106.7       Mem: 1.7

...

Sample 158 CPU:6.7 Mem: 2.9
Sample 159 CPU:0.0 Mem: 0.0
Sample 160 CPU:0.0 Mem: 0.0
Sample 161 CPU:0.0 Mem: 0.0
Sample 162 CPU:0.0 Mem: 0.0
Sample 163 CPU:0.0 Mem: 0.0
Average CPU usage: 36.94268292682926 %
Alert cleared -- (Sat Oct 19 00:55:28 EDT 2019)
Currently safe (CPU 0.0 %) -- (Sat Oct 19 00:55:30 EDT 2019)
 top - 00:55:30 up 36 days, 18:00,  5 users,  load average: 0.35, 0.40, 0.18
Tasks: 146 total,   1 running, 145 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2062096 total,   134324 free,   362808 used,  1564964 buff/cache
KiB Swap:   524284 total,   523508 free,      776 used.  1515260 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

Currently safe (CPU 0.0 %) -- (Sat Oct 19 00:55:33 EDT 2019)
 top - 00:55:32 up 36 days, 18:00,  5 users,  load average: 0.32, 0.39, 0.18
Tasks: 146 total,   1 running, 145 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2062096 total,   134324 free,   362808 used,  1564964 buff/cache
KiB Swap:   524284 total,   523508 free,      776 used.  1515260 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
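The alert email in the first run appears to consist of the warning text, a top snapshot and a `FROM CPUWarning.` signature. A sketch of building such a message with Python's standard `email` package; `build_alert_email` is a hypothetical helper, the addresses are placeholders, and the SMTP host, port and credentials in the commented sending step are assumptions:

```python
from email.message import EmailMessage


def build_alert_email(avg_cpu: float, top_output: str,
                      sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text alert mirroring the warning seen in the log."""
    msg = EmailMessage()
    msg["Subject"] = "CPU overload warning"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Server CPU severely overloaded ({avg_cpu}%), "
        "administrator: please take action immediately.\n\n"
        f"{top_output}\n\nFROM CPUWarning.\n"
    )
    return msg

# Sending, assuming an SMTP server reachable over SSL (placeholders only):
# import smtplib
# with smtplib.SMTP_SSL("smtp.example.com", 465) as smtp:
#     smtp.login("user", "password")
#     smtp.send_message(msg)
```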

https://github.com/develon2015/CPUWarning


Original post: https://www.cnblogs.com/develon/p/11700546.html