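Linux 8.1 Checking system load

The w command

Use w to check the system load and get a first indication of where a problem lies.

#load average: the system load over the last 1, 5 and 15 minutes, i.e. the average number of processes running on or waiting for a CPU during that period (on Linux, processes in uninterruptible sleep are counted too)
#A value equal to the number of logical CPUs (not physical CPUs) is the ideal state. TTY shows where each user logged in from: network logins are usually pts/0...n, local consoles are tty1...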
[root@chyuanliuNJ ~]# w
 19:54:39 up 2 days, 11:28,  4 users,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    49.77.222.241    19:10   44:39   0.00s  0.00s -bash
root     pts/1    49.77.222.241    19:47    7:11   0.00s  0.00s -bash
root     pts/2    49.77.222.241    19:39   15:27   0.00s  0.00s -bash
root     pts/3    49.77.222.241    19:54    7.00s  0.00s  0.00s w
[root@chyuanliuNJ ~]# date
Sat Nov 25 19:55:10 CST 2017

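#A physical CPU can contain several cores, and each core can expose one or more logical CPUs (for example two with hyper-threading)
#Count the logical CPUs in /proc/cpuinfo: processor numbering starts at 0, so a single "processor : 0" entry means 1 logical CPU, and a load of 1 is the ideal state on this machine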

[root@chyuanliuNJ ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
... ...

#The first line of w is the same as the output of uptime
[root@chyuanliuNJ ~]# uptime
 20:06:05 up 2 days, 11:39,  5 users,  load average: 0.00, 0.03, 0.05
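
The rule of thumb above (compare the load average with the number of logical CPUs) can be sketched as a small script. This is only an illustration: load_per_cpu and count_cpus are hypothetical helper names, and the sample values are taken from the transcript above.

```shell
#!/bin/sh
# Sketch: express a load average as load per logical CPU.
# load_per_cpu and count_cpus are hypothetical helpers, not part of any tool.

load_per_cpu() {
    # $1 = load average, $2 = number of logical CPUs
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

count_cpus() {
    # $1 = path to a cpuinfo-style file (defaults to /proc/cpuinfo).
    # Each logical CPU contributes one line starting with "processor".
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Sample values from the transcript: 1-minute load 0.00 on 1 logical CPU.
load_per_cpu 0.00 1

# On a live Linux system, something like this should work:
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(count_cpus)"
```

A result around 1.00 corresponds to the ideal state described above; values well above 1.00 suggest processes are queuing for the CPU.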

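The vmstat command

For example, if w shows that the load is too high, vmstat helps identify which resource (CPU, memory, swap or disk I/O) is the bottleneck. It does not list individual processes; use top for that.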

[root@chyuanliuNJ ~]# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0  78048 127452 437872    0    0     2    10   12  196  1  1 98  0  0
#vmstat N: print a new report every N seconds
[root@chyuanliuNJ ~]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0  78112 127456 437900    0    0     2    10   12  197  1  1 98  0  0
 0  0      0  77740 127460 437908    0    0     0    24 1281 2660  1  1 97  0  0
 0  0      0  77740 127460 437916    0    0     0     9 1268 2657  0  1 99  0  0
^C
#Report every three seconds and stop after 4 reports
[root@chyuanliuNJ ~]# vmstat 3 4
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0  77732 127476 438056    0    0     2    10   13  198  1  1 98  0  0
 1  0      0  77732 127476 438056    0    0     0     8 1235 2611  1  1 98  0  0
 0  0      0  77764 127476 438088    0    0     0     0 1180 2563  1  0 99  0  0
 0  0      0  77392 127476 438092    0    0     0     0 1191 2582  1  0 99  0  0

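# r: short for run; the number of processes that are running or waiting for the CPU
# b: short for blocked; the number of processes in uninterruptible sleep, usually waiting for I/O
# swpd: when memory runs short, the system temporarily moves some data from memory into swap; a value that keeps changing suggests memory is insufficient
# free buff cache: free memory, memory used as buffers, memory used as cache
# si so: swap activity; si is the amount of data (KB/s) moved from swap into memory, so is the amount moved from memory out to swap
# bi bo: disk activity; bi is data read from disk into memory, bo is data written to disk
# us: percentage of CPU time spent in user space, i.e. by the services running on the system
# sy: percentage of CPU time spent by the system itself (kernel)
# id: idle; us + sy + id + wa + st = 100%
# wa: percentage of CPU time spent idle while waiting for I/O to complete (it is not a count of processes)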
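
The column meanings above can be turned into a quick check. The sketch below reads vmstat output and prints a note when the run queue exceeds the CPU count, when swap is active, or when wa is high. check_vmstat is a hypothetical helper, and the 30% wa threshold is an assumption borrowed from the iostat notes later in this document.

```shell
#!/bin/sh
# Sketch: flag suspicious values in vmstat output.
# check_vmstat is a hypothetical helper; the thresholds are illustrative.

check_vmstat() {
    # $1 = number of logical CPUs; vmstat output is read from stdin
    awk -v cpus="$1" '
        # Data rows start with a number and have 17 columns:
        # r b swpd free buff cache si so bi bo in cs us sy id wa st
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            if ($1 > cpus)        print "run queue longer than CPU count: r=" $1
            if ($7 > 0 || $8 > 0) print "swapping: si=" $7 " so=" $8
            if ($16 > 30)         print "high I/O wait: wa=" $16
        }'
}

# Sample rows copied from the "vmstat 3 4" transcript above (1 CPU).
check_vmstat 1 <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0  77732 127476 438056    0    0     2    10   13  198  1  1 98  0  0
 1  0      0  77732 127476 438056    0    0     0     8 1235 2611  1  1 98  0  0
 0  0      0  77764 127476 438088    0    0     0     0 1180 2563  1  0 99  0  0
EOF
```

On a live system the same function could be fed with `vmstat 3 4 | check_vmstat "$(nproc)"`.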

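The top command

View the resources used by each process.

#Refreshes every three seconds by default. A zombie is a process that has terminated but whose parent has not yet collected its exit status
#st: percentage of CPU time stolen from this virtual machine by the hypervisor
#KiB Mem: physical memory
#KiB Swap: swap space
#Sorted by CPU usage by default. RES is the resident physical memory in KB. Press M to sort by memory usage, press P to return to CPU sorting
#Press 1 to show each CPU separately
#A process can be killed by its PID: kill pid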
[root@chyuanliuNJ ~]# top
top - 20:33:35 up 2 days, 12:07,  6 users,  load average: 0.00, 0.09, 0.13
Tasks:  83 total,   1 running,  82 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1016396 total,    74024 free,   375752 used,   566620 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   421600 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1017 root      20   0 1006112  15304  10308 S  0.3  1.5  23:30.54 staragent-c+
 1047 root      20   0  130484  12840   8848 S  0.3  1.3   3:50.11 AliYunDun
    1 root      20   0   43204   3596   2412 S  0.0  0.4   0:04.26 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.01 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:02.72 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S  0.0  0.0   0:10.18 rcu_sched
   10 root      rt   0       0      0      0 S  0.0  0.0   0:00.51 watchdog/0
   12 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kdevtmpfs
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 netns
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.07 khungtaskd
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 writeback
   16 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kintegrityd
   17 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 bioset
   18 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kblockd

#top -c shows the full command line of each process
[root@chyuanliuNJ ~]# top -c
top - 20:46:40 up 2 days, 12:20,  7 users,  load average: 0.00, 0.01, 0.07
Tasks:  85 total,   1 running,  84 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.7 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1016396 total,    70888 free,   378232 used,   567276 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   418940 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1017 root      20   0 1006112  15304  10308 S  1.0  1.5  23:35.66 staragent-core
 1516 root      20   0 2070408  69488  11024 S  0.7  6.8  12:39.07 /usr/local/cloudmoni+
 1047 root      20   0  130484  12840   8848 S  0.3  1.3   3:50.87 /usr/local/aegis/aeg+
11782 root      20   0       0      0      0 S  0.3  0.0   0:00.01 [kworker/0:2]
11909 root      20   0  157600   2216   1616 R  0.3  0.2   0:00.01 top -c
    1 root      20   0   43204   3596   2412 S  0.0  0.4   0:04.27 /usr/lib/systemd/sys+
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.01 [kthreadd]
    3 root      20   0       0      0      0 S  0.0  0.0   0:02.73 [ksoftirqd/0]
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [kworker/0:0H]
    7 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 [migration/0]
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [rcu_bh]
    9 root      20   0       0      0      0 S  0.0  0.0   0:10.21 [rcu_sched]
   10 root      rt   0       0      0      0 S  0.0  0.0   0:00.51 [watchdog/0]
   12 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kdevtmpfs]
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [netns]
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.07 [khungtaskd]
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [writeback]

#top -bn1: print one static report and exit; suitable for use in scripts
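
As a sketch of how top -bn1 might be used in a script, the function below prints the PID, %CPU and command of the first N process rows. top_n_by_cpu is a hypothetical helper, and it assumes the default column layout and default CPU sorting shown in the transcript above.

```shell
#!/bin/sh
# Sketch: pick the first N process rows out of `top -bn1` output.
# top_n_by_cpu is a hypothetical helper; it assumes the default column
# order: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

top_n_by_cpu() {
    # $1 = number of rows to print; top output is read from stdin
    awk -v n="$1" '
        seen_header && count < n { print $1, $9, $12; count++ }
        $1 == "PID" { seen_header = 1 }   # rows after this line are processes
    '
}

# Sample rows copied from the top transcript above.
top_n_by_cpu 2 <<'EOF'
top - 20:33:35 up 2 days, 12:07,  6 users,  load average: 0.00, 0.09, 0.13

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1017 root      20   0 1006112  15304  10308 S  0.3  1.5  23:30.54 staragent-c+
 1047 root      20   0  130484  12840   8848 S  0.3  1.3   3:50.11 AliYunDun
    1 root      20   0   43204   3596   2412 S  0.0  0.4   0:04.26 systemd
EOF
```

On a live system this would be used as `top -bn1 | top_n_by_cpu 5`.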

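The sar command

A command for analysing overall system state; often used for monitoring network traffic.

#Without any options, sar reads today's history file kept by the system
#Every 10 minutes the system state is sampled and saved to that file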
[root@chyuanliuNJ ~]# sar
Cannot open /var/log/sa/sa25: No such file or directory
#Directory where the sar history files are stored
[root@chyuanliuNJ ~]# ls /var/log/sa
[root@chyuanliuNJ ~]#
#Network interface traffic, reported once per second, three times
[root@chyuanliuNJ ~]# sar -n DEV 1 3
Linux 3.10.0-514.26.2.el7.x86_64 (chyuanliuNJ)  11/25/2017      _x86_64_        (1 CPU)

08:57:29 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
08:57:30 PM      eth0      1.01      1.01      0.05      0.12      0.00      0.00      0.00
08:57:30 PM        lo      3.03      3.03      0.18      0.18      0.00      0.00      0.00

08:57:30 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
08:57:31 PM      eth0      0.00      2.00      0.00      0.56      0.00      0.00      0.00
08:57:31 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00

08:57:31 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
08:57:32 PM      eth0      0.00      1.01      0.00      0.38      0.00      0.00      0.00
08:57:32 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:         eth0      0.34      1.34      0.02      0.35      0.00      0.00      0.00
Average:           lo      1.01      1.01      0.06      0.06      0.00      0.00      0.00

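#rxpck/s: packets received per second, txpck/s: packets sent per second, rxkB/s: kilobytes received per second, txkB/s: kilobytes sent per second
#A few thousand packets per second is fairly normal; tens of thousands is usually not and is worth investigating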
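
The packet-rate rule of thumb above can be checked automatically against the Average lines that sar prints. The sketch below is only an illustration: busy_ifaces is a hypothetical helper, and the threshold of 10000 packets per second is an assumption based on the note above.

```shell
#!/bin/sh
# Sketch: list interfaces whose average packet rate exceeds a threshold.
# busy_ifaces is a hypothetical helper, not part of sysstat.

busy_ifaces() {
    # $1 = packets-per-second threshold; `sar -n DEV` output on stdin
    awk -v limit="$1" '
        # Average rows look like: Average: eth0 rxpck/s txpck/s rxkB/s ...
        $1 == "Average:" && $2 != "IFACE" {
            if ($3 > limit || $4 > limit)
                print $2, "rx=" $3, "tx=" $4
        }'
}

# Average rows copied from the transcript above; both interfaces are far
# below 10000 packets per second, so nothing is printed for this sample.
busy_ifaces 10000 <<'EOF'
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:         eth0      0.34      1.34      0.02      0.35      0.00      0.00      0.00
Average:           lo      1.01      1.01      0.06      0.06      0.00      0.00      0.00
EOF
```

On a live system this would be used as `sar -n DEV 1 3 | busy_ifaces 10000`.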



#View historical data. About one month of data is kept; sa25 holds the data for the 25th
[root@chyuanliuNJ ~]# sar -f /var/log/sa/sa25
Linux 3.10.0-514.26.2.el7.x86_64 (chyuanliuNJ)  11/25/2017      _x86_64_       (1 CPU)

# -q: system load. Often used to look at historical data, in which case the interval and count are omitted
[root@chyuanliuNJ ~]# sar -q 1 4
Linux 3.10.0-514.26.2.el7.x86_64 (chyuanliuNJ)  11/25/2017      _x86_64_       (1 CPU)

09:09:04 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
09:09:05 PM         1       409      0.03      0.05      0.07         0
09:09:06 PM         1       409      0.03      0.05      0.07         0
09:09:07 PM         3       409      0.03      0.05      0.07         0
09:09:08 PM         1       409      0.03      0.05      0.07         0
Average:            2       409      0.03      0.05      0.07         0
# -b: disk reads and writes
[root@chyuanliuNJ ~]# sar -b 1 4
Linux 3.10.0-514.26.2.el7.x86_64 (chyuanliuNJ)  11/25/2017      _x86_64_       (1 CPU)

09:09:57 PM       tps      rtps      wtps   bread/s   bwrtn/s
09:09:58 PM      0.00      0.00      0.00      0.00      0.00
09:09:59 PM      0.00      0.00      0.00      0.00      0.00
09:10:00 PM      0.00      0.00      0.00      0.00      0.00
09:10:01 PM      1.98      0.00      1.98      0.00     47.52
Average:         0.50      0.00      0.50      0.00     12.03

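The /var/log/sa/ directory holds two kinds of files. sa25 is a binary file that cannot be read with cat, only with sar -f. sar25 is a plain-text report generated on the 26th that can be read with cat.

The nload command

Displays network interface traffic live. The interface name is shown in the top-left corner.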

Device eth0 [172.16.252.69] (1/2):
================================================================================
Incoming:







                                                       Curr: 856.00 Bit/s
                                                       Avg: 2.05 kBit/s
                                                       Min: 0.00 Bit/s
                                                       Max: 10.12 kBit/s
                                                       Ttl: 102.76 MByte
Outgoing:








                                                       Curr: 8.08 kBit/s
                                                       Avg: 13.55 kBit/s
                                                       Min: 4.34 kBit/s
                                                       Max: 70.05 kBit/s
                                                       Ttl: 206.95 MByte

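Monitoring I/O performance

Commands for monitoring disk status. If b or wa is high in vmstat, look at disk performance.

#The data shown by iostat can also be seen with sar -b
#%util in iostat -x is the important column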
[root@chyuanliuNJ ~]# iostat -x
Linux 3.10.0-514.26.2.el7.x86_64 (chyuanliuNJ)  11/25/2017      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00    0.57    0.03    0.00   98.39

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     1.13    0.06    0.94     2.04     9.97    24.02     0.00    2.02    3.22    1.95   0.43   0.04
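%util: the percentage of each second spent on I/O, or in other words how much of each second the I/O queue was non-empty. As a fraction it is delta(use)/s/1000, because use is counted in milliseconds.
svctm: average service time per device I/O operation
await: average time per device I/O operation, including time spent waiting in the queue
avgqu-sz: average I/O queue length

If %util is close to 100%, too many I/O requests are being issued, the I/O system is saturated, and the disk may be a bottleneck. In general, once %util exceeds 70% the I/O pressure is already fairly high and reads spend a lot of time waiting.
This can be combined with vmstat: check the b column (processes waiting for resources) and the wa column (percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure), or run vmstat -d 5 for per-disk statistics.
await generally depends on the service time (svctm), the length of the I/O queue and the pattern in which I/O requests are issued. If svctm is close to await, I/O requests spend almost no time waiting; if await is much larger than svctm, the I/O queue is too long and applications see slower response times.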
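
The 70% guideline for %util can be applied with a short filter over iostat -x output. This is a sketch under stated assumptions: busy_disks is a hypothetical helper, and it locates the %util column by name in the Device header line because the column layout differs between sysstat versions.

```shell
#!/bin/sh
# Sketch: list devices whose %util exceeds a threshold.
# busy_disks is a hypothetical helper, not part of sysstat.

busy_disks() {
    # $1 = %util threshold; `iostat -x` output on stdin
    awk -v limit="$1" '
        # Find the %util column from the header line that starts with "Device".
        /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # Device rows have at least as many fields as the header.
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# Sample copied from the iostat transcript above; vda is almost idle,
# so nothing is printed with a 70% threshold.
busy_disks 70 <<'EOF'
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00    0.57    0.03    0.00   98.39

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     1.13    0.06    0.94     2.04     9.97    24.02     0.00    2.02    3.22    1.95   0.43   0.04
EOF
```

On a live system this would be used as `iostat -x | busy_disks 70`.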

Find out which processes use I/O most heavily

#Similar to the top command
[root@chyuanliuNJ ~]# iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd -~rialize 21
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
   10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
   12 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kdevtmpfs]
   13 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [netns]
   14 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khungtaskd]
   15 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [writeback]
... ...

The free command

View memory usage.

#The first line is the header, the second line is physical memory, the third line is swap
[root@chyuanliuNJ ~]# free
              total        used        free      shared  buff/cache   available
Mem:        1016396      360836       63204       62800      592356      419232
Swap:             0           0           0

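#free -h: use human-readable units
#Linux uses part of memory as buff/cache
#buff/cache: buffers / cache
#Data read from disk for the CPU to process passes through memory [cache], so it is staged there and can be fetched as needed
#Data the CPU has finished computing passes through memory [buff] on its way to disk: the CPU is fast and the disk is slow, so rather than waiting for the disk, the data is first stored in the buffer

#available: free plus the reclaimable part of buff/cache
#total = used + free + buff/cache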

[root@chyuanliuNJ ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           992M        352M         83M         61M        556M        409M
Swap:            0B          0B          0B
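
The relation total = used + free + buff/cache can be checked against the numbers that free prints. The sketch below does that and also reports available memory as a percentage of total. mem_summary is a hypothetical helper, and the identity is assumed to hold for the free version shown in this transcript; other versions may compute used differently.

```shell
#!/bin/sh
# Sketch: sanity-check the Mem: line of `free` output.
# mem_summary is a hypothetical helper, not part of procps.

mem_summary() {
    # reads `free` output (plain KiB numbers, not -h) from stdin
    awk '
        # Columns: Mem: total used free shared buff/cache available
        $1 == "Mem:" {
            total = $2; used = $3; freemem = $4; cache = $6; avail = $7
            printf "sum_matches_total=%s\n", (used + freemem + cache == total) ? "yes" : "no"
            printf "available_pct=%.1f\n", avail * 100 / total
        }'
}

# Sample copied from the free transcript above.
mem_summary <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:        1016396      360836       63204       62800      592356      419232
Swap:             0           0           0
EOF
```

For the sample above this prints sum_matches_total=yes and available_pct=41.2. On a live system it would be used as `free | mem_summary`.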

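The ps command

View system processes; ps reports a snapshot of the current processes.

A process on Linux is in one of 5 states:

    1. Running (running, or waiting in the run queue)
    2. Interruptible sleep (sleeping or blocked, waiting for a condition to be met or a signal to arrive)
    3. Uninterruptible sleep (not woken by signals and not runnable; the process must wait until an interrupt occurs)
    4. Zombie (the process has terminated, but its process descriptor remains until the parent calls wait4() to release it)
    5. Stopped (the process stopped running after receiving SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU)

ps marks these 5 states with the following codes:

    D uninterruptible sleep (usually IO)
    R running or runnable (on run queue)
    S interruptible sleep
    T traced or stopped
    Z a defunct ("zombie") process

ps aux and ps -elf

#ps aux and ps -elf are much the same; both list every process on the system
#To kill a process, kill PID is the usual way
#Every process has a directory /proc/PID/, which shows where the process was started from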

[root@chyuanliuNJ ~]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.3  43204  3444 ?        Ss   Nov23   0:05 /usr/lib/system
root         2  0.0  0.0      0     0 ?        S    Nov23   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Nov23   0:03 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Nov23   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    Nov23   0:00 [migration/0]
... ...

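#USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
#USER: owner of the process
#PID: process ID
#%CPU: CPU usage
#%MEM: memory usage
#VSZ: virtual memory size
#RSS: resident (physical) memory size
#TTY: terminal the process is attached to (? if none)
#STAT: state of the process:
#D: uninterruptible sleep
#R: running
#S: sleeping
#T: stopped
#Z: terminated but not yet reaped (zombie process)
#W: no resident memory pages
#<: high-priority process
#N: low-priority process
#L: has pages locked into memory (for real-time and custom I/O)
#START: time the process started
#TIME: CPU time used
#COMMAND: the command being executed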

[root@chyuanliuNJ ~]# ps aux |grep nginx
root      1375  0.0  0.0 112644   968 pts/0    R+   20:59   0:00 grep --color=auto ngin

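A process that is running does not necessarily show up as R in ps aux. ps aux is a snapshot of a single instant, so a process that is on the CPU only briefly may not be caught in the R state.

For example, if vmstat is running in another terminal, ps aux in the current terminal shows it in state S (sleeping), because vmstat needs the CPU only for very short moments.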
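
To see how the processes in one snapshot are spread across the state codes, the STAT column can be tallied by its first letter. This is a minimal sketch: count_states is a hypothetical helper, and it expects one STAT value per line, which `ps -eo stat=` should produce.

```shell
#!/bin/sh
# Sketch: count processes per state code (first letter of STAT).
# count_states is a hypothetical helper, not part of procps.

count_states() {
    # reads one STAT value per line from stdin, e.g. Ss, S<, R+
    awk '{ n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# STAT values copied from the ps aux transcript above, plus the R+ of grep.
printf 'Ss\nS\nS\nS<\nS\nR+\n' | count_states
```

On a live system this would be used as `ps -eo stat= | count_states`. Most processes normally show up as S, which matches the observation above that few processes are caught in the R state at any instant.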

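Difference between ps and top

ps only gives a snapshot of the current processes. To watch the most active processes in real time, use top.
top provides process information in real time. It also has an interactive mode that accepts commands, for example n followed by a number such as 5 or 10, which tells top to display 5 or 10 of the most active processes. top keeps running until you press "q" to quit.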