CentOS Performance Monitoring Series, Part 2: A First Look at Collectl

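For a Linux system administrator, keeping the systems under their care in good health is the primary responsibility.

Linux administrators have many tools for monitoring and displaying the processes on a system, such as top and htop.

This article introduces collectl, a tool that is fairly convenient to use.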

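Introduction:

collectl is an excellent command-line utility with a rich set of options for collecting performance data that describes the current state of a system. Unlike most other monitoring tools, collectl is not limited to a small set of system metrics. It can gather information on many different kinds of system resources, including cpu, disk, memory, network, sockets, tcp, inodes, infiniband, lustre, nfs, processes, quadrics, slabs, and buddyinfo.

Another advantage of collectl is that it can stand in for special-purpose tools such as top, ps, and iotop. So what features make collectl such a useful tool?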

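Collectl Features

It can run interactively, as a daemon, or both at once.
It can display output in multiple formats.
It can monitor almost every subsystem.
It can replace many tools, such as ps, top, iotop, and vmstat.
It can record and play back captured data.
It can export data in several formats, which is useful when you want to analyze it with external tools.
It can run as a service to monitor remote machines or an entire server cluster.
It can display data in the terminal or write it to a file or a socket.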

How to Install collectl on Linux

collectl runs on any Linux distribution. Its only requirement is Perl, so make sure Perl is installed on your machine before installing collectl.
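As a quick check of the prerequisite above, the following is a minimal sketch that reports whether perl and collectl can be found on the PATH. The helper name have_cmd is made up for this illustration; it relies only on the standard "command -v" shell builtin.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named
# command can be resolved by the shell, and fails otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check the prerequisite (perl) and then collectl itself.
for tool in perl collectl; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

If perl is reported as not found, install it with your distribution's package manager before installing collectl.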

For Debian/Ubuntu/Linux Mint

The following command installs collectl on Debian-based systems such as Ubuntu.

$ sudo apt-get install collectl

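For RHEL/CentOS/Fedora

On Red Hat-based distributions you can install it with yum. On RHEL and CentOS the package is usually provided by the EPEL repository, so that repository may need to be enabled first.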

# yum install collectl

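Some collectl Examples

Once collectl is installed, you can run it straight from the terminal without specifying any options.

The following command displays CPU, disk, and network information in a brief, readable format.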

# collectl
 
waiting for 1 second sample...
#
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
  13   5   790   1322      0      0     92      7      4     13      0       5 
  10   2   719   1186      0      0      0      0      3      9      0       4 
  12   0   753   1188      0      0     52      3      2      5      0       6 
  13   2   733   1063      0      0      0      0      1      1      0       1 
  25   2   834   1375      0      0      0      0      1      1      0       1 
  28   2   870   1424      0      0     36      7      1      1      0       1 
  19   3   949   2271      0      0     44      3      1      1      0       1 
  17   2   809   1384      0      0      0      0      1      6      0       6 
  16   2   732   1348      0      0      0      0      1      1      0       1 
  22   4   993   1615      0      0     56      3      1      2      0       3

As the terminal output above shows, the metrics are easy to follow because each sample is printed on a single line.
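Because each sample occupies one line, the brief output is easy to post-process. The sketch below averages the first column (cpu) with awk. The function name avg_cpu is hypothetical, and the script assumes the column layout shown above, where header lines start with '#'.

```shell
#!/bin/sh
# avg_cpu (hypothetical name) reads collectl brief output on stdin
# and prints the mean of the first column, rounded to one decimal.
avg_cpu() {
    awk '!/^#/ && !/^waiting/ && NF > 0 { sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Demo on the first three samples from the listing above.
avg_cpu <<'EOF'
waiting for 1 second sample...
#
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  13   5   790   1322      0      0     92      7      4     13      0       5
  10   2   719   1186      0      0      0      0      3      9      0       4
  12   0   753   1188      0      0     52      3      2      5      0       6
EOF
# prints 11.7
```

On a live system the same function could be fed directly, for example with "collectl -c 10 | avg_cpu", where -c limits the number of samples.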

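Running collectl without any arguments displays information for the following subsystems:

cpu
disk
network

But how do you monitor CPU usage alone? The '-s' option controls which subsystems' data is collected or played back.

For example, the following command prints a summary of CPU usage.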

# collectl -sc
 
waiting for 1 second sample...
#
#cpu sys inter  ctxsw 
  15   2   749   1155 
  16   3   772   1445 
  14   2   793   1247 
  27   4   887   1292 
  24   1   796   1258 
  16   1   743   1113 
  15   1   743   1179 
  14   1   706   1078 
  15   1   764   1268

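The output above covers only the CPU. To look at memory usage, free memory, or other figures that matter for system performance, add the corresponding subsystem letter to '-s', for example 'm' for memory.

What if you want to collect some TCP data? Use the following command.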

# collectl -st
 
waiting for 1 second sample...
#
#  IP  Tcp  Udp Icmp 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0

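Once you are comfortable with the tool, getting the result you want is easy. For example, you can combine the 't' letter for TCP with the 'c' letter for CPU.

The following command does exactly that.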

# collectl -stc
 
waiting for 1 second sample...
#
#cpu sys inter  ctxsw   IP  Tcp  Udp Icmp 
  23   8   961   3136    0    0    0    0 
  24   5   916   3662    0    0    0    0 
  21   8   848   2408    0    0    0    0 
  30  10   916   2674    0    0    0    0 
  38   3   826   1752    0    0    0    0 
  31   3   820   1408    0    0    0    0 
  15   5   781   1335    0    0    0    0 
  17   3   802   1314    0    0    0    0 
  17   3   755   1218    0    0    0    0 
  14   2   788   1321    0    0    0    0

Those are just a few simple examples.

These letters are hard to remember, so here is a list of the summary subsystems that '-s' supports.

b – buddy info (memory fragmentation)
c – CPU
d – Disk
f – NFS V3 Data
i – Inode and File System
j – Interrupts
l – Lustre
m – Memory
n – Networks
s – Sockets
t – TCP
x – Interconnect
y – Slabs (system object caches)
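For readers who would rather type names than letters, here is a small sketch that turns readable subsystem names into a '-s' argument using the letters from the list above. The function name subsys_flag and the names it accepts (cpu, disk, tcp, and so on) are assumptions made for this example; collectl itself only understands the letters.

```shell
#!/bin/sh
# subsys_flag (hypothetical helper) maps readable names to the
# summary subsystem letters listed above and prints "-s<letters>".
subsys_flag() {
    flags=
    for name in "$@"; do
        case $name in
            buddy)        flags=${flags}b ;;
            cpu)          flags=${flags}c ;;
            disk)         flags=${flags}d ;;
            nfs)          flags=${flags}f ;;
            inode)        flags=${flags}i ;;
            interrupts)   flags=${flags}j ;;
            lustre)       flags=${flags}l ;;
            memory)       flags=${flags}m ;;
            network)      flags=${flags}n ;;
            sockets)      flags=${flags}s ;;
            tcp)          flags=${flags}t ;;
            interconnect) flags=${flags}x ;;
            slabs)        flags=${flags}y ;;
            *) echo "unknown subsystem: $name" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "-s$flags"
}

subsys_flag cpu tcp      # prints -sct
```

The result can then be passed on, for example as collectl "$(subsys_flag cpu disk)".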

For a system administrator or any Linux user, one of the most important kinds of data is disk usage.

The following command helps you monitor disk activity.

# collectl -sd
 
waiting for 1 second sample...
#
#KBRead  Reads KBWrit Writes 
      0      0      0      0 
      0      0      0      0 
      0      0     92      7 
      0      0      0      0 
      0      0     36      3 
      0      0      0      0 
      0      0      0      0 
      0      0    100      7 
      0      0      0      0

You can also use the '-sD' option to collect data for individual disks, but be aware that totals across all disks are not reported in this mode.

# collectl -sD
 
waiting for 1 second sample...
 
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0      52     11    2   26      26     1     8      8    1
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sda              0      0    0    0      24      0    2   12      12     0     0      0    0
sda              0      0    0    0     152      0    4   38      38     0     0      0    0
sda              0      0    0    0     192     45    3   64      64     1    20     20    5
sda              0      0    0    0     204      0    2  102     102     0     0      0    0
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sda              0      0    0    0     116     26    3   39      38     1    16     16    4
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sda              0      0    0    0      32      5    3   11      10     1    16     16    4
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
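
Since the detail view prints one row per disk per sample, a saved capture can be condensed with a little awk. The sketch below reports the highest value seen in the last column (Util) for each disk. The function name peak_util is hypothetical, and the script assumes the 14-field row layout shown above.

```shell
#!/bin/sh
# peak_util (hypothetical name) reads "collectl -sD" output on stdin
# and prints, for each disk, the largest value of the last column.
peak_util() {
    awk '!/^#/ && !/^waiting/ && NF >= 14 {
             if (!($1 in peak) || $NF + 0 > peak[$1]) peak[$1] = $NF + 0
         }
         END { for (d in peak) printf "%s %d\n", d, peak[d] }'
}

# Demo on a few rows from the listing above.
peak_util <<'EOF'
# DISK STATISTICS (/sec)
sda              0      0    0    0      52     11    2   26      26     1     8      8    1
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sda              0      0    0    0     192     45    3   64      64     1    20     20    5
sda              0      0    0    0     116     26    3   39      38     1    16     16    4
EOF
# prints: sda 5
```

With several disks the order of the output lines is not guaranteed by awk, so pipe the result through sort if a stable order matters.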

Other detail subsystems are available for collecting more specific data.

Here is the list of detail subsystems.

C – CPU
D – Disk
E – Environmental data (fan, power, temp), via ipmitool
F – NFS Data
J – Interrupts
L – Lustre OST detail OR client Filesystem detail
N – Networks
T – 65 TCP counters only available in plot format
X – Interconnect
Y – Slabs (system object caches)
Z – Processes

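collectl has many options, far more than a single article can cover.

Still, its use as a stand-in for top and ps is worth mentioning.

Using collectl like top is easy: run the following command in a Linux terminal and you will see output similar to that of top.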

# collectl --top
 
# TOP PROCESSES sorted by time (counters are /sec) 13:11:02
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
^COuch!tecmint  20     1   40 R    1G  626M  0  0.01  0.14  15  28:48.24    0    0    0  109 /usr/lib/firefox/firefox 
 3403  tecmint  20     1   40 R    1G  626M  1  0.00  0.20  20  28:48.44    0    0    0  600 /usr/lib/firefox/firefox 
 5851  tecmint  20  4666    0 R   17M   13M  0  0.02  0.06   8  00:01.28    0    0    0    0 /usr/bin/perl 
 1682  root     20  1666    2 R  211M   55M  1  0.02  0.01   3  03:10.24    0    0    0   95 /usr/bin/X 
 3454  tecmint  20  3403    8 S  216M   45M  1  0.01  0.02   3  01:23.32    0    0    0    0 /usr/lib/firefox/plugin-container 
 4658  tecmint  20  4657    3 S  207M   17M  1  0.00  0.02   2  00:08.23    0    0    0  142 gnome-terminal 
 2890  tecmint  20  2571    3 S  340M   68M  0  0.00  0.01   1  01:19.95    0    0    0    0 compiz 
 3521  tecmint  20     1   24 S  710M  148M  1  0.01  0.00   1  01:47.84    0    0    0    0 skype 
    1  root     20     0    0 S    3M    2M  0  0.00  0.00   0  00:02.57    0    0    0    0 /sbin/init 
    2  root     20     0    0 S     0     0  1  0.00  0.00   0  00:00.00    0    0    0    0 kthreadd 
    3  root     20     2    0 S     0     0  0  0.00  0.00   0  00:00.60    0    0    0    0 ksoftirqd/0 
    5  root      0     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 kworker/0:0H 
    7  root      0     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 kworker/u:0H 
    8  root     RT     2    0 S     0     0  0  0.00  0.00   0  00:04.42    0    0    0    0 migration/0 
    9  root     20     2    0 S     0     0  0  0.00  0.00   0  00:00.00    0    0    0    0 rcu_bh 
   10  root     20     2    0 R     0     0  0  0.00  0.00   0  00:02.22    0    0    0    0 rcu_sched 
   11  root     RT     2    0 S     0     0  0  0.00  0.00   0  00:00.05    0    0    0    0 watchdog/0 
   12  root     RT     2    0 S     0     0  1  0.00  0.00   0  00:00.07    0    0    0    0 watchdog/1 
   13  root     20     2    0 S     0     0  1  0.00  0.00   0  00:00.73    0    0    0    0 ksoftirqd/1 
   14  root     RT     2    0 S     0     0  1  0.00  0.00   0  00:01.96    0    0    0    0 migration/1 
   16  root      0     2    0 S     0     0  1  0.00  0.00   0  00:00.00    0    0    0    0 kworker/1:0H 
   17  root      0     2    0 S     0     0  1  0.00  0.00   0  00:00.00    0    0    0    0 cpuset
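
A saved copy of this listing can also be filtered. The sketch below keeps only the processes whose Pct column (the 12th field in the layout above) meets a threshold and prints the PID, the percentage, and the command. The function name busy_procs is hypothetical, and the script assumes the column order shown in the header above. Because the live --top display redraws the screen, this is meant for captured text, not for a direct pipe.

```shell
#!/bin/sh
# busy_procs (hypothetical name): print "PID Pct Command" for rows
# whose Pct column (field 12) is at least the given threshold.
# Rows that do not start with a numeric PID are skipped.
busy_procs() {
    awk -v min="$1" '$1 ~ /^[0-9]+$/ && NF >= 18 && $12 + 0 >= min + 0 {
        print $1, $12, $18
    }'
}

# Demo on three rows from the listing above, threshold 5 percent.
busy_procs 5 <<'EOF'
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 3403  tecmint  20     1   40 R    1G  626M  1  0.00  0.20  20  28:48.44    0    0    0  600 /usr/lib/firefox/firefox
 5851  tecmint  20  4666    0 R   17M   13M  0  0.02  0.06   8  00:01.28    0    0    0    0 /usr/bin/perl
 1682  root     20  1666    2 R  211M   55M  1  0.02  0.01   3  03:10.24    0    0    0   95 /usr/bin/X
EOF
# prints:
# 3403 20 /usr/lib/firefox/firefox
# 5851 8 /usr/bin/perl
```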

For anything not covered here, type the following command in your terminal and start reading.

# man collectl