nagios原理（三）

4.安装xinetd脚本
[root@dbpi nrpe-2.8.1]# make install-xinetd
输出如下
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe
可以看到创建了这个文件/etc/xinetd.d/nrpe
编辑这个脚本
vi /etc/xinetd.d/nrpe 
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure += USERID
        disable         = no
        only_from       = 127.0.0.1在后面增加监控主机的地址0.111,以空格间隔
}
改后
     only_from       = 127.0.0.1 192.168.99.99

编辑/etc/services文件,增加NRPE服务
vi /etc/services 
增加如下
# Local services
nrpe            5666/tcp                        # nrpe
重启xinetd服务
[root@dbpi nrpe-2.8.1]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]

查看NRPE是否已经启动
[root@dbpi nrpe-2.8.1]# netstat -at|grep nrpe
tcp        0      0 *:nrpe                  *:*                     LISTEN    
[root@dbpi nrpe-2.8.1]# netstat -an|grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN   
可以看到5666端口已经在监听了

5.测试NRPE是否则正常工作
之前我们在安装了check_nrpe这个插件用于测试,现在就是用的时候.执行
/usr/local/nagios/libexec/check_nrpe -H localhost
会返回当前NRPE的版本
[root@dbpi nrpe-2.8.1]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.8.1
也就是在本地用check_nrpe连接nrpe daemon是正常的
注:为了后面工作的顺利进行,注意本地防火墙要打开5666能让外部的监控机访问

/usr/local/nagios/libexec/check_nrpe –h查看这个命令的用法
可以看到用法是check_nrpe –H 被监控的主机 -c要执行的监控命令
注意:-c后面接的监控命令必须是nrpe.cfg文件中定义的.也就是NRPE daemon只运行nrpe.cfg中所定义的命令

查看NRPE的监控命令
cd /usr/local/nagios/etc
vi nrpe.cfg
找到下面这段话
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
红色部分是命令名,也就是check_nrpe 的-c参数可以接的内容,等号=后面是实际执行的插件程序(这与commands.cfg中定义命令的形式十分相似,只不过是写在了一行).也就是说 check_users就是等号后面/usr/local/nagios/libexec/check_users -w 5 -c 10的简称.
我们可以很容易知道上面这5行定义的命令分别是检测登陆用户数,cpu负载,hda1的容量,僵尸进程,总进程数.各条命令具体的含义见插件用法(执行” 插件程序名 –h”)
由于-c后面只能接nrpe.cfg中定义的命令,也就是说现在我们只能用上面定义的这五条命令.我们可以在本机实验一下.执行
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users 
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load 
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_hda1 
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs 
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs


在运行nagios的监控主机上
之前已经将nagios运行起来了,现在要做的事情是:
– 安装check_nrpe插件
– 在commands.cfg中创建check_nrpe的命令定义,因为只有在commands.cfg中定义过的命令才能在services.cfg中 使用
–      创建对被监控主机的监控项目
安装check_nrpe插件
[root@server1 yahoon]# tar -zxvf nrpe-2.8.1.tar.gz
[root@server1 yahoon]# cd nrpe-2.8.1
[root@server1 nrpe-2.8.1]# ./configure
[root@server1 nrpe-2.8.1]# make all
[root@server1 nrpe-2.8.1]# make install-plugin
只运行这一步就行了,因为只需要check_nrpe插件

在dbpi上我们刚装好了nrpe,现在我们测试一下监控机使用check_nrpe与被监控机运行的nrpedaemon之间的通信.
[root@server1 nrpe-2.8.1]# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.100
NRPE v2.8.1
看到已经正确返回了NRPE的版本信息,说明一切正常.

在commands.cfg中增加对check_nrpe的定义
vi /usr/local/nagios/etc/commands.cfg
在最后面增加如下内容
########################################################################
#
# 2007.9.5 add by yahoon
# NRPE COMMAND
#
########################################################################
# 'check_nrpe ' command definition
define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
意义如下
command_name check_nrpe 
定义命令名称为check_nrpe,在services.cfg中要使用这个名称. 
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ 
这是定义实际运行的插件程序.这个命令行的书写要完全按照check_nrpe这个命令的用法.不知道用法的就用check_nrpe –h查看
-c后面带的$ARG1$参数是传给nrpe daemon执行的检测命令,之前说过了它必须是nrpe.cfg中所定义的那5条命令中的其中一条.在services.cfg中使用 check_nrpe的时候要用!带上这个参数
下面就可以在services.cfg中定义对dbpi主机cpu负载的监控
define service{
        host_name               dbpi
被监控的主机名,这里注意必须是linux且运行着nrpe,而且必须是hosts.cfg中定义的
        service_description     check-load
        监控项目的名称
        check_command           check_nrpe!check_load
        监控命令是check_nrpe,是在commands.cfg中定义的,带的参数是check_load,是在nrpe.cfg中定义的
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }
像这样将其余四个监控项目加进来.

之前我们说过了,今天还有一个任务是要监控dbpi的swap使用情况.但是很遗憾,在nrpe.cfg中默认没有定义这个监控功能的命令.怎么办?手动 在nrpe.cfg中添加,也就是自定义NRPE命令.
现在我们要监控swap分区,如果空闲空间小于20%则为警告状态—warning;如果小于10%则为严重状态—critical.我们可以查得需要使 用check_swap插件,完整的命令行应该是下面这样.
/usr/local/nagios/libexec/check_swap -w 20% -c 10%

在被监控机上增加check_swap命令的定义
vi /usr/local/nagios/etc/nrpe.cfg
增加下面这一行
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
我们知道check_swap现在就可以作为check_nrpe的-c的参数使用了
修改了配置文件,当然要重启.但是
如果你是以独立的daemon运行的nrpe,那么需要手动重启.
如果你是在xinetd或者inetd下面运行的,则不需要.
由于我们是xinetd下运行的,所以不需要重启服务

在监控机上增加这个监控项目
define service{
        host_name               dbpi
        service_description     check-swap
        check_command           check_nrpe!check_swap
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }
所有的配置文件已经修改好了,现在重启nagios.杀掉nagios进程,然后再重启.等上一会你就可以看到下面这个画面了

基本上nagios的主要功能就有这些,nagios的使用关键在于如何活用那些丰富的插件.nagios可以说是一个对于linux/unix环境支持 十分好的程序.对于被监控主机是windows系列相关的文章比较少.我就专门花一章来讲述.
有了下一篇,大家就可以功德圆满了.
写到这里,有几个我在安装和使用的几个小知识点,也可以说是小问题附在此处,欢迎大家批评指教.一般的附录都是在文章最后,可下一篇是windows相关 了,与我要说的这几个问题没什么联系正所谓打铁趁热,我就在这里一气呵成,大家也容易看.

附录:
1.重启nagios的方法
之前我说重启nagios的时候都是用的杀进程的方式,其实也可以不这么做.如果在安装nagios的时候安装了启动脚本就可以使用/etc /init.d/nagios restart 还可以带的参数有stop, start,status
如果报错了,有可能是脚本里面的路径设置错误,解决办法
vi /etc/init.d/nagios
将prefix=/usr/local/nagiosaa改为安装的目录/etc/init.d/nagios
注:在nagios安装的时候说是将脚本安装到了/etc/rc.d/init.d,其实这和/etc/init.d是一个目录

2.不以xinetd的方式运行nrpe
因为我们按照nrpe的安装文档安装下来,nrpe是在xinetd下面运行的,个人比较喜欢像nagios那样以单独的daemon来运行.这样比较好 控制.
方法:
编辑 /etc/services将nrpe注释掉
# Local services
#nrpe           5666/tcp                        # nrpe
编辑 nrpe.cfg,增加监控主机的地址
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
allowed_hosts=127.0.0.1,192.168.99.99
注意两个地址以逗号隔开
以单独的daemon启动nrpe
[root@dbpi etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d 
查看
[root@dbpi etc]# ps -ef|grep nrpe
nagios   22125     1 0 14:04 ?        00:00:00 [nrpe]
[root@dbpi nagios]# netstat -an|grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN   
说明已经正常启动了
在/etc/rc.d/rc.local里面加入下面一行就实现开机启动nrpe了
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg –d
同理要开机运行nagios就在/etc/rc.d/rc.local里面增加下面这行
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

3.有关于check_load的用法及意义
这个插件是用来检测系统当前的cpu负载,使用的方法为
check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15
在unix里面负载的均值通常表示是1分钟,5分钟,15分钟内平均有多少进程处于等待状态.
例如check_load -w 15,10,5 -c 30,25,20这个命令的意义如下
当1分钟多于15个进程等待,5分钟多于10个,15分钟多于5个则为warning状态
当1分钟多于30个进程等待,5分钟多于25个,15分钟多于20个则为critical状态