Nagios Cloud Monitoring

(Note: this article covers Nagios installation, Nagios configuration, and monitoring Redis, MySQL, and ZooKeeper with Nagios.)

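Nagios can monitor basic system parameters such as CPU, disk, and network, as well as common service types including SMTP, POP3, HTTP, and NNTP. With additional plugins and custom check scripts, users can also monitor their own applications, and can deploy a hierarchical monitoring architecture for large numbers of hosts and objects.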

I. Installing Nagios

The Nagios master node needs:

  • nagios

  • nagios-plugins

  • nrpe

  • php

  • apache

Each Nagios slave (monitored) node needs:

  • nagios-plugins

  • nrpe

About NRPE:

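  • NRPE is an addon for monitoring remote hosts: it runs plugins on remote Linux/Unix machines. It is very useful when you want to monitor a remote host's disk usage, CPU load, and memory usage the same way you would monitor the local host.

  • A word on the term "addon": Nagios has many addon packages. Addons extend Nagios and integrate it with other software. They provide features such as managing configuration files through a web interface, monitoring remote hosts (*NIX, Windows, and so on), submitting passive checks from remote hosts, and simplifying or extending the notification logic.

  • NRPE is the addon that executes plugins on remote Linux/Unix hosts. It is useful when you need to monitor local resources or attributes of a remote host, such as disk usage, CPU load, or memory usage. The result is the same as with the check_by_ssh plugin, but NRPE puts less CPU load on the monitoring host, which matters when you monitor a large number of hosts (see figure pic35.png).

  • As the figure shows, NRPE must be deployed on each monitored host, where it runs as a listening daemon. The monitoring host uses check_nrpe to connect to that daemon over SSL. The daemon then runs scripts such as check_disk and check_load on the monitored host and returns the results to the monitoring host. These scripts can in turn check other hosts as well.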
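
The scripts that NRPE runs, such as check_disk and check_load, follow the standard Nagios plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). As a minimal sketch of that convention, a custom check could look like the following. The helper name check_threshold and the sample values are assumptions for illustration and are not part of Nagios or NRPE.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): compare an integer value with
# warning and critical thresholds using the plugin exit-code convention.
# Usage: check_threshold LABEL VALUE WARN CRIT
check_threshold() {
    # Reject empty or non-integer values with the UNKNOWN status.
    case "$2" in
        ''|*[!0-9]*) echo "UNKNOWN - $1: value '$2' is not an integer"; return 3 ;;
    esac
    if [ "$2" -ge "$4" ]; then
        echo "CRITICAL - $1=$2 (>= $4)"; return 2
    elif [ "$2" -ge "$3" ]; then
        echo "WARNING - $1=$2 (>= $3)"; return 1
    fi
    echo "OK - $1=$2"; return 0
}

# Example call with a fixed value; a real check would measure something,
# for instance the number of logged-in users.
check_threshold users 3 5 10
```

A standalone plugin built this way would end with exit $? so that the status code reaches the NRPE daemon and, through check_nrpe, the monitoring host.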

Checking the build environment (all nodes)

# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
package gd-devel is not installed
package xinetd is not installed
openssl-devel-1.0.0-27.el6.x86_64

Install any missing packages first. The packages can be downloaded from the following mirror sites:

  • http://rpm.pbone.net/

  • http://mirrors.163.com/centos/6.4/os/x86_64/Packages/

  • http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/
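
To see at a glance which packages from the rpm -q check still need to be installed, the output can be filtered. The following is a small sketch. The helper name missing_packages is hypothetical, and it assumes rpm prints the English message "package NAME is not installed" as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `rpm -q` output on stdin and print only the
# names of the packages reported as not installed.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# On a real node the input would come from rpm itself:
#   rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel | missing_packages
# Here a few lines from the listing above are fed in so the sketch runs anywhere.
printf '%s\n' \
    'gcc-4.4.7-3.el6.x86_64' \
    'package gd-devel is not installed' \
    'package xinetd is not installed' | missing_packages
```

With the sample input this prints gd-devel and xinetd, one per line. On a node with a working yum repository, the resulting names could be passed to yum install instead of downloading the RPM files by hand.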

Creating the nagios user

useradd nagios -d /usr/local/nagios
passwd nagios   (choose your own password)

Master node installation

  1. nagios (download: http://jaist.dl.sourceforge.net/project/nagios/nagios-4.x/nagios-4.0.2/nagios-4.0.2.tar.gz)

          (1) Install

    tar -zxf nagios-4.0.2.tar.gz
    cd nagios-4.0.2
    ./configure --prefix=/usr/local/nagios     
    make all
    make install && make install-init && make install-commandmode && make install-config

        (2) Register nagios as a service

    chkconfig --add nagios 
    chkconfig nagios off
    chkconfig --level 35 nagios on
    chkconfig --list nagios    
    nagios          0:off   1:off   2:off   3:on    4:off   5:on    6:off

  2. Nagios plugins (download: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz)

    tar -zxf nagios-plugins-1.5.tar.gz
    cd nagios-plugins-1.5
    ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios      
    make && make install

        If the build fails with MySQL-related errors, MySQL is probably installed in a non-default location. Point --with-mysql at that location and build again:

    ./configure --prefix=/usr/local/nagios  --with-mysql=/usr/local/mysql
    make && make install

  3. NRPE (download: http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz)

    tar -zxf nrpe-2.15.tar.gz
    cd nrpe-2.15
    ./configure --enable-command-args
    make all
    make install-plugin

  On monitored nodes, also run:  make install-daemon && make install-daemon-config && make install-xinetd

  4. Apache (download: http://archive.apache.org/dist/httpd/httpd-2.2.23.tar.gz)

    tar -zxf httpd-2.2.23.tar.gz
    cd httpd-2.2.23
    ./configure --prefix=/usr/local/apache2
    make && make install

  5. PHP (download: http://cn2.php.net/distributions/php-5.4.10.tar.gz)

    cd /export/home/tools/soft/php
    tar -zxf php-5.4.10.tar.gz
    cd php-5.4.10
    ./configure --prefix=/usr/local/php  --with-apxs2=/usr/local/apache2/bin/apxs
    make  && make install

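Slave node installation

    Slave nodes only need steps 2 and 3 above (the Nagios plugins and NRPE).

II. Configuring Nagios

1. Monitored-node configuration (master/slave connectivity):

        (1) Edit /etc/xinetd.d/nrpe to allow connections from the Nagios master node

    vi /etc/xinetd.d/nrpe
    only_from       = 127.0.0.1 <master node IP>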

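        (2) Append to the end of /etc/services:

        nrpe      5666/tcp       # NRPE

        (3) Enable support for command arguments

     vi /usr/local/nagios/etc/nrpe.cfg
     dont_blame_nrpe=1

        (4) Restart xinetd

            service xinetd restart

        (5) Verify that NRPE is listening

            netstat -at | grep nrpe

        (6) Test that NRPE responds locally

     /usr/local/nagios/libexec/check_nrpe -H localhost
     NRPE v2.15

        (7) Test from the master node

     /usr/local/nagios/libexec/check_nrpe -H <slave node IP>

     If the NRPE version string is returned, the connection works.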

   2. Command configuration on monitored nodes:

        (1) Edit the configuration file

        # su - nagios
        $ vi /usr/local/nagios/etc/nrpe.cfg

            Change the command definitions to:

    command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
    command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
    command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    command[check_procs_args]=/usr/local/nagios/libexec/check_procs  $ARG1$
    command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$

    • check_users                    number of logged-in users

    • check_load                     CPU load

    • check_disk                     disk usage

    • check_procs                    number of processes; the state filter may include R, S, Z, D, and T

    • check_swap                     swap usage

      (2) Check that the commands work

      service xinetd restart

      /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users  -a 5 10
      /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load   -a 15,10,5 30,25,20
      /usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk    -a 20% 10% /
      /usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
      /usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap  -a 20% 10%

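3. Master node configuration (master/slave connectivity):

(1) Define permissions

(as the nagios user)

vi /usr/local/nagios/etc/cgi.cfg

Change the following settings to grant permissions to the admin user: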

default_user_name=admin
authorized_for_system_information=nagiosadmin,admin
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin

   (2) nagios.cfg

vi /usr/local/nagios/etc/nagios.cfg

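#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg      (comment this line out)
cfg_dir=/usr/local/nagios/etc/servers

The main configuration file now points at ./servers as the location of the host and service definitions. This directory does not exist by default and must be created by hand.

Nagios reads every file ending in .cfg under the servers directory as configuration.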

cd /usr/local/nagios/etc
mkdir servers
cd servers

 (3) Define a host group

Declare a host group and add all three hosts from the environment to it.

vi /usr/local/nagios/etc/servers/group.cfg

This is a new file with the following content:

define hostgroup{
   hostgroup_name      name
   alias               name
   members             name1,name2,name3
}

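The settings mean:

  • hostgroup_name:    name of the host group; any value

  • alias:             alias of the host group; any value

  • members:           members of the host group, as a comma-separated list of host names. Each name must match the host_name of a define host block.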
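
Because a member that does not match any host_name makes the configuration invalid, it can help to cross-check the files before loading them. The sketch below is one way to do that with awk. The helper name undefined_members is hypothetical, and it assumes member lists are written without spaces after the commas and that the input holds only host, hostgroup, and service definitions.

```shell
#!/bin/sh
# Hypothetical helper: read Nagios object definitions on stdin and print
# every hostgroup member that has no matching "define host" host_name.
undefined_members() {
    awk '
        /define[ \t]+host[ \t]*[{]/  { in_host = 1 }   # entering a host block
        /[}]/                        { in_host = 0 }   # leaving any block
        in_host && $1 == "host_name" { defined[$2] = 1 }
        $1 == "members" {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) wanted[names[i]] = 1
        }
        END {
            for (name in wanted) {
                if (!(name in defined)) print name
            }
        }
    ' | sort
}

# Demo input: the placeholder group from above plus a single host definition.
undefined_members <<'EOF'
define hostgroup{
   hostgroup_name      name
   alias               name
   members             name1,name2,name3
}
define host{
   host_name           name1
}
EOF
```

The demo prints name2 and name3, the two members without a host definition. Against real files the input would be something like cat /usr/local/nagios/etc/servers/*.cfg.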

(4) Define monitored hosts

    Start with the local host, host-1.

    vi /usr/local/nagios/etc/servers/host-1.cfg

define host{
       use                          linux-server
       host_name                    host-1
       alias                        host-1
       address                      192.168.56.10
       }
define service{
       use                             local-service
       host_name                       host-1
       service_description             Host Alive
       check_command                   check-host-alive
       }
define service{
       use                             local-service
       host_name                       host-1
       service_description             Users
       check_command                   check_local_users!20!50
       }

Because this host also runs the Nagios master, it can be monitored with the check_local_* commands.

The file already covers the commonly used checks.

Next, define the remote hosts host-2 and host-3.

Before defining checks for the remote hosts, define the check_nrpe command:

vi /usr/local/nagios/etc/objects/commands.cfg

Append the following to the end of the file:

# 'check_nrpe' command definition
define command{
       command_name    check_nrpe
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
       }
define command{
       command_name    check_nrpe_args
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
       }

The configuration files for the remote hosts follow the same pattern as the local host's file above.
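
Since the remote host files are not shown, here is a minimal sketch of what one might contain, written as a shell function that prints the definitions. It reuses the check_nrpe_args command defined above and the thresholds from the earlier check_nrpe tests. The function name remote_host_cfg, the host name host-2, the address 192.168.56.11, and the use of the generic-service template from the sample templates.cfg are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print host and service definitions for a remote node
# that is checked through NRPE.
# Usage: remote_host_cfg HOST_NAME ADDRESS
remote_host_cfg() {
    cat <<EOF
define host{
       use                          linux-server
       host_name                    $1
       alias                        $1
       address                      $2
       }
define service{
       use                             generic-service
       host_name                       $1
       service_description             Current Load
       check_command                   check_nrpe_args!check_load!15,10,5 30,25,20
       }
define service{
       use                             generic-service
       host_name                       $1
       service_description             Root Partition
       check_command                   check_nrpe_args!check_disk!20% 10% /
       }
EOF
}

# Print the definitions for an assumed second host.
remote_host_cfg host-2 192.168.56.11
```

Redirecting the output to a file such as /usr/local/nagios/etc/servers/host-2.cfg would place it where cfg_dir picks it up. Everything after the second "!" is passed as one string to -a, which is why the warning and critical values are separated by spaces rather than by further "!" characters.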

 (5) Define mail recipients

Set the e-mail address of the person who receives alerts:

vi /usr/local/nagios/etc/objects/contacts.cfg

define contact{
       contact_name                    nagiosadmin             ; Short name of user
       use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
       alias                           Nagios Admin            ; Full name of user
       email                           yourname@domain.com 
                                                               ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
       }

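Besides configuring the recipient, make sure that:

  • this host can reach the mail server

  • Sendmail on this host can send mail through an external SMTP service

III. Monitoring Redis

First install the prerequisites: yum info perl5, then yum install perl-Time-HiRes

1. Download the check_redis.pl plugin and put it in libexec

2. Add to etc/objects/commands.cfg: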

    # check redis
    define command {
        command_name    check_redis
        command_line    $USER1$/check_redis.pl -H $HOSTADDRESS$ -p $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -f
    }

3. Add to the monitoring configuration file:

    define service {
        use                     local-service
        service_description     <description>
        check_command           <command, see below>
        host_name               <host name/IP>
    }

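The check_command takes the form (the third and fourth arguments map to -w and -c in the command definition):

 check_redis!<port>!'<variables to monitor, comma-separated>'!<warning thresholds>!<critical thresholds>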
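
As a sketch of how such a check_command string could be assembled, the helper below builds it from its four parts and refuses to proceed when the threshold lists do not line up with the variable list. The name redis_check_command, the port 6379, and the threshold values are hypothetical, and the one-threshold-per-variable rule is an assumption about how check_redis.pl pairs -w and -c values with -a variables.

```shell
#!/bin/sh
# Hypothetical helpers for building a check_redis check_command string.

# Print the number of comma-separated items in $1.
count_items() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Usage: redis_check_command PORT VARIABLES WARN_LIST CRIT_LIST
redis_check_command() {
    if [ "$(count_items "$2")" -ne "$(count_items "$3")" ] ||
       [ "$(count_items "$2")" -ne "$(count_items "$4")" ]; then
        echo "threshold lists must have one entry per variable" >&2
        return 1
    fi
    # Arguments are joined with "!" in the order expected by the command
    # definition: port, variables, warning thresholds, critical thresholds.
    printf 'check_redis!%s!%s!%s!%s\n' "$1" "'$2'" "$3" "$4"
}

redis_check_command 6379 connected_clients,blocked_clients 100,5 200,10
```

The printed string can be pasted into the check_command line of the service definition.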

The variables that can be monitored are:

 --total_connections_received=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Connections Received
 --total_connections_received_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Connections Received
 --total_expires=WARN:threshold,CRIT:threshold,<other specifiers>
   Number of Expired Keys for All DBs
 --used_memory_rss=WARN:threshold,CRIT:threshold,<other specifiers>
   Resident Set Size, Used Memory in Bytes
 --used_cpu_sys=WARN:threshold,CRIT:threshold,<other specifiers>
   Main Process Used System CPU
 --redis_git_dirty=WARN:threshold,CRIT:threshold,<other specifiers>
   Git Dirty Set Bit
 --connected_clients=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Number of Connected Clients
 --uptime_in_days=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Uptime in Days
 --uptime_in_days_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Uptime in Days
 --keyspace_hits=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Keyspace Hits
 --keyspace_hits_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Keyspace Hits
 --pubsub_channels=WARN:threshold,CRIT:threshold,<other specifiers>
   Number of Pubsub Channels
 --used_cpu_user_children=WARN:threshold,CRIT:threshold,<other specifiers>
   Child Processes Used User CPU
 --keyspace_misses=WARN:threshold,CRIT:threshold,<other specifiers>
   Keyspace Misses
 --keyspace_misses_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Keyspace Misses
 --used_cpu_user=WARN:threshold,CRIT:threshold,<other specifiers>
   Main Process Used User CPU
 --total_commands_processed=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Number of Commands Processed from Start
 --total_commands_processed_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Number of Commands Processed from Start
 --mem_fragmentation_ratio=WARN:threshold,CRIT:threshold,<other specifiers>
   Memory Fragmentation Ratio
 --blocked_clients=WARN:threshold,CRIT:threshold,<other specifiers>
   Number of Currently Blocked Clients
 --evicted_keys=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Number of Evicted Keys
 --evicted_keys_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Number of Evicted Keys
 --total_keys=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Number of Keys on the Server
 --expired_keys=WARN:threshold,CRIT:threshold,<other specifiers>
   Total Number of Expired Keys
 --expired_keys_rate=WARN:threshold,CRIT:threshold,<other specifiers>
   Rate of Change of Total Number of Expired Keys
 --connected_slaves=WARN:threshold,CRIT:threshold,<other specifiers>
   Number of Connected Slaves
 --used_cpu_sys_children=WARN:threshold,CRIT:threshold,<other specifiers>
   Child Processes Used System CPU

IV. Monitoring MySQL

Three plugins are available: check_mysql, check_mysqld.pl, and check_mysql_health. check_mysql_health is the most complete of the three, so it is used here.

Using check_mysql_health:

Download: https://labs.consol.de/nagios/check_mysql_health/

Prerequisite: yum -y install perl-DBD-MySQL

1. Download check_mysql_health-2.1.tar.gz

2. Unpack it: tar -zxvf check_mysql_health-2.1.tar.gz

3. Install

#./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-perl=/usr/bin/perl
#make && make install

4. Test the command:

   ./check_mysql_health --hostname 192.168.0.1 --port 3306 --username myname --password mypassword --mode threads-connected --warning 700 --critical 1000

5. Add to etc/objects/commands.cfg:

    # check mysql health
    define command {
        command_name    check_mysql_health
        command_line    $USER1$/check_mysql_health --hostname $ARG1$ --port $ARG2$ --username $ARG3$ --password $ARG4$ --mode $ARG5$ --warning $ARG6$ --critical $ARG7$
    }

6. Configure the service in the monitoring configuration file (same as above)

   Available values for --mode:

       connection-time          (Time to connect to the server)
       uptime                   (Time the server is running)
       threads-connected        (Number of currently open connections)
       threadcache-hitrate      (Hit rate of the thread-cache)
       slave-lag                (Seconds behind master)
       slave-io-running         (Slave io running: Yes)
       slave-sql-running        (Slave sql running: Yes)
       qcache-hitrate           (Query cache hitrate)
       qcache-lowmem-prunes     (Query cache entries pruned because of low memory)
       keycache-hitrate         (MyISAM key cache hitrate)
       bufferpool-hitrate       (InnoDB buffer pool hitrate)
       bufferpool-wait-free     (InnoDB buffer pool waits for clean page available)
       log-waits                (InnoDB log waits because of a too small log buffer)
       tablecache-hitrate       (Table cache hitrate)
       table-lock-contention    (Table lock contention)
       index-usage              (Usage of indices)
       tmp-disk-tables          (Percent of temp tables created on disk)
       slow-queries             (Slow queries)
       long-running-procs       (long running processes)
       cluster-ndbd-running     (ndbd nodes are up and running)
       sql                      (any sql command returning a single number)
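
To make step 6 concrete, the sketch below prints a service definition that calls the check_mysql_health command with its seven arguments in the order of the command definition. The helper name mysql_health_service and the Nagios host name host-1 are hypothetical. The connection values and thresholds are the ones from the command-line test in step 4.

```shell
#!/bin/sh
# Hypothetical helper: print a service definition that calls the
# check_mysql_health command defined in commands.cfg.
# Usage: mysql_health_service NAGIOS_HOST DB_HOST PORT USER PASSWORD MODE WARN CRIT
mysql_health_service() {
    cat <<EOF
define service {
    use                     local-service
    host_name               $1
    service_description     MySQL $6
    check_command           check_mysql_health!$2!$3!$4!$5!$6!$7!$8
}
EOF
}

# Same check as the command-line test: open connections, warn at 700, critical at 1000.
mysql_health_service host-1 192.168.0.1 3306 myname mypassword threads-connected 700 1000
```

Nagios splits check_command arguments on "!", so a password containing that character would need to be escaped or avoided. One service per mode keeps each metric visible as its own entry in the web interface.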

7. Restart Nagios with /etc/init.d/nagios restart. If it reports that the process is locked, delete /var/lock/subsys/nagios first.
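
A restart with a broken object file leaves Nagios stopped, so it can be worth running the built-in configuration check (nagios -v followed by the main configuration file) first. The wrapper below is a sketch of that idea. The function name restart_nagios and the NAGIOS_BIN, NAGIOS_CFG, and NAGIOS_INIT override variables are assumptions, and the default paths follow the /usr/local/nagios prefix used in this article.

```shell
#!/bin/sh
# Hypothetical wrapper: restart Nagios only if the configuration verifies.
restart_nagios() {
    nagios_bin=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
    nagios_init=${NAGIOS_INIT:-/etc/init.d/nagios}

    # "nagios -v <main config>" parses the configuration without starting the daemon.
    if ! "$nagios_bin" -v "$nagios_cfg" >/dev/null 2>&1; then
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
    "$nagios_init" restart
}

# Usage on the master node (run as root):
#   restart_nagios
```

When the check fails, running the same nagios -v command by hand shows the file and line that caused the error.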

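V. Monitoring ZooKeeper

1. Install the plugins

git clone https://github.com/harisekhon/nagios-plugins
cd nagios-plugins
make

2. Using the plugin

(1) Add to etc/objects/commands.cfg (adjust the path to wherever the plugins were installed):

    # check zk
    define command {
        command_name    check_zk
        command_line    /export/home/nagios/nagios_plugins/check_zookeeper.pl -H $ARG1$
    }

(2) Configure the check in a service definition

   Note: if you get a permission error, make the plugin file executable.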

Original article: https://www.cnblogs.com/guoliangxie/p/5392020.html