Monitoring Windows Servers with Nagios (Installing and Using NSClient++)

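Installing NSClient++
Download NSClient++
Download the installer from the following link: http://files.nsclient.org/x-0.3.x/NSClient%2B%2B-0.3.9-Win32.msi
Installation and setup
Installation is straightforward: click Next through each page of the wizard, making sure the settings match those shown in the figure below.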

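Configuring NSClient++
Edit NSC.ini in the NSClient++ installation directory as shown in the figure below, then restart the NSClient++ service so the changes take effect.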

The modules involved are described in the following table:
Module   Description   Commands
CheckSystem.dll   Handles many system checks   CPU, MEMORY, COUNTER etc
CheckDisk.dll   Handles Disk related checks   USEDDISKSPACE
FileLogger.dll   Logs errors to a file so you can see what is going on   N/A
NSClientListener.dll   Listens and responds to incoming requests from nagios   N/A
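Before restarting the service, it can help to confirm that the four modules in the table are actually enabled in NSC.ini. The Python sketch below assumes the NSClient++ 0.3.x layout, in which modules are listed one per line under a [modules] section (disabled ones are commented out with a leading `;`) and the listener port is the `port` key under [NSClient]. The sample file contents are hypothetical, so compare them with your own NSC.ini.

```python
import configparser

# Hypothetical NSC.ini excerpt; section and key names are assumed from the
# NSClient++ 0.3.x layout and should be checked against your own file.
SAMPLE_NSC_INI = """
[modules]
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
;CheckEventLog.dll

[NSClient]
port=12489
"""

REQUIRED_MODULES = ["CheckSystem.dll", "CheckDisk.dll",
                    "FileLogger.dll", "NSClientListener.dll"]


def missing_modules(ini_text: str) -> list[str]:
    """Return the required modules that are not enabled (i.e. not uncommented)."""
    parser = configparser.ConfigParser(allow_no_value=True)  # module lines have no "="
    parser.optionxform = str  # keep DLL names exactly as written
    parser.read_string(ini_text)
    enabled = set(parser["modules"]) if parser.has_section("modules") else set()
    return [m for m in REQUIRED_MODULES if m not in enabled]


def listener_port(ini_text: str, default: int = 12489) -> int:
    """Port set under [NSClient], falling back to the assumed default 12489."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(ini_text)
    return parser.getint("NSClient", "port", fallback=default)


if __name__ == "__main__":
    print("missing modules:", missing_modules(SAMPLE_NSC_INI))
    print("listener port:", listener_port(SAMPLE_NSC_INI))
```

The port reported here is the value the `-p` option of check_nt has to match on the Nagios side.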

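Monitoring with NSClient++
NSClient++ communicates with the Nagios server primarily through the check_nt plugin on the Nagios side. The architecture is shown in the figure below.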

check_nt usage help:
[root@localhost libexec]# ./check_nt -h
check_nt v1.4.15 (nagios-plugins 1.4.15)
Copyright (c) 2000 Yves Rubin (rubiyz@yahoo.com)
Copyright (c) 2000-2007 Nagios Plugin Development Team
        <nagiosplug-devel@lists.sourceforge.net>
This plugin collects data from the NSClient service running on a
Windows NT/2000/XP/2003 server.

Usage:
check_nt -H host -v variable [-p port] [-w warning] [-c critical]
[-l params] [-d SHOWALL] [-u] [-t timeout]

Options:
-h, --help
    Print detailed help screen
-V, --version
    Print version information
Options:
-H, --hostname=HOST
   Name of the host to check
-p, --port=INTEGER
   Optional port number (default: 1248)
-s, --secret=<password>
   Password needed for the request
-w, --warning=INTEGER
   Threshold which will result in a warning status
-c, --critical=INTEGER
   Threshold which will result in a critical status
-t, --timeout=INTEGER
   Seconds before connection attempt times out (default: 10)
-l, --params=<parameters>
   Parameters passed to specified check (see below)
-d, --display={SHOWALL}
   Display options (currently only SHOWALL works)
-u, --unknown-timeout
   Return UNKNOWN on timeouts
-h, --help
   Print this help screen
-V, --version
   Print version information
-v, --variable=STRING
   Variable to check

Valid variables are:
CLIENTVERSION = Get the NSClient version
If -l <version> is specified, will return warning if versions differ.
CPULOAD =
Average CPU load on last x minutes.
Request a -l parameter with the following syntax:
-l <minutes range>,<warning threshold>,<critical threshold>.
<minute range> should be less than 24*60.
Thresholds are percentage and up to 10 requests can be done in one shot.
ie: -l 60,90,95,120,90,95
UPTIME =
Get the uptime of the machine.
No specific parameters. No warning or critical threshold
USEDDISKSPACE =
Size and percentage of disk use.
Request a -l parameter containing the drive letter only.
Warning and critical thresholds can be specified with -w and -c.
MEMUSE =
Memory use.
Warning and critical thresholds can be specified with -w and -c.
SERVICESTATE =
Check the state of one or several services.
Request a -l parameters with the following syntax:
-l <service1>,<service2>,<service3>,...
You can specify -d SHOWALL in case you want to see working services
in the returned string.
PROCSTATE =
Check if one or several process are running.
Same syntax as SERVICESTATE.
COUNTER =
Check any performance counter of Windows NT/2000.
        Request a -l parameters with the following syntax:
        -l "\\<performance object>\\counter","<description>
        The <description> parameter is optional and is given to a printf
output command which requires a float parameter.
If <description> does not include "%%", it is used as a label.
Some examples:
"Paging file usage is %%.2f %%%%"
"%%.f %%%% paging file used."
INSTANCES =
Check any performance counter object of Windows NT/2000.
Syntax: check_nt -H <hostname> -p <port> -v INSTANCES -l <counter object>
<counter object> is a Windows Perfmon Counter object (eg. Process),
if it is two words, it should be enclosed in quotes
The returned results will be a comma-separated list of instances on
   the selected computer for that object.
The purpose of this is to be run from command line to determine what instances
   are available for monitoring without having to log onto the Windows server
    to run Perfmon directly.
It can also be used in scripts that automatically create Nagios service
   configuration files.
Some examples:
check_nt -H 192.168.1.1 -p 1248 -v INSTANCES -l Process
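For COUNTER, the optional <description> is used as a printf format that receives the counter value as a float. The Python sketch below uses the `%` operator, which follows C printf rules for `%f` and `%%`, to show what such a description produces. It assumes the description reaches printf with single percent signs (`%.2f` for the value, `%%` for a literal percent sign); the help text above shows the signs doubled, so test the form that works with your check_nt version. The counter value is made up.

```python
def render(description: str, value: float) -> str:
    """Apply a printf-style COUNTER description to a float counter value."""
    # Python's %-formatting matches C printf for %f, %.Nf and %%.
    return description % value


if __name__ == "__main__":
    counter_value = 37.254  # hypothetical paging-file reading
    for description in ("Paging file usage is %.2f %%",
                        "%.f %% paging file used."):
        print(render(description, counter_value))
```

With this value the two descriptions render as `Paging file usage is 37.25 %` and `37 % paging file used.`.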

Make sure check_nt exists in the libexec subdirectory of the Nagios installation
(for example, /usr/local/nagios/libexec/check_nt).
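A quick way to script this check is to confirm that the file exists and is executable by the user Nagios runs as. The sketch below assumes the default source-install path shown above; pass a different path as the first argument if your layout differs.

```python
import os
import sys

DEFAULT_CHECK_NT = "/usr/local/nagios/libexec/check_nt"  # path from the article


def plugin_ready(path: str = DEFAULT_CHECK_NT) -> bool:
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CHECK_NT
    status = "ready" if plugin_ready(target) else "missing or not executable"
    print(f"{target}: {status}")
```

Run it as the nagios user, since execute permission is evaluated for whoever runs the script.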

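Check how the check_nt command is defined in commands.cfg on the Nagios server. The -p 12489 option must match the port NSClient++ listens on as configured in NSC.ini (12489 is the NSClient++ default, while check_nt itself defaults to 1248). If a password is set in NSC.ini, it must also be passed with -s.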
[root@localhost etc]# vim commands.cfg
define command {
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
        register        1
}
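Service definitions pass arguments to this command by separating them with `!`: the first field names the command and the remaining fields become $ARG1$, $ARG2$, and so on. The Python sketch below is a simplified model of that substitution (it ignores escaped `\!` and all macros except the ones shown), assuming $USER1$ points at /usr/local/nagios/libexec as in a default source install. Because this command_line only uses $ARG1$ and $ARG2$, anything passed as a third argument is silently dropped.

```python
import re

# Assumed value of $USER1$ (normally set in resource.cfg).
USER_MACROS = {"USER1": "/usr/local/nagios/libexec"}
COMMAND_LINE = "$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$"


def expand(check_command: str, host_address: str) -> str:
    """Simplified model of Nagios macro expansion for one service check.

    The check_command is split on "!": the first field is the command name,
    the rest become $ARG1$, $ARG2$, ... Macros the command line does not use
    are ignored, and undefined $ARGn$ macros expand to an empty string.
    """
    _name, *args = check_command.split("!")
    macros = dict(USER_MACROS, HOSTADDRESS=host_address)
    for i, value in enumerate(args, start=1):
        macros[f"ARG{i}"] = value
    expanded = re.sub(r"\$([A-Z0-9]+)\$",
                      lambda m: macros.get(m.group(1), ""), COMMAND_LINE)
    return " ".join(expanded.split())  # collapse stray spaces for readability


if __name__ == "__main__":
    print(expand("check_nt!CPULOAD!-l 5,70,80,10,80,90", "10.0.192.226"))
    print(expand("check_nt!USEDDISKSPACE!-l c! -w 80 -c 90", "10.0.192.226"))
```

The second call ends in `-v USEDDISKSPACE -l c`: the `-w 80 -c 90` in the third `!` field never reaches check_nt, so thresholds belong in the same argument as the `-l` value.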

Monitoring a Windows host
First define the host to be monitored. This example monitors winxp226:
define host {
        host_name                       winxp226
        alias                             My Windows Server
        address                          10.0.192.226
        use                              windows-server,host-pnp
        register                           1
}

Monitor the CPU load of the Windows host
define service {
        host_name                       winxp226
        service_description                cpuload
        use                              generic-service
        check_command                  check_nt!CPULOAD!-l 5,70,80,10,80,90
        register                           1
}
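Note: the -l value contains two minutes,warning,critical triples. The check returns WARNING if the average CPU load over the last 5 minutes reaches 70% and CRITICAL if it reaches 80%; for the 10-minute average the thresholds are 80% (WARNING) and 90% (CRITICAL).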
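The CPULOAD syntax from the check_nt help (triples of minutes, warning %, critical %; at most 10 triples; minute range below 24*60) is easy to get wrong by one comma. The Python sketch below validates a -l value against those rules before it goes into a service definition. The extra requirement that warning not exceed critical is a sanity rule added for this sketch, not something check_nt is documented to enforce.

```python
def parse_cpuload(params: str) -> list[tuple[int, int, int]]:
    """Split a CPULOAD -l value into (minutes, warning %, critical %) triples.

    Rules from the check_nt help: values come in triples, at most 10 triples
    per request, and each minute range is below 24*60. The warning <= critical
    check is an extra sanity rule added for this sketch.
    """
    values = [int(v) for v in params.split(",")]
    if len(values) % 3 != 0:
        raise ValueError("expected minutes,warning,critical triples")
    triples = [(values[i], values[i + 1], values[i + 2])
               for i in range(0, len(values), 3)]
    if len(triples) > 10:
        raise ValueError("at most 10 triples can be requested at once")
    for minutes, warning, critical in triples:
        if not 0 < minutes < 24 * 60:
            raise ValueError(f"minute range {minutes} must be between 1 and {24 * 60 - 1}")
        if not 0 <= warning <= critical <= 100:
            raise ValueError(f"expected 0 <= warning <= critical <= 100, got {warning},{critical}")
    return triples


if __name__ == "__main__":
    for minutes, warning, critical in parse_cpuload("5,70,80,10,80,90"):
        print(f"{minutes}-minute average: WARNING at {warning}%, CRITICAL at {critical}%")
```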

Monitor memory usage on the Windows host
define service {
        host_name                       winxp226
        service_description                Memory Usage
        use                              generic-service
        check_command                  check_nt!MEMUSE!-w 80 -c 90
        register                           1
}
Note: the check returns WARNING when memory usage reaches 80% and CRITICAL when it reaches 90%.
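Nagios interprets a plugin's exit code as the service state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The Python sketch below shows how `-w 80 -c 90` map a memory-usage percentage onto those states. It assumes that a reading equal to a threshold already triggers that state, which may differ in detail from check_nt's own comparison.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def threshold_state(percent_used: float, warning: float, critical: float) -> int:
    """Map a usage percentage to a Nagios state, testing critical first.

    Assumption: a reading at or above a threshold triggers that state.
    """
    if percent_used >= critical:
        return CRITICAL
    if percent_used >= warning:
        return WARNING
    return OK


if __name__ == "__main__":
    for reading in (65.0, 80.0, 92.5):
        print(reading, STATE_NAMES[threshold_state(reading, 80, 90)])
```

Critical is tested first so that a reading above both thresholds is never reported as a mere WARNING.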

Monitor the uptime of the Windows host
define service {
        host_name                       winxp226
        service_description                Uptime
        use                              generic-service
        check_command                  check_nt!UPTIME
        register                          1
}

Check that NSClient++ is installed on the Windows host and report its version
define service {
        host_name                       winxp226
        service_description                NSClient++ Version
        use                              generic-service
        check_command                  check_nt!CLIENTVERSION
        register                           1
}

Monitor space usage on drive C:\ of the Windows host
define service {
        host_name                     winxp226
        service_description              C:\ Drive Space
        use                            generic-service
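        check_command                check_nt!USEDDISKSPACE!-l c -w 80 -c 90
        register                         1
}
Note: keep -w and -c in the same argument as -l c. The check_nt command defined above only passes $ARG1$ and $ARG2$, so thresholds placed after a further "!" would be dropped.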
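USEDDISKSPACE compares the percentage of the drive that is in use, not the free space, against -w and -c. As a sanity check of the arithmetic, the Python sketch below computes that percentage from total and free bytes; the drive size is a made-up example.

```python
GIB = 2**30  # bytes per GiB


def used_percent(total_bytes: int, free_bytes: int) -> float:
    """Percentage of a drive in use, computed from its total and free space."""
    if total_bytes <= 0 or not 0 <= free_bytes <= total_bytes:
        raise ValueError("need total_bytes > 0 and 0 <= free_bytes <= total_bytes")
    return 100.0 * (total_bytes - free_bytes) / total_bytes


if __name__ == "__main__":
    # Hypothetical C: drive with 100 GiB capacity and 15 GiB free.
    print(f"{used_percent(100 * GIB, 15 * GIB):.1f}% used")
```

A drive with 100 GiB capacity and 15 GiB free is 85% used, which lies between 80 and 90 and would therefore be reported as WARNING with `-w 80 -c 90`.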

Monitor the running state of the W3SVC service on the Windows host
define service {
        host_name               winxp226
        use                     generic-service
        service_description     W3SVC
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}

Monitor the explorer.exe process on the Windows host; if the process stops, the check returns CRITICAL
define service {
      host_name              winxp226
      service_description       Explorer
      use                     generic-service
      check_command         check_nt!PROCSTATE! -d SHOWALL -l explorer.exe
      register                  1
}

Monitor the SNMP service on the Windows host; if the service stops, the check returns CRITICAL
define service {
        host_name               winxp226
        service_description     SNMP
        use                     generic-service
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l "SNMP Service"
}

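Monitor the MySQL service on the Windows host; if the service stops, the check returns CRITICAL
define service {
        host_name               winxp226
        service_description     MySQL55
        use                     generic-service
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l MySQL55
        register                1
}
Note: the name after -l must match the Windows service name exactly. If the name contains spaces, enclose it in double quotes; otherwise check_nt reports an invalid parameter, as shown in the figure below.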
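The quoting rule matters because the command line is split into words shell-style before check_nt sees it. The Python sketch below builds a SERVICESTATE/PROCSTATE -l value with names containing spaces wrapped in double quotes, and uses `shlex.split`, which follows POSIX shell quoting rules, as an approximation of that splitting. It is an illustration, not Nagios's own parser, and `service_list_arg` is a hypothetical helper name.

```python
import shlex


def service_list_arg(names: list[str]) -> str:
    """Hypothetical helper: build "-l a,b,..." and double-quote names with spaces."""
    parts = [f'"{name}"' if " " in name else name for name in names]
    return "-l " + ",".join(parts)


if __name__ == "__main__":
    arg = service_list_arg(["W3SVC", "SNMP Service"])
    print(arg)                                # the text to put in check_command
    print(shlex.split(arg))                   # one word: 'W3SVC,SNMP Service'
    print(shlex.split("-l SNMP Service"))     # unquoted: the name splits in two
```

Without the quotes, "Service" becomes a separate word, which is why check_nt rejects the arguments as invalid.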

Verify nagios.cfg for errors, then restart Nagios:
[root@localhost services]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@localhost services]# service nagios restart
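The two commands are meant to be run in order: restart only if the verification succeeds. The Python sketch below chains them, assuming that `nagios -v` exits with status 0 when the configuration is valid and non-zero otherwise, and that the init script is invoked as `service nagios restart` as above. The paths are the ones used in this article.

```python
import subprocess
import sys

NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
NAGIOS_CFG = "/usr/local/nagios/etc/nagios.cfg"


def config_is_valid(nagios_bin: str = NAGIOS_BIN, cfg: str = NAGIOS_CFG) -> bool:
    """Run the pre-flight check (nagios -v) and report whether it exited with 0."""
    try:
        result = subprocess.run([nagios_bin, "-v", cfg],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return False  # binary not found at the given path
    if result.returncode != 0:
        # Show the verifier's output so the offending object can be located.
        print(result.stdout, result.stderr, sep="\n", file=sys.stderr)
    return result.returncode == 0


if __name__ == "__main__":
    if config_is_valid():
        subprocess.run(["service", "nagios", "restart"], check=True)
    else:
        sys.exit("Configuration check failed; Nagios was not restarted.")
```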

Monitoring results (screenshot)