ZK Process Monitoring

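I found the following approach online: it monitors the zk process and restarts zk if the process is gone.
There is one case it cannot handle: when zk hangs, the process is still there, but many TCP connections are stuck in CLOSE_WAIT, so clients cannot connect to zk!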

#!/bin/sh

# Restart ZooKeeper whenever no zookeeper process is found.
while true
do
    date
    # [z] keeps grep from matching its own command line.
    if ps -ef | grep -q '[z]ookeeper'; then
        echo ">>>> zookeeper is running..."
    else
        echo ">>>> zookeeper has shut down"
        echo ">>>> restarting zookeeper now!"
        sh zkServer.sh start
    fi
    sleep 60
done
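
The hung state described above (process alive, many CLOSE_WAIT connections) can be spotted by counting those sockets. Below is a minimal sketch, not part of the original scripts: the function name count_close_wait is hypothetical, it assumes Linux-style `netstat -ant` columns, and it assumes zk listens on the default client port 2181. How many CLOSE_WAIT sockets count as "too many" is left to the reader, since the source gives no number.

```shell
#!/bin/sh
# Count CLOSE_WAIT sockets whose local address ends in the given port.
# Reads `netstat -ant`-style lines on stdin, assumed column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_close_wait() {
    port="$1"
    awk -v port="$port" '
        $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Hypothetical usage on a live host (2181 is an assumed port):
#   netstat -ant | count_close_wait 2181
```

Reading from stdin keeps the function easy to try on saved netstat output before pointing it at a production host.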

When zk hangs, running sh zkServer.sh status returns an error string; when zk is healthy, it returns Mode: leader or Mode: follower.
The improved monitoring script is as follows:
monitorzk.sh

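#!/bin/sh

# Restart ZooKeeper whenever "zkServer.sh status" does not report a mode.
PIDFILE="/usr/local/zookeeper-3.4.6/data/zookeeper_server.pid"

while true
do
    date
    t=$(sh zkServer.sh status)
    # Use case rather than [[ ]], which plain sh does not support.
    case "$t" in
        Mode*)
            echo ">>>> zookeeper is running..."
            ;;
        *)
            echo ">>>> zookeeper is not responding"
            echo ">>>> restarting zookeeper now!"
            # Kill the old (possibly hung) process, if its pid file exists.
            [ -f "$PIDFILE" ] && kill -9 "$(cat "$PIDFILE")"
            sh zkServer.sh start
            ;;
    esac
    sleep 60
done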
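
The check in monitorzk.sh only succeeds when the captured output begins with Mode. If other lines end up on stdout ahead of it, a line-based test is more tolerant. The helper below is a small sketch of that idea; zk_status_ok is a hypothetical name, and the assumption is that a healthy server prints a line starting with "Mode: " somewhere in its status output.

```shell
#!/bin/sh
# Return 0 if the given status text contains a line starting with "Mode: ".
# $1 is the captured output of `sh zkServer.sh status`.
zk_status_ok() {
    printf '%s\n' "$1" | grep -q '^Mode: '
}

# Hypothetical usage inside the monitor loop:
#   if zk_status_ok "$(sh zkServer.sh status)"; then ...; fi
```

Because the helper takes the status text as an argument, it can be tried with pasted sample output without a running server.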

startMonitor.sh

nohup sh monitorzk.sh >> monitor.log 2>&1 &

Original article: https://www.cnblogs.com/kevin-yuan/p/14039996.html