MySQL高可用方案 MHA之一MHA安装

MHA0.58安装

MHA（Master High Availability）由两部分组成：MHA Manager（管理节点）和MHA Node（数据节点）。
管理节点
mha4mysql-manager-0.58
mha4mysql-manager-0.58
下载地址：

wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz

MHA Manager

可以单独部署在一台独立的机器上管理多个master-slave集群，也可以部署在一台slave节点上。MHA Manager会定时探测集群中的master节点，当master出现故障时，
它可以自动将最新数据的slave提升为新的master，然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

MHA Node
运行在每台MySQL服务器上，定时和 MHA Manager交互信息。

在MHA自动故障切换过程中，MHA试图从宕机的主服务器上保存二进制日志，最大程度的保证数据的不丢失，但这并不总是可行的。
例如，如果主服务器硬件故障或无法通过ssh访问，MHA没法保存二进制日志，只进行故障转移而丢失了最新的数据。

使用MySQL 的半同步复制，可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志，
MHA可以将最新的二进制日志应用于其他所有的slave服务器上，因此可以保证所有节点的数据一致性。

搭建MHA,要求一个复制集群中必须最少有三台数据库服务器，一主二从，即一台充当master，一台充当备用master，另外一台充当从库。

主从架构
master:
10.150.20.90 ed3jrdba90
slave:
10.15.20.97 ed3jrdba97
10.150.20.132 ed3jrdba132
manager:
10.150.20.95 ed3jrdba95

四台机器的系统情况：
OS:CentOS7.3
MySQL:5.7.21
MHA:0.58
网卡名：ens3

1：MHA的各节点os配置ssh免密登录
测试
[root@qa05 ~]#ssh ed3jrdba90
[root@qa05 ~]#ssh ed3jrdba97
[root@qa05 ~]#ssh ed3jrdba132

2：在master-slave节点按照统一规范安装MySQL数据库

3：搭建主从复制一主两从

每个节点的 binlog-do-db 和 replicate-ignore-db 设置必须相同

记录初始化MySQL实例root临时密码，记录在错误日志，登录交互界面进行修改
master修改root@localhost密码
root@(none) 03:15:25>set password for 'root'@'localhost'= password('root');

master 创建监控用户
mysql> create user 'mha_monitor'@'10.150.20.%' identified by 'mha_monitor';
mysql> grant all privileges on *.* to 'mha_monitor'@'10.150.20.%';
mysql> flush privileges;

master创建复制用户
mysql> create user 'replicator'@'10.150.20.%' identified by 'replicator';
mysql> grant replication slave on *.* to 'replicator'@'10.150.20.%';
mysql> flush privileges;

master备库数据库实例，slave同步数据
master:
[root@qa05 ~]# mysqldump -uroot -p --master-data=2 --single-transaction -R --triggers -A > /data/mysql_all.sql
slave:
[root@qa06 ~]# mysql -uroot -p < /data/mysql_all.sql
[root@cgdb ~]#mysql -uroot -p < /data/mysql_all.sql

salve设置为 read only，从库对外提供读服务，不写进配置文件，因为随时slave会提升为master。
mysql> set global read_only=1;

4：安装 mha node
在ed3jrdba90、ed3jrdba95、ed3jrdba97、ed3jrdba132安装mha4mysql-node-0.58

# cd mha4mysql-node-0.58
# perl Makefile.PL
# make 
# make install

# ll /usr/local/bin/

-rwxr-xr-x 1 root root 16377 12月 26 2017 apply_diff_relay_logs
-rwxr-xr-x 1 root root 4803 12月 26 2017 filter_mysqlbinlog
-rwxr-xr-x 1 root root 8257 12月 26 2017 purge_relay_logs
-rwxr-xr-x 1 root root 7521 12月 26 2017 save_binary_log

备注：
apply_diff_relay_logs识别差异的中继日志事件并将其差异的事件应用到其他的slave
filter_mysqlbinlog去除不必要的rollback事件（MHA已不再使用这个工具）
purge_relay_logs清楚中继日志（不会阻塞SQL线程）
save_binary_log保存和复制master的二进制日志

5：安装 mha manager

在ed3jrdba95安装mha4mysql-manager-0.58
# cd mha4mysql-manager-0.58
# perl Makefile.PL
# make
# make install

[root@dev05 ~]# ll /usr/local/bin/
60

-r-xr-xr-x 1 root root 1995 110 11:32 masterha_check_repl
-r-xr-xr-x 1 root root 1779 110 11:32 masterha_check_ssh
-r-xr-xr-x 1 root root 1865 110 11:32 masterha_check_status
-r-xr-xr-x 1 root root 3201 110 11:32 masterha_conf_host
-r-xr-xr-x 1 root root 2517 110 11:32 masterha_manager
-r-xr-xr-x 1 root root 2165 110 11:32 masterha_master_monitor
-r-xr-xr-x 1 root root 2373 110 11:32 masterha_master_switch
-r-xr-xr-x 1 root root 5172 110 11:32 masterha_secondary_check
-r-xr-xr-x 1 root root 1739 110 11:32 masterha_stop

备注：

masterha_check_repl检查MySQL复制状况
masterha_check_ssh检查MHA的SSH配置状况
masterha_check_status检测当前MHA的运行状态
masterha_conf_host添加和删除配置的server信息
masterha_manager启动MHA
masterha_master_monitor检测master是否宕机
masterha_master_switch控制故障转移（自动、手动）
masterha_secondary_check
masterha_stop

cp /opt/source/mha4mysql-manager-0.58/samples/scripts/* /usr/local/bin/

-rwxr-xr-x 1 dev_bj dev_bj 3648 323 2018 master_ip_failover
-rwxr-xr-x 1 dev_bj dev_bj 9870 323 2018 master_ip_online_change
-rwxr-xr-x 1 dev_bj dev_bj 11867 323 2018 power_manager
-rwxr-xr-x 1 dev_bj dev_bj 1360 323 2018 send_report
---------------------
备注
master_ip_failover 自动切换时vip管理的脚本，不是必须
master_ip_online_change 在线切换时vip的管理，不是必须
power_manager 故障发生后关闭主机的脚本，不是必须
send_report 因故障切换后发送报警的脚本，不是必须
---------------------

6：MHA配置

创建mha工作目录

# mkdir -p /etc/mysql_mha/
# cp /opt/source/mha4mysql-manager-0.58/samples/conf/* /etc/masterha/

编辑配置app1.cnf文件：
vi /etc/mysql_mha/app1.cnf
[server default]
#manager工作目录
manager_workdir=/data/mysql_mha/app1
#manager日志路径
manager_log=/data/mysql_mha/app1-manager.log
#master保存binlog的位置，以便MHA可以找到master的日志
master_binlog_dir=/data/mysql_33061/logs
#自动failover时候的切换脚本
#master_ip_failover_script=/usr/local/bin/master_ip_failover
#手动切换时候的切换脚本
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
#master创建的监控用户
user=mha_monitor
#MySQL创建的监控用户密码
password=mha_monitor
#设置监控主库，发送ping包的时间间隔，默认为3秒，尝试三次没有回应的时候自动进行failover
ping_interval=5
#远端MySQL在发生切换时，binlog的保存位置
remote_workdir=/data/mysql_mha/app1
#MySQL创建的复制用户密码
repl_password=replicator
repl_user=replicator
#发生切换后发送的报警的脚本
#report_script=/usr/local/send_report

#MHA到ed3jrdba90之间的监控异常，MHA manager将会尝试从ed3jrdba97登录到ed3jrdba90
secondary_check_script=/usr/local/bin/masterha_secondary_check -s ed3jrdba97 -s ed3jrdba90
#故障发生后关闭故障主机脚本
shutdown_script=""
#ssh的登录用户名，需事先配置好ssh免密登录
ssh_user=root

#master信息
[server1]
hostname=10.150.20.90
port=33061

#candidate master信息
[server2]
hostname=10.150.20.97
port=33061
#候选master，如果设置该参数后，发生主从切换时，将该库提升为主库，即使这个主库不是集群中事件最新的SLAVE
candidate_master=1
#默认情况下如果一个slave落后master 100M的relay logs的话，MHA将不会选择该slave作为一个新的master，
#因为对于这个slave的恢复需要花费很长时间，通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时，
#这个参数对于设置了candidate_master=1的主机非常有用，因为这个候选主在切换的过程中一定是新的master
check_repl_delay=0

#slave 信息
[server3]
hostname=10.150.20.132
port=33061

全局配置文件编辑：

# cat /etc/mysql_mha/masterha_default.cnf
[server default]
user=mha_monitor
password=mha_monitor
ssh_user=root

# for replication

repl_user=replicator
repl_password=replicator

# for monitor
ping_interval=1
shutdown_script=""
#master_ip_failover_script=/usr/local/bin/master_ip_failover_script

#remote_workdir=/data/log/masterha
#secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
#ping_interval=1
# master_ip_failover_script= /script/masterha/master_ip_failover
# shutdown_script= /script/masterha/power_manager
# report_script= /script/masterha/send_report
# master_ip_online_change_script= /script/masterha/master_ip_online_change

relay log的设置

在每个slave上执行,将relay log的自动设置清除为OFF

mysql> set global relay_log_purge=0;

# vi /usr/local/bin/purge_relay_log.sh

 1 #!/bin/bash
 2 user=root
 3 passwd=root
 4 port=33061
 5 log_dir='/data/mysql_33061/logs'
 6 work_dir='/data/mysql_33061'
 7 purge='/usr/local/bin/purge_relay_logs'
 8 
 9 if [ ! -d $log_dir ]
10 then
11 mkdir $log_dir -p
12 fi
13 
14 $purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs1.log 2>&1

purge_relay_log.sh

检查MHA Manger到所有MHA Node的SSH连接

# masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
Mon Dec 10 16:39:49 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Mon Dec 10 16:39:49 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Mon Dec 10 16:39:49 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Mon Dec 10 16:39:49 2018 - [info] Starting SSH connection tests..
Mon Dec 10 16:39:50 2018 - [debug]
Mon Dec 10 16:39:49 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.97(10.150.20.97:22)..
Mon Dec 10 16:39:49 2018 - [debug] ok.
Mon Dec 10 16:39:49 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.132(10.150.20.132:22)..
Mon Dec 10 16:39:50 2018 - [debug] ok.
Mon Dec 10 16:39:51 2018 - [debug]
Mon Dec 10 16:39:50 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.90(10.150.20.90:22)..
Mon Dec 10 16:39:50 2018 - [debug] ok.
Mon Dec 10 16:39:50 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.132(10.150.20.132:22)..
Mon Dec 10 16:39:50 2018 - [debug] ok.
Mon Dec 10 16:39:52 2018 - [debug]
Mon Dec 10 16:39:50 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.90(10.150.20.90:22)..
Mon Dec 10 16:39:51 2018 - [debug] ok.
Mon Dec 10 16:39:51 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.97(10.150.20.97:22)..
Mon Dec 10 16:39:51 2018 - [debug] ok.
Mon Dec 10 16:39:52 2018 - [info] All SSH connection tests passed successfully.

检查整个复制环境

# vi /etc/mysql_mha/app1.cnf
#先注销掉这几个参数
#master_ip_failover_script= /usr/local/bin/master_ip_failover
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
#report_script=/usr/local/send_report

/usr/local/bin/masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

/usr/local/bin/masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 11:13:53 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 11:13:53 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 11:13:53 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 11:13:53 2018 - [info] MHA::MasterMonitor version 0.58.
Wed Dec 12 11:13:54 2018 - [info] GTID failover mode = 0
Wed Dec 12 11:13:54 2018 - [info] Dead Servers:
Wed Dec 12 11:13:54 2018 - [info] Alive Servers:
Wed Dec 12 11:13:54 2018 - [info]   10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:13:54 2018 - [info]   10.150.20.97(10.150.20.97:33061)
Wed Dec 12 11:13:54 2018 - [info]   10.150.20.132(10.150.20.132:33061)
Wed Dec 12 11:13:54 2018 - [info] Alive Slaves:
Wed Dec 12 11:13:54 2018 - [info]   10.150.20.97(10.150.20.97:33061)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 11:13:54 2018 - [info]     Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:13:54 2018 - [info]   10.150.20.132(10.150.20.132:33061)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 11:13:54 2018 - [info]     Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:13:54 2018 - [info] Current Alive Master: 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:13:54 2018 - [info] Checking slave configurations..
Wed Dec 12 11:13:54 2018 - [info]  read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 11:13:54 2018 - [warning]  relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 11:13:54 2018 - [info]  read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 11:13:54 2018 - [info] Checking replication filtering settings..
Wed Dec 12 11:13:54 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Dec 12 11:13:54 2018 - [info]  Replication filtering check ok.
Wed Dec 12 11:13:54 2018 - [info] GTID (with auto-pos) is not supported
Wed Dec 12 11:13:54 2018 - [info] Starting SSH connection tests..
Wed Dec 12 11:13:57 2018 - [info] All SSH connection tests passed successfully.
Wed Dec 12 11:13:57 2018 - [info] Checking MHA Node version..
Wed Dec 12 11:13:58 2018 - [info]  Version check ok.
Wed Dec 12 11:13:58 2018 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 12 11:13:58 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Wed Dec 12 11:13:58 2018 - [info] Master MHA Node version is 0.58.
Wed Dec 12 11:13:58 2018 - [info] Checking recovery script configurations on 10.150.20.90(10.150.20.90:33061)..
Wed Dec 12 11:13:58 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000006 
Wed Dec 12 11:13:58 2018 - [info]   Connecting to root@10.150.20.90(10.150.20.90:22).. 
  Creating /data/mysql_mha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql_33061/logs, up to mysql-bin.000006
Wed Dec 12 11:13:58 2018 - [info] Binlog setting check done.
Wed Dec 12 11:13:58 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec 12 11:13:58 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info  --relay_dir=/data/mysql_33061/data/  --slave_pass=xxx
Wed Dec 12 11:13:58 2018 - [info]   Connecting to root@10.150.20.97(10.150.20.97:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/logs, up to relaylog.000002
    Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec 12 11:13:58 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info  --relay_dir=/data/mysql_33061/data/  --slave_pass=xxx
Wed Dec 12 11:13:58 2018 - [info]   Connecting to root@10.150.20.132(10.150.20.132:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
    Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec 12 11:13:59 2018 - [info] Slaves settings check done.
Wed Dec 12 11:13:59 2018 - [info] 
10.150.20.90(10.150.20.90:33061) (current master)
 +--10.150.20.97(10.150.20.97:33061)
 +--10.150.20.132(10.150.20.132:33061)

Wed Dec 12 11:13:59 2018 - [info] Checking replication health on 10.150.20.97..
Wed Dec 12 11:13:59 2018 - [info]  ok.
Wed Dec 12 11:13:59 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 11:13:59 2018 - [info]  ok.
Wed Dec 12 11:13:59 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 11:13:59 2018 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Wed Dec 12 11:13:59 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229]  Failed to get master_ip_failover_script status with return code 255:0.
Wed Dec 12 11:13:59 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Wed Dec 12 11:13:59 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Wed Dec 12 11:13:59 2018 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

报错信息

编辑master_ip_failover

注释掉：# FIXME_xxx

再次，测试#masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 11:23:49 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 11:23:49 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 11:23:49 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 11:23:49 2018 - [info] MHA::MasterMonitor version 0.58.
Wed Dec 12 11:23:50 2018 - [info] GTID failover mode = 0
Wed Dec 12 11:23:50 2018 - [info] Dead Servers:
Wed Dec 12 11:23:50 2018 - [info] Alive Servers:
Wed Dec 12 11:23:50 2018 - [info]   10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:23:50 2018 - [info]   10.150.20.97(10.150.20.97:33061)
Wed Dec 12 11:23:50 2018 - [info]   10.150.20.132(10.150.20.132:33061)
Wed Dec 12 11:23:50 2018 - [info] Alive Slaves:
Wed Dec 12 11:23:50 2018 - [info]   10.150.20.97(10.150.20.97:33061)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 11:23:50 2018 - [info]     Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:23:50 2018 - [info]   10.150.20.132(10.150.20.132:33061)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 11:23:50 2018 - [info]     Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:23:50 2018 - [info] Current Alive Master: 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 11:23:50 2018 - [info] Checking slave configurations..
Wed Dec 12 11:23:50 2018 - [info]  read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 11:23:50 2018 - [warning]  relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 11:23:50 2018 - [info]  read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 11:23:50 2018 - [info] Checking replication filtering settings..
Wed Dec 12 11:23:50 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Dec 12 11:23:50 2018 - [info]  Replication filtering check ok.
Wed Dec 12 11:23:50 2018 - [info] GTID (with auto-pos) is not supported
Wed Dec 12 11:23:50 2018 - [info] Starting SSH connection tests..
Wed Dec 12 11:23:53 2018 - [info] All SSH connection tests passed successfully.
Wed Dec 12 11:23:53 2018 - [info] Checking MHA Node version..
Wed Dec 12 11:23:53 2018 - [info]  Version check ok.
Wed Dec 12 11:23:53 2018 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 12 11:23:54 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Wed Dec 12 11:23:54 2018 - [info] Master MHA Node version is 0.58.
Wed Dec 12 11:23:54 2018 - [info] Checking recovery script configurations on 10.150.20.90(10.150.20.90:33061)..
Wed Dec 12 11:23:54 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000006 
Wed Dec 12 11:23:54 2018 - [info]   Connecting to root@10.150.20.90(10.150.20.90:22).. 
  Creating /data/mysql_mha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql_33061/logs, up to mysql-bin.000006
Wed Dec 12 11:23:54 2018 - [info] Binlog setting check done.
Wed Dec 12 11:23:54 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec 12 11:23:54 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info  --relay_dir=/data/mysql_33061/data/  --slave_pass=xxx
Wed Dec 12 11:23:54 2018 - [info]   Connecting to root@10.150.20.97(10.150.20.97:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/logs, up to relaylog.000002
    Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec 12 11:23:54 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info  --relay_dir=/data/mysql_33061/data/  --slave_pass=xxx
Wed Dec 12 11:23:54 2018 - [info]   Connecting to root@10.150.20.132(10.150.20.132:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql_33061/logs/relay-log.info ... ok.
    Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
    Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec 12 11:23:55 2018 - [info] Slaves settings check done.
Wed Dec 12 11:23:55 2018 - [info] 
10.150.20.90(10.150.20.90:33061) (current master)
 +--10.150.20.97(10.150.20.97:33061)
 +--10.150.20.132(10.150.20.132:33061)

Wed Dec 12 11:23:55 2018 - [info] Checking replication health on 10.150.20.97..
Wed Dec 12 11:23:55 2018 - [info]  ok.
Wed Dec 12 11:23:55 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 11:23:55 2018 - [info]  ok.
Wed Dec 12 11:23:55 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 11:23:55 2018 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 
Wed Dec 12 11:23:55 2018 - [info]  OK.
Wed Dec 12 11:23:55 2018 - [warning] shutdown_script is not defined.
Wed Dec 12 11:23:55 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

测试通过

启动 mha manager

#nohup masterha_manager --conf=/etc/mysql_mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mysql_mha/app1-manager.log 2>&1 &

查看manater status
# masterha_check_status --conf=/etc/mysql_mha/app1.cnf
app1 (pid:16203) is running(0:PING_OK), master:10.150.20.90

观察manager log
# tail -100f app1-manager.log

Mon Dec 10 16:54:02 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Mon Dec 10 16:54:02 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Mon Dec 10 16:54:02 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Mon Dec 10 16:54:02 2018 - [info] MHA::MasterMonitor version 0.58.
Mon Dec 10 16:54:03 2018 - [info] GTID failover mode = 0
Mon Dec 10 16:54:03 2018 - [info] Dead Servers:
Mon Dec 10 16:54:03 2018 - [info] Alive Servers:
Mon Dec 10 16:54:03 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Mon Dec 10 16:54:03 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Mon Dec 10 16:54:03 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Mon Dec 10 16:54:03 2018 - [info] Alive Slaves:
Mon Dec 10 16:54:03 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Dec 10 16:54:03 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Mon Dec 10 16:54:03 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Dec 10 16:54:03 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Dec 10 16:54:03 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Mon Dec 10 16:54:03 2018 - [info] Current Alive Master: 10.150.20.90(10.150.20.90:33061)
Mon Dec 10 16:54:03 2018 - [info] Checking slave configurations..
Mon Dec 10 16:54:03 2018 - [info] Checking replication filtering settings..
Mon Dec 10 16:54:03 2018 - [info] binlog_do_db= , binlog_ignore_db= 
Mon Dec 10 16:54:03 2018 - [info] Replication filtering check ok.
Mon Dec 10 16:54:03 2018 - [info] GTID (with auto-pos) is not supported
Mon Dec 10 16:54:03 2018 - [info] Starting SSH connection tests..
Mon Dec 10 16:54:06 2018 - [info] All SSH connection tests passed successfully.
Mon Dec 10 16:54:06 2018 - [info] Checking MHA Node version..
Mon Dec 10 16:54:07 2018 - [info] Version check ok.
Mon Dec 10 16:54:07 2018 - [info] Checking SSH publickey authentication settings on the current master..
Mon Dec 10 16:54:07 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Mon Dec 10 16:54:07 2018 - [info] Master MHA Node version is 0.58.
Mon Dec 10 16:54:07 2018 - [info] Checking recovery script configurations on 10.150.20.90(10.150.20.90:33061)..
Mon Dec 10 16:54:07 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000005 
Mon Dec 10 16:54:07 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22).. 
Creating /data/mysql_mha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql_33061/logs, up to mysql-bin.000005
Mon Dec 10 16:54:07 2018 - [info] Binlog setting check done.
Mon Dec 10 16:54:07 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Dec 10 16:54:07 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Mon Dec 10 16:54:07 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22).. 
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/logs, up to relaylog.000002
Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
Checking if super_read_only is defined and turned on..Disabling super_read_only, enabling read_only, so that the applying can be done on the slave
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Enabling super_read_only again..Enabling super_read_only again after applying
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Dec 10 16:54:07 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Mon Dec 10 16:54:07 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22).. 
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
Checking if super_read_only is defined and turned on..Disabling super_read_only, enabling read_only, so that the applying can be done on the slave
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Enabling super_read_only again..Enabling super_read_only again after applying
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Dec 10 16:54:08 2018 - [info] Slaves settings check done.
Mon Dec 10 16:54:08 2018 - [info] 
10.150.20.90(10.150.20.90:33061) (current master)
+--10.150.20.97(10.150.20.97:33061)
+--10.150.20.132(10.150.20.132:33061)

Mon Dec 10 16:54:08 2018 - [info] Checking master_ip_failover_script status:
Mon Dec 10 16:54:08 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061


IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===

Checking the Status of the script.. OK 
Mon Dec 10 16:54:08 2018 - [info] OK.
Mon Dec 10 16:54:08 2018 - [warning] shutdown_script is not defined.
Mon Dec 10 16:54:08 2018 - [info] Set master ping interval 5 seconds.
Mon Dec 10 16:54:08 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s ed3jrdba97 -s ed3jrdba90
Mon Dec 10 16:54:08 2018 - [info] Starting ping health check on 10.150.20.90(10.150.20.90:33061)..
Mon Dec 10 16:54:08 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

manager.log

关闭 mha manager

# masterha_stop --conf=/etc/mysql_mha/app1.cnf
Stopped app1 successfully.

参考：

https://blog.csdn.net/ctypyb2002/article/details/81478098

MHA实战篇：https://yq.aliyun.com/articles/58920