MySQL High Availability with MHA

I. Workflow
(1) Try to save the binary log events (binlog events) from the crashed master;
(2) Identify the slave holding the most recent updates;
(3) Apply the differential relay log to the other slaves;
(4) Apply the binlog events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the remaining slaves at the new master and resume replication.

II. Lab Environment

1. Note on the environment: install the net-tools package (it provides the ifconfig and netstat commands used later).

2. Topology:

Hostname   IP Address        Role                        Server ID   Database Role
server01   192.168.200.111   primary master (master1)    1           write
server02   192.168.200.112   secondary master (master2)  2           write
server03   192.168.200.113   slave1                      3           read
server04   192.168.200.114   slave2                      4           read
server05   192.168.200.115   manager                                 monitors the replication group

    The primary master serves writes. The secondary master is effectively a slave and serves reads, as do slave1 and slave2. If the primary master goes down, the secondary master is promoted to the new primary master, and slave1 and slave2 are repointed to the new master.

III. Initial Environment Setup

1. Set the hostname on every host

On master1:
hostname server01
bash

On master2:
hostname server02
bash

On slave1:
hostname server03
bash

On slave2:
hostname server04
bash

On the manager:
hostname server05
bash

2. Map hostnames to IP addresses on every host

vim /etc/hosts

192.168.200.111 server01
192.168.200.112 server02
192.168.200.113 server03
192.168.200.114 server04
192.168.200.115 server05
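The five mappings above follow one pattern (server N at 192.168.200.110+N), so the /etc/hosts lines can be generated with a small loop rather than typed on each host. A sketch, assuming this lab's addressing convention:

```shell
# Generate the /etc/hosts entries for this lab's five servers.
# The 192.168.200.x numbering is this tutorial's convention, not a requirement.
gen_hosts() {
    i=111
    for h in server01 server02 server03 server04 server05; do
        echo "192.168.200.$i $h"
        i=$((i + 1))
    done
}
gen_hosts    # append the output to /etc/hosts on every host
```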

3. On every host, stop the firewall and put SELinux in permissive mode

systemctl stop firewalld 
iptables -F
setenforce 0

4. Install an online yum repository and the EPEL repository (epel-release)

[root@server01 yum.repos.d]# ls
a  CentOS7-Base-163_(1).repo
[root@server01 yum.repos.d]# yum -y install epel-release

[root@server01 yum.repos.d]# ls
a  CentOS7-Base-163_(1).repo  epel.repo  epel-testing.repo
Edit the configuration file if needed:
[root@server01 yum.repos.d]# vim epel.repo

Alternatively, install EPEL directly from the release RPM:
rpm -ivh epel-release-latest-7.noarch.rpm

IV. Install MHA Node on All Hosts

1. Install the supporting software and the related Perl dependencies

yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker

------------------------------------------------------------------------------
Check whether the installation succeeded:
rpm -q perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder
Expected output on success:
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-DBI-1.627-4.el7.x86_64
perl-CPAN-1.9800-294.el7_6.noarch
perl-ExtUtils-CBuilder-0.28.2.6-294.el7_6.noarch

2. Upload, extract, and install the node package

tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56/
perl Makefile.PL
make && make install

3. After a successful install, four scripts are generated under /usr/local/bin

ls -l /usr/local/bin/
total 40
-r-xr-xr-x. 1 root root 16346 Oct 22 14:42 apply_diff_relay_logs
-r-xr-xr-x. 1 root root  4807 Oct 22 14:42 filter_mysqlbinlog
-r-xr-xr-x. 1 root root  7401 Oct 22 14:42 purge_relay_logs
-r-xr-xr-x. 1 root root  7395 Oct 22 14:42 save_binary_logs

V. Install MHA Manager on the manager host (192.168.200.115)

1. Install the dependencies

yum install -y perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes

--------------------------------------------------------
yum -y install perl-Config-Tiny-2.14-7.el7.noarch.rpm

--------------------------------------------------------
Check whether the installation succeeded:
rpm -q perl  perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI

perl-5.16.3-294.el7_6.x86_64
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-DBI-1.627-4.el7.x86_64

2. Upload, extract, and install the MHA Manager package

tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56/
perl Makefile.PL
make && make install

VI. Set Up Bidirectional SSH Key Authentication

Each server generates a key pair and copies its public key to the other hosts.

1. server05 (192.168.200.115)

ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server03
ssh-copy-id server04

-----------------------------------------
or
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114


Note: server05 must test an SSH connection to every host, because the first connection prompts for a yes confirmation; if that were left for later, it would interfere with SSH control of each host during failover.
ssh server01
ssh server02
ssh server03
ssh server04

-------------------------------------

ssh root@192.168.200.111
ssh root@192.168.200.112
ssh root@192.168.200.113
ssh root@192.168.200.114

2. server04 (192.168.200.114)

ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server03

------------------------------------------------------
or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113

3. server03 (192.168.200.113)

ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server04

------------------------------------------------------------
or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114

4. server02 (192.168.200.112)

ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server03
ssh-copy-id server04

-----------------------------------------------------------------------
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114

5. server01 (192.168.200.111)

ssh-keygen -t rsa
ssh-copy-id server02
ssh-copy-id server03
ssh-copy-id server04

-----------------------------------------------------------------
or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
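The five per-host command lists above are all instances of one pattern: every node copies its key to every other node it must reach. The dry-run sketch below prints that command mesh instead of executing it (ssh-copy-id needs live hosts); note it generates a full mesh, which is slightly more than the text does — the DB nodes do not copy keys to server05 above, but the extra pairs are harmless.

```shell
# Print the full mesh of ssh-copy-id commands this section runs by hand.
# Dry run only: echoes each command instead of executing it.
print_ssh_mesh() {
    hosts="server01 server02 server03 server04 server05"
    for src in $hosts; do
        for dst in $hosts; do
            [ "$src" = "$dst" ] && continue   # no host copies a key to itself
            echo "[$src] ssh-copy-id $dst"
        done
    done
}
print_ssh_mesh
```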

VII. Install MySQL

1. On server01 through server04

yum -y install mariadb mariadb-server mariadb-devel
systemctl start mariadb          // start the service
netstat -lnpt | grep :3306       // check the port

2. Set an initial password for the database (it is used later)

mysqladmin -u root password 123456

VIII. Build the Master-Slave Replication Topology

1. Edit the MySQL configuration file

server01

vim /etc/my.cnf
[mysqld]
server-id=1
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb

server02

vim /etc/my.cnf
[mysqld]
server-id=2
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb

server03

vim /etc/my.cnf
[mysqld]
server-id=3
log-bin=mysql-bin
relay-log=slave-relay-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb

server04

vim /etc/my.cnf
[mysqld]
server-id=4
log-bin=mysql-bin
relay-log=slave-relay-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb
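The four config files above differ only in server-id and in whether the node carries the master-style or slave-style log settings. A helper that prints the right fragment for a given id and role (a sketch of this tutorial's settings, not an official template):

```shell
# Print the my.cnf fragment this tutorial uses for a given node.
#   $1 = server-id
#   $2 = role: "master" (server01/server02) or "slave" (server03/server04)
gen_mycnf() {
    echo "[mysqld]"
    echo "server-id=$1"
    if [ "$2" = "master" ]; then
        echo "log-bin=master-bin"
    else
        echo "log-bin=mysql-bin"
        echo "relay-log=slave-relay-bin"
    fi
    echo "log-slave-updates=true"
    echo "relay_log_purge=0"
}
gen_mycnf 3 slave    # fragment for server03
```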

2. Create the replication user on every MySQL server

grant replication slave on *.* to 'repl'@'192.168.200.%' identified by '123456';

flush privileges;
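Since the same two statements run on all four DB nodes, they can be driven from the manager over the SSH keys set up earlier. A dry-run sketch that only prints each per-host command:

```shell
# Print, for each DB node, the command that creates the replication user.
# Dry run only: echoes the commands instead of running them over SSH.
print_repl_grants() {
    sql="grant replication slave on *.* to 'repl'@'192.168.200.%' identified by '123456'; flush privileges;"
    for h in server01 server02 server03 server04; do
        echo "ssh $h mysql -uroot -p123456 -e \"$sql\""
    done
}
print_repl_grants
```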

3. On the primary (192.168.200.111, server01), check the binlog file name and position recorded at backup time

show master status;

+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 |      474 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

4. Start replication on the slaves (192.168.200.112-114, server02-server04)

stop slave;

CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=474;

start slave;

show slave status\G
            Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
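The CHANGE MASTER statement must embed the File and Position values read from show master status on server01. A small helper keeps the two in sync (the coordinates passed in the example are the ones captured above):

```shell
# Emit the CHANGE MASTER statement for the coordinates reported by
# "show master status" on the primary.
#   $1 = File (binlog name), $2 = Position
gen_change_master() {
    printf "CHANGE MASTER TO\n"
    printf "MASTER_HOST='192.168.200.111',\n"
    printf "MASTER_USER='repl',\n"
    printf "MASTER_PASSWORD='123456',\n"
    printf "MASTER_LOG_FILE='%s',\n" "$1"
    printf "MASTER_LOG_POS=%s;\n" "$2"
}
gen_change_master master-bin.000001 474
```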

5. Set the three slaves to read_only (server02-server04)

The slaves only serve reads. read_only is set dynamically instead of in the MySQL configuration file because server02 may be promoted to master at any time.

[root@server02 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
[root@server03 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
[root@server04 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
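The three commands above can likewise be generated per slave. A dry-run sketch (drop the echo, or pipe to sh, to actually execute):

```shell
# Print the command that flips each slave to read-only (dry run only).
print_read_only_cmds() {
    for h in server02 server03 server04; do
        echo "ssh $h \"mysql -uroot -p123456 -e 'set global read_only=1'\""
    done
}
print_read_only_cmds
```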

6. Create the monitoring user (server01-server04)

grant all privileges on *.* to 'root'@'192.168.200.%' identified by '123456';
flush privileges;

7. Grant privileges for each host's own hostname

server01

grant all privileges on *.* to 'root'@'server01' identified by '123456';
flush privileges;

server02

grant all privileges on *.* to 'root'@'server02' identified by '123456';
flush privileges;

server03

grant all privileges on *.* to 'root'@'server03' identified by '123456';
flush privileges;

server04

grant all privileges on *.* to 'root'@'server04' identified by '123456';
flush privileges;

IX. Configure MHA

On server05 (192.168.200.115): a sample configuration file ships in the directory created when the manager package was extracted.

1. Create MHA's working directory and the related configuration file

mkdir /etc/masterha
cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha

2. Edit the app1.cnf configuration file

vim /etc/masterha/app1.cnf

[server default]
manager_workdir=/var/log/masterha/app1           # manager working directory
manager_log=/var/log/masterha/app1/manager.log   # manager log file (both lines exist in the sample by default)

master_binlog_dir=/var/lib/mysql   # where the master stores its binlogs, so MHA can find them

master_ip_failover_script= /usr/local/bin/master_ip_failover  # script invoked during automatic failover

password=123456   # password of the MySQL root user
user=root

ping_interval=1    # interval between ping probes, in seconds

remote_workdir=/tmp   # where the remote MySQL hosts save binlogs during a switchover

repl_password=123456   # replication user name and password
repl_user=repl



[server1]
hostname=server01
port=3306

[server2]
hostname=server02
candidate_master=1   # server02 may be promoted to the new master
port=3306
check_repl_delay=0

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306
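Before running the checker scripts, it is worth confirming the finished app1.cnf contains the keys this tutorial relies on. A sanity-check sketch (the required-key list below is this tutorial's selection, not MHA's full set):

```shell
# Verify an MHA app config contains the keys this tutorial relies on.
#   $1 = path to the config file
# Prints any missing key and returns nonzero if one is absent.
check_app_cnf() {
    rc=0
    for key in manager_workdir manager_log master_binlog_dir \
               master_ip_failover_script repl_user repl_password user password; do
        grep -q "^$key=" "$1" || { echo "missing: $key"; rc=1; }
    done
    return $rc
}
```

Usage: `check_app_cnf /etc/masterha/app1.cnf && echo "config looks complete"`.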

3. Configure the failover script

vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
);
my $vip = '192.168.200.100';                                  # the write VIP
my $key = "1";                                                # interface alias number; used by this non-keepalived switchover method
my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";   # command that brings the VIP up
my $ssh_stop_vip = "/sbin/ifconfig eno16777728:$key down";    # command that takes the VIP down
$ssh_user = "root";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage the master IP address in a global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            # updating global catalog, etc
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # All arguments are passed.
        # If you manage the master IP address in a global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc.) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_ip " $ssh_start_vip "`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host " $ssh_start_vip "`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host " $ssh_stop_vip "`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

# make the script executable
chmod +x /usr/local/bin/master_ip_failover

4. Set the relay log purge mode on the slaves (server02-server04)

mysql -uroot -p123456 -e 'set global relay_log_purge=0;'

5. Check MHA SSH connectivity

masterha_check_ssh --conf=/etc/masterha/app1.cnf

Wed Oct 23 07:05:41 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 23 07:05:41 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:05:41 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:05:41 2019 - [info] Starting SSH connection tests..
Wed Oct 23 07:05:44 2019 - [debug] 
Wed Oct 23 07:05:41 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:42 2019 - [debug]   ok.
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:45 2019 - [debug] 
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:45 2019 - [debug] 
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:46 2019 - [debug] 
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:46 2019 - [info] All SSH connection tests passed successfully.
This output means the check succeeded.

6. Check the status of the whole cluster

[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

Wed Oct 23 07:10:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 23 07:10:22 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:10:22 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:10:22 2019 - [info] MHA::MasterMonitor version 0.56.
Wed Oct 23 07:10:23 2019 - [info] Dead Servers:
Wed Oct 23 07:10:23 2019 - [info] Alive Servers:
Wed Oct 23 07:10:23 2019 - [info]   server01(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)
Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)
Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)
Wed Oct 23 07:10:23 2019 - [info] Alive Slaves:
Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info] Current Alive Master: server01(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info] Checking slave configurations..
Wed Oct 23 07:10:23 2019 - [info] Checking replication filtering settings..
Wed Oct 23 07:10:23 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Oct 23 07:10:23 2019 - [info]  Replication filtering check ok.
Wed Oct 23 07:10:23 2019 - [info] Starting SSH connection tests..
Wed Oct 23 07:10:28 2019 - [info] All SSH connection tests passed successfully.
Wed Oct 23 07:10:28 2019 - [info] Checking MHA Node version..
Wed Oct 23 07:10:29 2019 - [info]  Version check ok.
Wed Oct 23 07:10:29 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Oct 23 07:10:29 2019 - [info] HealthCheck: SSH to server01 is reachable.
Wed Oct 23 07:10:30 2019 - [info] Master MHA Node version is 0.56.
Wed Oct 23 07:10:30 2019 - [info] Checking recovery script configurations on the current master..
Wed Oct 23 07:10:30 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000001 
Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@server01(server01).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to master-bin.000001
Wed Oct 23 07:10:30 2019 - [info] Master setting check done.
Wed Oct 23 07:10:30 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Oct 23 07:10:30 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server02 --slave_ip=192.168.200.112 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@192.168.200.112(server02:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server03 --slave_ip=192.168.200.113 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.113(server03:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server04 --slave_ip=192.168.200.114 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.114(server04:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info] Slaves settings check done.
Wed Oct 23 07:10:31 2019 - [info] 
server01 (current master)
 +--server02
 +--server03
 +--server04

Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server02..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server03..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server04..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking master_ip_failover_script status:
Wed Oct 23 07:10:31 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server01 --orig_master_ip=192.168.200.111 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eno16777728:1 down==/sbin/ifconfig eno16777728:1 192.168.200.100===

Checking the Status of the script.. OK 
bash: /sbin/ifconfig: No such file or directory
Wed Oct 23 07:10:32 2019 - [info]  OK.
Wed Oct 23 07:10:32 2019 - [warning] shutdown_script is not defined.
Wed Oct 23 07:10:32 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

X. VIP Configuration and Management

1. Check /etc/masterha/app1.cnf and confirm the following line is correct

[root@server05 ~]# grep -n 'master_ip_failover_script' /etc/masterha/app1.cnf
7:master_ip_failover_script= /usr/local/bin/master_ip_failover

2. Review the failover script and confirm the network interface is configured correctly

[root@server05 ~]# head -13 /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
);
my $vip = '192.168.200.100';              
my $key = "1";     
my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eno16777728:$key down"; 
$ssh_user = "root";

3. Check the manager status

[root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).     // not running yet

4. Start manager monitoring

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Check again:
[root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:6588) is running(0:PING_OK), master:server01
The manager is now monitoring.
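Because the start command is long and its flags matter (a mistyped flag is how the `--remove_dea d_master_conf` typo creeps in), it can help to build it with a tiny helper. The helper name is mine; the flags are the ones used above:

```shell
# Build the manager start command for a given config and log path.
# Helper name is hypothetical; masterha_manager and its flags are real MHA.
manager_start_cmd() {
    echo "nohup masterha_manager --conf=$1 --remove_dead_master_conf --ignore_last_failover < /dev/null > $2 2>&1 &"
}
manager_start_cmd /etc/masterha/app1.cnf /var/log/masterha/app1/manager.log
```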

5. Check the VIP on server01 (192.168.200.111)

If the VIP does not show up on the first host, the ifconfig command may be missing; installing the net-tools package fixes this.

[root@server01 ~]# yum -y install net-tools

[root@server01 ~]# ip a | grep eno16777728
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
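Whether the VIP currently sits on a host can also be checked mechanically by parsing `ip a` output. The parser below is a local sketch, demonstrated against a captured sample line rather than a live interface:

```shell
# Succeed when the "ip a" text on stdin lists the VIP as a secondary address.
#   $1 = VIP address to look for; stdin = output of "ip a"
has_vip() {
    grep -q "inet $1/.* scope global secondary" -
}

# Example against a captured sample; on a live host: ip a | has_vip 192.168.200.100
printf 'inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1\n' \
    | has_vip 192.168.200.100 && echo "VIP present"
```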

XI. Simulate a Failure of the Primary Master (192.168.200.111)

server01(192.168.200.111)

[root@server01 ~]# systemctl stop mariadb
[root@server01 ~]# netstat -lnpt | grep :3306
[root@server01 ~]# ip a | grep eno16777728
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728

Status on server03 (192.168.200.113):

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.112
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1372
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 530
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Status on server04 (192.168.200.114):

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.112
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1372
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 530
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

On server05 (192.168.200.115), the monitoring configuration file has changed (server01's section was removed) and monitoring has shut down automatically.

[root@server05 ~]# cat /etc/masterha/app1.cnf 
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl
user=root

[server2]
candidate_master=1
check_repl_delay=0
hostname=server02
port=3306

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306

# Monitoring has stopped:
[1]+ Done    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

The failover log on server05 (192.168.200.115) contains the following:

Selected server02 as a new master.
server02: OK: Applying all logs succeeded.
server02: OK: Activated master IP address.
server04: This host has the latest relay log events.
server03: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server04: OK: Applying all logs succeeded. Slave started, replicating from server02.
server03: OK: Applying all logs succeeded. Slave started, replicating from server02.
server02: Resetting slave info succeeded.
Master failover to server02(192.168.200.112:3306) completed successfully.

XII. Repair the Failed Primary and Test Switching the VIP Back

1. Repair server01; server02 has been promoted to the new primary and holds the VIP

server01(192.168.200.111)

[root@server01 ~]# systemctl start mariadb
[root@server01 ~]# netstat -lnpt | grep :3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      6131/mysqld
[root@server01 ~]# mysql -u root -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.112',
MASTER_USER='repl',
MASTER_PASSWORD='123456';
start slave;
show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.112
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1372
               Relay_Log_File: mariadb-relay-bin.000002
                Relay_Log_Pos: 1208
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

On server05 (192.168.200.115), edit the monitoring configuration to add server01 back:

[root@server05 ~]# vim /etc/masterha/app1.cnf
[server1]
hostname=server01
port=3306

Check the cluster status on server05 (192.168.200.115):

[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
----------------------------------- some output omitted -----------------------------------
Thu Aug 31 22:20:30 2017 - [info] Alive Servers:
Thu Aug 31 22:20:30 2017 - [info]   server01(192.168.200.111:3306)
Thu Aug 31 22:20:30 2017 - [info]   server02(192.168.200.112:3306)
Thu Aug 31 22:20:30 2017 - [info]   server03(192.168.200.113:3306)
Thu Aug 31 22:20:30 2017 - [info]   server04(192.168.200.114:3306)
----------------------------------- some output omitted -----------------------------------
server02 (current master)
 +--server01
 +--server03
 +--server04
----------------------------------- some output omitted -----------------------------------
MySQL Replication Health is OK.

Start monitoring on server05 (192.168.200.115):

[root@server05 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 17031

server02 (192.168.200.112) now holds the VIP:

[root@server02 ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ec:88:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.112/24 brd 192.168.200.255 scope global eno16777728
       valid_lft forever preferred_lft forever
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feec:8830/64 scope link 
       valid_lft forever preferred_lft forever

2. Simulate a server02 failure; the VIP switches back to server01

server02(192.168.200.112)

[root@server02 ~]# systemctl stop mariadb
[root@server02 ~]# netstat -lnpt | grep :3306

server01(192.168.200.111)

[root@server01 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:9b:5c:f0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
       valid_lft forever preferred_lft forever
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9b:5cf0/64 scope link 
       valid_lft forever preferred_lft forever

Status on server03 (192.168.200.113):

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.111
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 923
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 530
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Status on server04 (192.168.200.114):

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.111
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 923
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 530
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The configuration file on server05 (192.168.200.115) has changed (the failed server2 section has been removed):

[root@server05 ~]# cat /etc/masterha/app1.cnf 
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl
user=root

[server1]
hostname=server01
port=3306

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306

Monitoring log on server05 (192.168.200.115):

----------------------------------- some output omitted -----------------------------------

Selected server01 as a new master.
server01: OK: Applying all logs succeeded.
server01: OK: Activated master IP address.
server03: This host has the latest relay log events.
server04: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server04: OK: Applying all logs succeeded. Slave started, replicating from server01.
server03: OK: Applying all logs succeeded. Slave started, replicating from server01.
server01: Resetting slave info succeeded.
Master failover to server01(192.168.200.111:3306) completed successfully.

Repair server02 (192.168.200.112) and point it at the new primary:

[root@server02 ~]# systemctl start mariadb
[root@server02 ~]# netstat -lnpt | grep :3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      5982/mysqld 
[root@server02 ~]# mysql -u root -p123456

stop slave;

CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456';

start slave;

show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.111
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 923
               Relay_Log_File: mariadb-relay-bin.000003
                Relay_Log_Pos: 530
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

On server05 (192.168.200.115), edit the monitoring configuration to add the server2 section back:

[root@server05 ~]# vim /etc/masterha/app1.cnf
[server2]
hostname=server02
candidate_master=1
port=3306
check_repl_delay=0

Check the cluster status on server05 (192.168.200.115):

[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
server01 (current master)
 +--server02
 +--server03
 +--server04

----------------------------------- some output omitted -----------------------------------

MySQL Replication Health is OK.

server02 is repaired, and server01 is the master once again.


Original post: https://www.cnblogs.com/tanxiaojuncom/p/11722572.html