MySQL 5.6 MHA (GTID) on Kylin

MHA on Kylin; host roles (IP addresses are masked throughout this article):
* mysql1: replication master, MHA node
* mysql2: replication slave, MHA node
* mysql3: replication slave, MHA node/manager
MySQL version: MySQL Community Server 5.6.15

1 System Configuration

[root@localhost ~]# cat /etc/issue
Kylin 3.3
Kernel \r on an \m
KRelease (Trial)
[root@localhost ~]# cat /etc/redhat-release
Kylin Linux release 3.3.1707 (Core)
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 30
[root@localhost ~]# vim /etc/sysctl.conf
vm.swappiness = 10
[root@localhost ~]# sysctl -p
vm.swappiness = 10
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 10
[root@localhost ~]# cat /sys/block/sda/queue/rotational
1
[root@localhost ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@localhost ~]# vim /etc/profile
HISTTIMEFORMAT="%Y-%m-%d %H:%M:%S "
export HISTTIMEFORMAT
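Before putting the format string into /etc/profile, it can be checked with date, which uses the same % conversions; a quick sketch:

```shell
# date uses the same % conversions as HISTTIMEFORMAT,
# so this previews the timestamp history will record
date "+%Y-%m-%d %H:%M:%S"
```

The output should look like the `date` output shown below (e.g. 2018-07-12 03:09:27).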
[root@localhost ~]# date
Thu Jul 12 03:09:27 EDT 2018
[root@localhost ~]# hostname
localhost.localdomain
[root@localhost ~]# vim /etc/hosts
127.0.0.1 mysql1
[root@localhost ~]# vim /etc/sysconfig/network
HOSTNAME=mysql1
[root@localhost ~]# getenforce
Enforcing
[root@localhost ~]# vim /etc/selinux/config
SELINUX=disabled
[root@localhost ~]# systemctl stop firewalld.service
[root@localhost ~]# systemctl disable firewalld.service
[root@localhost ~]# reboot
[root@localhost ~]# hostnamectl set-hostname mysql1
[root@localhost ~]# hostname
mysql1
[root@mysql1 ~]# systemctl status firewalld.service
-- # vim /etc/hosts (add entries for all three hosts; IPs masked in the original)
* mysql1
* mysql2
* mysql3

2 Install MySQL

[root@mysql1 ~]# groupadd mysql
[root@mysql1 ~]# useradd -g mysql mysql
[root@mysql1 ~]# passwd mysql
[root@mysql1 ~]# ulimit -n
1024
[root@mysql1 ~]# vim /etc/security/limits.conf
mysql soft nproc 65535
mysql hard nproc 65535
mysql soft nofile 65535
mysql hard nofile 65535
[root@mysql1 ~]# ulimit -n 65535
[root@mysql1 ~]# vim /home/mysql/.bash_profile
export LANG=en_US.UTF-8
export PATH=/usr/local/mysql/bin:$PATH
export MYSQL_PS1="(\u@\h:\p) [\d]> "
Install the required dependency packages with yum:
[root@mysql1 ~]# yum -y install lrzsz
[root@mysql1 ~]# yum install -y gcc gcc-* make cmake gcc-c++ libaio libaio-devel bison bison-devel autoconf automake zlib* libxml* ncurses-devel libmcrypt* libtool-ltdl-devel* --skip-broken
[root@mysql1 ~]# yum install -y openssl openssl-devel ncurses ncurses-devel
[root@mysql1 ~]# mkdir -p /data/mysqldata/{3306/{data,tmp,binlog,slave,log/iblog},backup,scripts}
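The nested brace expansion above creates the whole directory tree in one command; a quick sketch of the same expansion under a throwaway /tmp prefix (the /tmp path is only for demonstration):

```shell
# Recreate the same layout under a throwaway prefix
# (brace expansion is a bash feature, so invoke bash explicitly)
bash -c 'mkdir -p /tmp/mysqldata_demo/{3306/{data,tmp,binlog,slave,log/iblog},backup,scripts}'
# Show what was created
ls -R /tmp/mysqldata_demo
```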
[root@mysql1 soft]# yum list|grep mysql
Repository gcc-4.9 is listed more than once in the configuration
Repository other is listed more than once in the configuration
mysql-community-client.x86_64 5.6.15-4.ky3 other
mysql-community-common.x86_64 5.6.15-4.ky3 other
mysql-community-devel.x86_64 5.6.15-4.ky3 other
mysql-community-embedded.x86_64 5.6.15-4.ky3 other
mysql-community-embedded-devel.x86_64
mysql-community-libs.x86_64 5.6.15-4.ky3 other
mysql-community-server.x86_64 5.6.15-4.ky3 other
mysql-community-test.x86_64 5.6.15-4.ky3 other
mysql-connector-odbc.x86_64 5.2.5-6.ky3.kb2 base

[root@mysql1 soft]# tar -zxvf mysql-5.6.15-linux-glibc2.5-x86_64.tar.gz -C /usr/local/
[root@mysql1 soft]# cd /usr/local/
[root@mysql1 local]# ln -s mysql-5.6.15-linux-glibc2.5-x86_64 mysql
[root@mysql1 local]# chown -R mysql:mysql /usr/local/mysql/
[root@mysql1 local]# chown -R mysql:mysql /data/mysqldata/
-- my.cnf (the configuration file is prepared beforehand; its contents are not shown in the original)
[mysql@mysql1 3306]$ /usr/local/mysql/scripts/mysql_install_db --defaults-file=/data/mysqldata/3306/my.cnf --datadir=/data/mysqldata/3306/data --basedir=/usr/local/mysql --user=mysql
[mysql@mysql1 3306]$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf &
[mysql@mysql1 3306]$ ps -ef|grep mysql
root 2673 2652 0 04:08 pts/1 00:00:00 su - mysql
mysql 2674 2673 0 04:08 pts/1 00:00:00 -bash
mysql 2880 2674 0 04:15 pts/1 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf
mysql 3865 2880 11 04:15 pts/1 00:00:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysqldata/3306/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysqldata/3306/data --plugin-dir=/usr/local/mysql/lib/plugin --log-error=/data/mysqldata/3306/log/mysql-error.log --open-files-limit=65535 --pid-file=/data/mysqldata/3306/data/mysql1.pid --socket=/data/mysqldata/3306/mysql.sock --port=3306
mysql 3900 2674 0 04:15 pts/1 00:00:00 ps -ef
mysql 3901 2674 0 04:15 pts/1 00:00:00 grep --color=auto mysql
[mysql@mysql1 3306]$ mysql -S /data/mysqldata/3306/mysql.sock
(root@localhost:mysql.sock) [(none)]> delete from mysql.user where (user,host) not in(select 'root','localhost');
Query OK, 5 rows affected (0.01 sec)
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> truncate table mysql.db;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO '*'@'%' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO '*'@'127.0.0.1' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)

3 Master-Slave Replication Setup

* master
* slave1
* slave2

-- On the master (host ...69):
(root@localhost:mysql.sock) [(none)]> GRANT REPLICATION SLAVE,RELOAD,SUPER ON *.* to *@'1*%' identified by "***";

-- mysqldump: take a full backup on the master (command truncated in the original)
/usr/local/mysql/bin/mysqldump -h 127.0.0.1 -u
-- scp the dump to each slave (commands omitted in the original)

(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON |
| gtid_executed | |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+-------+
5 rows in set (0.00 sec)

(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 151 | | | |
+------------------+----------+--------------+------------------+-------------------+


-- On host ...73:
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)

(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
[system@3306][(none)]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | d4e95eb8-85ae-11e8-8540-000c29450242:1-362 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+--------------------------------------------+
-- On each slave, check gtid_executed and clear it before loading the dump:
show global variables like '%gtid_executed%';
reset master;

CHANGE MASTER TO
MASTER_HOST='*',
MASTER_USER='*',
MASTER_PASSWORD='***',
MASTER_PORT=3306,
MASTER_AUTO_POSITION = 1;
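Putting the pieces above together, (re)initializing a slave from a dump with GTID auto-positioning follows this pattern; all host names, credentials, and the GTID set below are placeholders (the same sequence is used later in this article to recover host ...74):

```sql
-- On the slave, after loading the master's dump:
RESET MASTER;                                    -- clear the slave's own executed GTID set
SET GLOBAL gtid_purged = '<master_uuid>:1-362';  -- GTIDs already covered by the dump (placeholder)
CHANGE MASTER TO
  MASTER_HOST='<master_host>',
  MASTER_USER='<repl_user>',
  MASTER_PASSWORD='<repl_password>',
  MASTER_PORT=3306,
  MASTER_AUTO_POSITION=1;
START SLAVE;
SHOW SLAVE STATUS\G
```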

-- On host ...74:
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show global variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------+
| server_uuid | e68afff3-85b5-11e8-856e-005056bff00b |

(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
(system@127.0.0.1:3306) [mysql]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
| gtid_mode | ON |
| gtid_owned | |
[system@3306][(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: *
Master_User: *
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 558
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 768
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 558
Relay_Log_Space: 965
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 201807121
Master_UUID: cab5ec92-85ab-11e8-852c-005056bf5092
Master_Info_File: /data/mysqldata/3306/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Executed_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Auto_Position: 1

-- On the master (...69):
[mysql@mysql1 scripts]$ ./mysqlplus.sh
use test;
create table t1(id int,name varchar(10));
insert into t1 values(1,'AAAAA');
commit;
select * from test.t1;

use test;
create table t2(id int,name varchar(10));
insert into t2 values(1,'AAAAA');
commit;
select * from test.t2;

-- On the master (...69):
(system@127.0.0.1:3306) [test]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 | 558 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
+------------------+----------+--------------+------------------+------------------------------------------+
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+
-- On the slaves (...73 and ...74):
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+

4 MHA Setup

4.1 Set up passwordless SSH login
The three MHA hosts must trust each other: each host must be able to log in to the other two over SSH without a password.
When logging in as root, edit the SSH configuration first: set PermitRootLogin to yes in /etc/ssh/sshd_config, then restart sshd.
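The key-generation and key-distribution commands are not shown in the original; a minimal sketch of establishing the trust, run on every host (the key path below is only for demonstration; in practice the default ~/.ssh/id_rsa is used):

```shell
# Generate an RSA key pair non-interactively (demo path; normally omit -f to use ~/.ssh/id_rsa)
rm -f /tmp/mha_demo_key /tmp/mha_demo_key.pub
ssh-keygen -t rsa -N "" -q -f /tmp/mha_demo_key
# Then copy the public key to every host, including the local one
# (prompts for the password once per host; hostnames come from /etc/hosts):
# ssh-copy-id root@mysql1
# ssh-copy-id root@mysql2
# ssh-copy-id root@mysql3
```

Once each host has copied its key to all three hosts, the `ssh mysql2` / `ssh mysql3` tests below should log in without a password.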

# Performed as the root user (working directory /root)


ssh mysql1
ssh mysql2
ssh mysql3
[root@mysql1 ~]# ssh mysql2 # no password prompt here: SSH trust is in place
Last login: Thu Jul 12 05:55:46 2018 from mysql1
[root@mysql2 ~]# logout
Connection to mysql2 closed.
[root@mysql1 ~]# ssh mysql3
[root@mysql3 ~]# logout
Connection to mysql3 closed.

4.2 Install the MHA manager and node packages
Package download location:
[root@mysql1 ~]# cd /data/soft/
[root@mysql1 soft]# ll
total 297404
-rw-r--r-- 1 root root 104700 Jun 8 02:34 mha4mysql-manager-0.56.tar.gz
-rw-r--r-- 1 root root 45947 Jun 8 02:34 mha4mysql-node-0.56.tar.gz
Install perl-DBD-MySQL on all three nodes (node and manager), using the installation DVD as the yum repository. # This covers three packages: perl-DBD-MySQL, mysql-libs, perl-DBI
[root@mysql1 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql2 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql3 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y

Install the node package on all three nodes (including the management node):

tar xzvf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install

[root@mysql1 mha4mysql-node-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

Install the manager package on the management node:
# yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
    perl-Config-IniFiles rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate perl-Time-HiRes \
    perl-MIME-Types perl-Email-Date-Format perl-Mail-Sendmail perl-Mail-Sender perl-TimeDate \
    perl-MailTools perl-MIME-Lite

# rpm -q perl-CPAN
# yum install -y perl-CPAN
# yum install -y 'perl(CGI)' 'perl(LWP::UserAgent)'
[root@mysql3 soft]# rpm -ivh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm --nodeps
[root@mysql3 soft]# rpm -ivh perl-Parallel-ForkManager-1.05-1.el7.noarch.rpm

tar xzvf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make && make install

[root@mysql3 mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.23)
- Log::Dispatch ...missing.
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.56)
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager

[root@mysql3 mha4mysql-manager-0.56]# which masterha_check_ssh
/usr/local/bin/masterha_check_ssh
[root@mysql3 ~]# ll /usr/local/bin/masterha_*
-r-xr-xr-x 1 root root 1995 Jul 12 06:22 /usr/local/bin/masterha_check_repl
-r-xr-xr-x 1 root root 1779 Jul 12 06:22 /usr/local/bin/masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Jul 12 06:22 /usr/local/bin/masterha_check_status
-r-xr-xr-x 1 root root 3201 Jul 12 06:22 /usr/local/bin/masterha_conf_host
-r-xr-xr-x 1 root root 2517 Jul 12 06:22 /usr/local/bin/masterha_manager
-r-xr-xr-x 1 root root 2165 Jul 12 06:22 /usr/local/bin/masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Jul 12 06:22 /usr/local/bin/masterha_master_switch
-r-xr-xr-x 1 root root 3879 Jul 12 06:22 /usr/local/bin/masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Jul 12 06:22 /usr/local/bin/masterha_stop
===
rpm -ivh perl-Mail-Sender-0.8.21-2.el7.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
rpm -ivh perl-MIME-Types-1.38-2.el7.noarch.rpm
rpm -ivh perl-TimeDate-2.30-2.el7.noarch.rpm
#Can't locate Class/Load.pm in @INC
rpm -ivh perl-Module-Runtime-0.013-4.el7.noarch.rpm
rpm -ivh perl-Config-Tiny-2.14-7.el7.noarch.rpm
rpm -ivh perl-Try-Tiny-0.12-2.el7.noarch.rpm
rpm -ivh perl-Module-Implementation-0.06-6.el7.noarch.rpm
rpm -ivh perl-Sub-Install-0.926-6.el7.noarch.rpm
rpm -ivh perl-Params-Util-1.07-6.el7.x86_64.rpm
rpm -ivh perl-Data-OptList-0.107-9.el7.noarch.rpm
rpm -ivh perl-Package-Stash-XS-0.26-3.el7.x86_64.rpm
rpm -ivh perl-List-MoreUtils-0.33-9.el7.x86_64.rpm
rpm -ivh perl-Package-DeprecationManager-0.13-7.el7.noarch.rpm
rpm -ivh perl-Package-Stash-0.34-2.el7.noarch.rpm
rpm -ivh perl-Class-Load-0.20-3.el7.noarch.rpm
#Can't locate Params/Validate.pm
rpm -ivh perl-Params-Validate-1.08-4.el7.x86_64.rpm
===
Once the management node is installed, edit the main configuration file:
[root@mysql3 soft]# mkdir -p /data/mha/etc
[root@mysql3 soft]# mkdir -p /data/mha/etc/app
[root@mysql3 soft]# mkdir -p /data/mha/log
[root@mysql3 soft]# cd /data/mha/etc/
[root@mysql3 etc]# touch app.cnf
[root@mysql3 etc]# chmod +x master_ip_failover
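The contents of app.cnf are never shown in the original; a sketch of what a matching configuration could look like for this three-host topology, with all credentials as placeholders (parameter meanings are listed in the reference section at the end of this article):

```ini
[server default]
manager_workdir=/data/mha/etc/app
manager_log=/data/mha/log/manager.log
remote_workdir=/data/mha/etc/app
master_ip_failover_script=/data/mha/etc/master_ip_failover
user=<mha_user>
password=<mha_password>
repl_user=<repl_user>
repl_password=<repl_password>
ssh_user=root
ping_interval=3

[server1]
hostname=mysql1
port=3306
candidate_master=1

[server2]
hostname=mysql2
port=3306
candidate_master=1

[server3]
hostname=mysql3
port=3306
```

The masterha_check_ssh and masterha_check_repl commands below validate this file before the manager is started.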

====
master_ip_failover # Manages the VIP during automatic failover. Optional: with keepalived you can write your own script, e.g. monitor MySQL and stop keepalived when MySQL fails, so the VIP floats away by itself.
master_ip_online_change # Manages the VIP during online switchover. Optional; a simple shell script of your own will do.
power_manager # Powers off the failed host after a failover. Optional.
send_report # Sends an alert after a failover. Optional; a simple shell script of your own will do.
masterha_secondary_check # If the manager loses its own connection to the master, it asks the other two slaves whether they can still reach master_ip on port 3306.
====
#################### Common commands ###############
/usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf &
/usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover > /data/mha/log/manager.log < /dev/null 2>&1 &
====
/usr/local/bin/save_binary_logs
show slave status; note Master_Log_File and Read_Master_Log_Pos
mysqlbinlog save_binlog_xxx,
1. Try to read the binary logs of the failed master
2. Look for the log differences on a binlog server (binlog servers were introduced with 5.6)
3. Fall back to the relay logs on the slaves
== bring every slave up to the same data set
/usr/local/bin/apply_diff_relay_logs


[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Class/Load.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.

[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Params/Validate.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.

[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:02:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
/data/mha/etc/app.cnf:
at /usr/local/share/perl5/MHA/SSHCheck.pm line 148.
== On closer inspection, the [server3] section in the configuration file was missing its closing ].
== SSH check succeeded
[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:15:11 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:15:11 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Starting SSH connection tests..
Thu Jul 12 22:15:12 2018 - [debug]
Thu Jul 12 22:15:13 2018 - [info] All SSH connection tests passed successfully.

== Check the replication status; there is a warning: relay_log_purge=0 is not set on slave *
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:17:53 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:17:53 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 12 22:17:53 2018 - [info] Dead Servers:
Thu Jul 12 22:17:53 2018 - [info] Alive Servers:
* (current master)
+--*
+--*

2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

MySQL Replication Health is OK.

== Check the MHA status
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING). # means the MHA manager is not running


# Start the MHA manager
--remove_dead_master_conf: after a failover, the old master's entry is removed from the configuration file.
--ignore_last_failover: by default, if MHA detects two failovers less than 8 hours apart, it refuses to fail over again, to avoid ping-pong switching.
A failover creates the file app.failover.complete; as long as that file still exists in the directory, the next failover is refused
unless the file is removed manually, so for convenience we always start the manager with this flag.
#nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 etc]# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[1] 14550
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf

# Add the virtual IP on the master (the VIP is managed by scripts)
When a VIP is used, it must be assigned to the master manually before MHA is started; MHA does not set the VIP itself on first startup. After this initial assignment, the failover scripts move the VIP automatically on subsequent switchovers.
Set:
ifconfig eth0:1 192.168.100.xxx
Remove:
ifconfig eth0:1 down
Note: addresses configured with ifconfig do not survive a NIC restart or a machine reboot.
On the first MHA start, add the VIP manually:
[root@mysql1 ~]# /sbin/ifconfig ens160:1 */24
[root@mysql1 ~]# ip add

# Stop MHA
#/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf

use test;
insert into t1 values(2,'BBBBBB');
insert into t1 values(3,'444');
insert into t1 values(4,'455');
insert into t1 values(5,'75');

select * from test.t1;

insert into t1 values(12,'1BBBBBB');
insert into t1 values(13,'1444');
insert into t1 values(14,'1455');
insert into t1 values(15,'175');

select * from test.t1;

select @@server_id;

5 Failover

http://www.cnblogs.com/gomysql/p/3675429.html
http://www.cnblogs.com/zhoujinyi/p/3808673.html
https://blog.csdn.net/wjhhjl/article/details/53705157

-- On both slave servers, set read_only (the slaves serve reads; it is set at runtime rather than in the config file because either slave may be promoted to master at any time)
mysql -u* -p*** -h127.0.0.1 -P3306 -e'set global relay_log_purge=OFF;'
mysql -u* -p*** -h127.0.0.1 -P3306 -e'set global read_only=1;'
Note:
During a failover, MHA relies on the relay logs to recover the remaining slaves, so automatic relay log purging must be set to OFF
and relay logs cleaned up manually instead. By default, a slave's relay logs are deleted automatically once the SQL thread has applied them,
but in an MHA setup those relay logs may be needed to recover the other slaves, so automatic deletion must be disabled.

The MHA node package includes the purge_relay_logs utility:
#cat purge_relay_log.sh
#!/bin/bash
user=*
passwd=***
port=3306
log_dir='/data/mha/log/'
work_dir='/data/mha/'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]; then
    mkdir -p $log_dir
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

Scheduled via cron:
#crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
Manual cleanup:
# /usr/local/bin/purge_relay_logs --user=system --password=**** --port=3306 --disable_relay_log_purge --workdir=/data/mha/

5.1 Automatic Failover

The MHA manager must be started first, or automatic failover cannot happen (manual switchover does not require the manager to be running).
Once a failover occurs, the manager process exits and the [server1] entry is removed from app.cnf, so the test cannot simply be repeated: the failed database must be repaired and re-added to the MHA setup via CHANGE MASTER,
and app.failover.complete must not exist (or --ignore_last_failover must be passed to ignore it) before the manager can be started again.

# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_secondary_check -s * -s * --user=root --master_host=* --master_ip=* --master_port=3306
[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf

use test;
insert into t2 values(2,'BBBBBB');
insert into t2 values(3,'444');
insert into t2 values(4,'455');
insert into t2 values(5,'75');

select * from test.t2;
commit;
[mysql@mysql2 scripts]$ mysql -usystem -p*** -h127.0.0.1 -P3306 -e'select * from test.t2;'

[mysql@mysql1 scripts]$ ./mysql_db_shutdown.sh

[root@mysql3 etc]# tail -f -n 100 /data/mha/log/manager.log

Master failover to *) completed successfully.

[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf

MySQL Replication Health is OK.

[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING).
[1]+ Done nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1

The output above shows the whole MHA failover sequence, which consists of the following steps:
1. Configuration check phase: validate the configuration of the whole cluster.
2. Dead master handling: remove the virtual IP and, optionally, power off the host.
3. Copy the relay log difference between the dead master and the most up-to-date slave into the MHA manager's working directory.
4. Identify the slave holding the latest updates.
5. Apply the binlog events saved from the master.
6. Promote one slave to be the new replication master.
7. Point the remaining slaves at the new master.
8. After the automatic switch, the manager removes the [server1] entry from app.cnf.
9. When the old master later rejoins as a slave, the replication check still does not see it, because its entry was deleted from the configuration file.

[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i vip
[root@mysql3 etc]# ssh mysql2 "ifconfig |grep 19.68 -B1"
# Connect via the VIP 192.168.19.68, insert data and check:
insert into t2 values(12,'1BBBBBB');
insert into t2 values(13,'1444');
insert into t2 values(14,'1455');
insert into t2 values(15,'175');

Remove app.failover.complete on the manager:
[root@mysql3 app]# rm /data/mha/etc/app/app.failover.complete

Start the old master and log in to check:
[mysql@mysql1 scripts]$ ./mysql_db_startup.sh
[mysql@mysql1 scripts]$ ./mysqlplus.sh
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 | 191 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-6 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

(system@127.0.0.1:3306) [(none)]> select * from test.t2; # still the data from before the crash
+------+--------+
| id | name |
+------+--------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
+------+--------+
Check the manager's log file for the CHANGE MASTER statement:
[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i change
Mon Jul 16 05:37:15 2018 - [info] Executed CHANGE MASTER.

Attach the old master as a slave of the new master:
(system@127.0.0.1:3306) [(none)]> CHANGE MASTER TO MASTER_HOST='*', ;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
(system@127.0.0.1:3306) [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show slave status\G

(system@127.0.0.1:3306) [(none)]> select *from test.t2;
+------+---------+
| id | name |
+------+---------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
| 12 | 1BBBBBB |
| 13 | 1444 |
| 14 | 1455 |
| 15 | 175 |
+------+---------+
9 rows in set (0.00 sec)
The old master has now rejoined MHA as a new slave; edit app.cnf and add the corresponding server section back:
[root@mysql3 ~]# vim /data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf

[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Start the management node:
# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 ~]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf

# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
-- After a power outage hit all three servers, on restart host ...74 reported a slave error:
Last_SQL_Errno: 1050
Last_SQL_Error: Error 'Table 't2' already exists' on query. Default database: 'test'. Query: 'create table t2(id int,name varchar(10))'
(system@127.0.0.1:3306) [test]> reset master;
(system@127.0.0.1:3306) [test]> show master status;
(system@127.0.0.1:3306) [test]> set @@GLOBAL.GTID_PURGED='cab5ec92-85ab-11e8-852c-005056bf5092:1-6,d4e95eb8-85ae-11e8-8540-000c29450242:1-17';
(system@127.0.0.1:3306) [test]> start slave;
(system@127.0.0.1:3306) [test]> show slave status\G


5.2 Manual Failover
Use the masterha_master_switch tool for the switchover; automatic failover must not be active at the same time,
i.e. the MHA manager must not be running.
Manual failover is for setups that do not enable MHA's automatic switching: when the master fails, an operator invokes MHA by hand to perform the failover, with the commands below.
Note: if the MHA manager detects no dead server, it reports an error and aborts the failover.
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --master_state=dead --conf=/data/mha/etc/app.cnf --dead_master_host=* --dead_master_port=3306 --new_master_host=* --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using *.
Mon Jul 16 22:11:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 22:11:01 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] MHA::MasterFailover version 0.56.
Mon Jul 16 22:11:01 2018 - [info] Starting master failover.
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] Dead Servers:
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln183] None of server is dead. Stop failover.
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
Because the master here is healthy, the manual failover fails. Manual failover applies when automatic switching is not enabled and the master has actually failed.

5.3 Online Switchover

Rough flow of an MHA online switchover:
1. Verify the replication settings and identify the current master.
2. Identify the new master.
3. Block writes on the current master.
4. Wait for all slaves to catch up with replication.
5. Grant writes on the new master.
6. Repoint the slaves.

Note that the application architecture must consider two issues for online switchover:
1. Identifying master and slave automatically (the master host can change); using a VIP largely solves this.
2. Load balancing (an approximate read/write ratio and per-host load share can be defined; both need rethinking when a host leaves the cluster).
To guarantee full data consistency and complete the switch as fast as possible, MHA's online switchover only succeeds when all of the following conditions hold; otherwise it fails.

1. The IO thread is running on every slave.
2. The SQL thread is running on every slave.
3. Seconds_Behind_Master in the output of show slave status is at most running_updates_limit seconds on every slave (if running_updates_limit is not specified during the switchover, the default is 1 second).
4. On the master, no update shown by show processlist has been running longer than running_updates_limit seconds.
The online switchover steps are as follows:
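The conditions above can be verified by hand before attempting the switchover; a minimal sketch:

```sql
-- On every slave before the switchover:
SHOW SLAVE STATUS\G
-- Check in the output:
--   Slave_IO_Running:      Yes
--   Slave_SQL_Running:     Yes
--   Seconds_Behind_Master: at most running_updates_limit
-- And on the master, look for long-running updates:
SHOW PROCESSLIST;
```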

First, stop MHA monitoring:
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --conf=/data/mha/etc/app.cnf --master_state=alive --new_master_host=* --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

On the master, disable the event scheduler first:
[system@3306][(none)]> set global event_scheduler=off;
[system@3306][(none)]> Select @@event_scheduler;
+-------------------+
| @@event_scheduler |
+-------------------+
| OFF |
+-------------------+

--orig_master_is_new_slave: makes the original master a slave of the new master after the switch; without this flag, the original master is not brought back up as part of replication.
--running_updates_limit=10000: during the switchover, if the candidate master is lagging, MHA refuses to switch by default; with this flag the switch is allowed as long as the delay is within the given number of seconds. The actual duration of the switch still depends on the size of the relay logs to recover.
Note: the online switchover invokes the master_ip_online_change script.

Repairing the crashed master
After an automatic failover, the old master has usually been discarded. Once the old master host is repaired, and provided its data is intact, you may want to make it a slave of the new master.
The MHA log written at failover time contains the information needed to do this. Extract the relevant lines with:
# grep -i "All other slaves" /data/mha/log/manager.log

=========

MHA Parameter Reference


Local: applies within a single configuration block; Local-scope parameters go under the corresponding [server_xxx] section.
App: applies to the whole master/slave group; these parameters go under the [server_default] section.
Global: applies across master/slave groups; Global-scope parameters manage multiple master/slave groups with unified settings.
candidate_master
You may plan the slaves in one group differently: some should be promoted to master when the current master fails (e.g. a RAID1 slave is a better master candidate than a RAID0 one).
With candidate_master = 1, the server gets a higher priority for promotion to the new master (it must also have binlog enabled and no replication delay),
so such a machine will normally become the new master on failover. A useful parameter for setting priorities.
no_master: a server with no_master = 1 is never promoted to master. Useful for machines that must never become the master.
ignore_fail: by default, the MHA manager does not fail over the master while any slave is faulty (unreachable over SSH/MySQL, SQL thread stopped on an error, or other reasons)
(the manager also exits when no slaves exist). In some cases you want failover to proceed even with faulty slaves; with ignore_fail = 1, MHA fails over even when such machines have problems. Default 0.
skip_reset_slave: (supported since MHA 0.56) skip executing RESET SLAVE (ALL) on the new master after failover.
disable_log_bin: when set, slaves do not write binary log while applying diff relay logs; implemented internally via mysqlbinlog's disable-log-bin option.
remote_workdir: full path of the working directory on each MHA node. Created automatically if it does not exist; if it cannot be created, MHA node exits with an error.
Note that the MHA manager and nodes need enough free space there, so check it. The default remote_workdir is "/var/tmp".
master_binlog_dir: full path of the directory holding the master's binary logs. Used when mysqld on the master has died: MHA connects to the server over SSH to find the needed binlog events. This helps when the binlog location cannot be determined after the master process has died.
log_level: log level of the MHA manager. The default is info, which suits most cases; may also be set to debug/info/warning/error.
manager_workdir: full path where the MHA manager writes its state files. Default /var/tmp if unset.
check_repl_delay
check_repl_filter: by default, when master and slaves have different binary log/replication filter rules, MHA reports an error and will not monitor or fail over, since mismatched filters can lead to unexpected "Table not exists" errors.
If you are 100% sure that your differing filter rules cannot break recovery, set check_repl_filter=0. Note that with check_repl_filter = 0, MHA does not check filter rules while applying diff relay logs, so
"Table not exists" errors may occur. Handle this parameter with care.
latest_priority: by default, the slave closest to the master (the slave that received the latest binlog event from the master) has the highest priority to become the new master.
multi_tier_slave: multi-tier replication is supported since MHA 0.52. By default, topologies of three or more tiers are not supported: if host2 replicates from host1 and host3 from host2, then with default settings writing host{1,2,3} into the configuration
makes the MHA manager stop with an error, because that is a three-tier chain. With multi_tier_slave set, the manager no longer aborts on three-tier replication; it simply ignores the third tier, so if host1 dies, host2 becomes the new master and host3 keeps replicating from host2.
ping_interval: how often the MHA manager pings the master (by executing simple SQL statements). After three consecutive failed attempts, the manager declares the MySQL master dead; the maximum failover detection time is therefore 4 × ping_interval. Default 3 seconds.
ping_type: (supported since MHA 0.53) by default, the MHA manager keeps a connection to MySQL and runs "SELECT 1" (ping_type=SELECT) to check the master's health.
In some cases, connecting and then disconnecting on every check is preferable, since TCP-level failures are detected faster: set ping_type=CONNECT. MHA 0.56 added ping_type=INSERT.
secondary_check_script: in general, it is strongly recommended to check the master's liveness from machines on different network routes. By default, only the MHA manager checks,
via a single route, which is not recommended. With secondary_check_script, MHA can run external scripts to check the master from multiple routes.
master_ip_failover_script
master_ip_online_change_script
shutdown_script
report_script
init_conf_load_script: used when you do not want sensitive values (e.g. passwords) in plaintext in the configuration file. The script only needs to print "name=value" lines, which override values in the global configuration. An example script:
#!/usr/bin/perl
print "password=$ROOT_PASS\n";
print "repl_password=$REPL_PASS\n";
init_conf_load_script: scope App/Global, no default. Example: init_conf_load_script=/usr/local/custom_script/init_conf_loader

Original article: https://www.cnblogs.com/yhq1314/p/10001694.html