mysql服务异常宕机，重启后主从报错问题处理Last_Error: Relay log read failure: Could not parse relay log event entry.

问题场景

mysql服务异常宕机，经检查是由于网络存储数据盘防火墙升级导致无法写入变为只读模式

重启MySQL服务 systemctl start mysqld@3306

进入mysql查看主从信息 show slave statusG

发现主从报错

Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

分析问题

1.查看error.log日志查看问题，发现以下错误日志

2020-12-07T23:04:43.017931Z 1653 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.001000' at position 19094834, relay log '/data1/mysqldata3306/relaylog/mysql-relay-bin.000411' position: 19095047
2020-12-07T23:04:43.018017Z 1653 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 0
2020-12-07T23:04:43.018024Z 1653 [ERROR] Error reading relay log event for channel '': slave SQL thread aborted because of I/O error
2020-12-07T23:04:43.018038Z 1653 [ERROR] Slave SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2020-12-07T23:04:43.018043Z 1653 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001000' position 19094834.

2.从上面的报错信息中我们可以看到应该是主库的binlog日志或者从库的relay日志损坏，导致从库读取日志的时候时候，复制线程报错。

报错中也给出了检查主库binlog或者从库relaylog是否损坏的方法，即通过mysqlbinlog解析binlog或是relay log看能否成功。

检查发现mysqlbinlog能够正常解析主库binlog，但是在解析从库的relaylog时报如下错误：

mysqlbinlog --base64-output=decode-rows -vvv 3306-relay-bin.000462 > test.sql
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 402, event_type: 2
ERROR: Could not read entry at offset 976703392: Error in log format or read error.
WARNING: The range of printed events ends with an Intvar_event, Rand_event or User_var_event with no matching Query_log_event. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.

3.到此我们可以确定是因为服务器异常宕机导致从库的relaylog损坏，从而导致从库复制线程报错。（一般突然宕机情况下都会是这种问题导致，主库binlog日志不会无故损坏）

解决方案

relaylog损坏，那么我们可以从复制停止的位置开始重新读取主库的binlog日志，可以通过从库errorlog中的报错来确定位置，（We stopped at log 'mysql-bin.001000' position 19094834.）

处理步骤如下：

mysql> stop slave ;
mysql>
CHANGE MASTER TO MASTER_HOST='192.168.10.1',
MASTER_PORT=3306,
MASTER_USER='mysql',
MASTER_PASSWORD='mysql123456',
MASTER_LOG_FILE='mysql-bin.001000',
MASTER_LOG_POS=19094834;
mysql> start slave;

或者

stop slave;
change master to master_log_file='mysql-bin.001000', master_log_pos=19094834;
start slave;

即问题解决。

在重新检查主从信息
mysql>show slave statusG

如果主从还报错，Last_Error: 报错为主键冲突

可以进行错误跳过处理即可恢复主从。
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE