A case where a MySQL master crash left the slave with more data than the master

First, the environment:

Server version:         5.6.16-enterprise-commercial-advanced-log MySQL Enterprise Server - Advanced Edition (Commercial)

mysql> show variables like '%innodb_flush_log_at_trx_commit%';
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 2     |
+--------------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like '%sync_binlog%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog   | 0     |
+---------------+-------+

mysql> show variables like '%innodb_support_xa%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| innodb_support_xa | ON    |
+-------------------+-------+

-- Error log on the master:
InnoDB: Progress in percent: 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

InnoDB: Apply batch completed

InnoDB: Last MySQL binlog file position 0 130483656, file name mysql-bin.002977
-- 130483656: the binlog position InnoDB recorded once the redo log and the binlog had been synchronized (innodb_support_xa is ON)
-- From the manual: "at restart after a crash, after doing a rollback of transactions, the MySQL server removes rolled back InnoDB transactions from the binary log"
2015-03-21 08:07:24 17646 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 108, event_type: 31

2015-03-21 08:07:24 17646 [Note] Starting crash recovery...

2015-03-21 08:07:24 17646 [Note] Crash recovery finished.

2015-03-21 08:07:24 17646 [Note] Crashed binlog file /paic/mylog/3308/mysql-bin.002977 size is 130408448, but recovered up to 130407973. Binlog trimmed to 130407973 bytes

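-- According to this log, the binary log file was damaged when the host restarted: the file was 130408448 bytes long, but it could only be recovered up to 130407973, so it was trimmed to that position.

-- Error log on the slave: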

2015-03-21 04:41:21 22181 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.002977', position 130484169

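-- The slave had read the binlog up to position 130484169

So the binlog position the slave read up to (130484169) is larger than the master's binlog position after recovery (130407973), and even larger than the binlog size before recovery (130408448)!

The slave does not roll back binlog events it has already received; the slave SQL thread simply applies them.

That means the master's real position, the one consistent with its redo log, is 130483656 (although the binlog file itself now ends at 130407973), while the slave applied events up to position 130484169. As a result, the slave's data is newer than the master's!

Note: on the master, the data between binlog positions 130407973, 130408448 and 130483656 was committed normally and is not lost. The data from 130483656 to 130484169, however, was rolled back on the master; in other words, the master lost it!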
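
The position arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the numbers are copied from the two error logs.

```python
# Binlog positions (byte offsets in mysql-bin.002977) copied from the error logs.
# The variable names are illustrative only.
binlog_trimmed_to = 130407973       # master: binlog trimmed to this size after recovery
binlog_size_at_crash = 130408448    # master: size of the crashed binlog file on disk
innodb_last_binlog_pos = 130483656  # master: "Last MySQL binlog file position" from InnoDB
slave_read_up_to = 130484169        # slave: I/O thread "read up to" position

# Bytes cut off the tail of the master's binlog file during recovery.
trimmed_bytes = binlog_size_at_crash - binlog_trimmed_to

# Bytes that InnoDB had committed but that are missing from the trimmed binlog file.
committed_not_in_binlog = innodb_last_binlog_pos - binlog_trimmed_to

# Bytes the slave received beyond what the master's InnoDB had committed.
slave_only_bytes = slave_read_up_to - innodb_last_binlog_pos

print("trimmed from binlog file:", trimmed_bytes)
print("committed on master, missing from binlog file:", committed_not_in_binlog)
print("on the slave only:", slave_only_bytes)
```

The last figure is the part that makes the slave newer than the master.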

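This also shows that the master's Binlog Dump thread pushes binlog content to the slave before that content has been flushed to disk. Clearly, data consistency is hard to achieve with MySQL replication.

In terms of the CAP theorem: master-slave replication can provide A and P, but cannot satisfy C; strictly speaking, it does not fully provide A either!

To keep master and slave data as consistent as possible, follow the manual's advice: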

If binary logging is enabled, set sync_binlog=1.
Always set innodb_flush_log_at_trx_commit=1.
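
As a rough illustration, the two recommendations can be turned into a small check. This is a sketch, not an official tool: the function name `non_durable_settings` is hypothetical, and the input is a plain dict of values copied by hand from SHOW VARIABLES output, so no database connection is involved.

```python
def non_durable_settings(variables):
    """Return warnings for settings that differ from the manual's advice.

    `variables` maps variable names to their values as strings,
    for example values copied from SHOW VARIABLES output.
    """
    recommended = {
        "sync_binlog": "1",
        "innodb_flush_log_at_trx_commit": "1",
    }
    warnings = []
    for name, wanted in recommended.items():
        actual = variables.get(name)
        if actual != wanted:
            warnings.append(f"{name} is {actual}, recommended {wanted}")
    return warnings


# Values of the master in this case, as shown in the environment section.
crashed_master = {
    "innodb_flush_log_at_trx_commit": "2",
    "sync_binlog": "0",
    "innodb_support_xa": "ON",
}

for warning in non_durable_settings(crashed_master):
    print(warning)
```

For the master in this case, both settings are flagged.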

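The difference between sync_binlog=0 and sync_binlog=100: with 100, a flush to disk is forced every 100 transactions; but any individual transaction may still be flushed to disk earlier (in this respect it behaves like 0 and 1), because that is decided by the OS.

Manual references: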
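
The forced part of that behaviour can be pictured with a toy model. This is an assumption-laden sketch: the function name `forced_sync_points` is hypothetical, it counts one binlog write per transaction as described above (the unit the server really counts may differ between versions), and it does not model the OS, which may flush at any other moment.

```python
def forced_sync_points(num_writes, sync_binlog):
    """Return the 1-based write numbers after which the server forces a sync.

    Toy model only: with sync_binlog=0 the server never forces a sync and
    leaves flushing to the OS; with N > 0 it forces one after every N-th write.
    """
    if sync_binlog < 0:
        raise ValueError("sync_binlog must be >= 0")
    if sync_binlog == 0:
        return []
    return list(range(sync_binlog, num_writes + 1, sync_binlog))


print(forced_sync_points(250, 0))    # no forced syncs at all
print(forced_sync_points(250, 100))  # forced syncs after writes 100 and 200
print(len(forced_sync_points(250, 1)))  # one forced sync per write
```

In this model, a crash with sync_binlog=100 can lose up to the writes made since the last forced sync, while with sync_binlog=0 nothing bounds the loss except the OS's own flushing.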

http://dev.mysql.com/doc/refman/5.6/en/binary-log.html

http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_sync_binlog

http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

For InnoDB's log-writing strategy, see http://csrd.aliapp.com/?p=870