使用pt-table-checksum校验MySQL主从复制【转】

pt-table-checksum是一个基于MySQL数据库主从架构在线数据一致性校验工具。其工作原理在主库上运行, 通过对同步的表在主从段执行checksum, 从而判断数据是否一致。在校验完毕时,该工具将列出与主库存在差异的对象结果。

一、主从不一致的情形

    Master端使用了不确定的语句(如:CURRENT_USER(), UUID())
    不正确的故障转移(failover)流程
    误操作或直接在Slave进行DML操作
    持续的升级更新(Rolling upgrades)
    混合使用事务引擎和非事务引擎的表
    跳过了复制事件 (SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N)
    使用临时表(Temporary Tables)
    复制过滤(Replication Filters)
    使用含LIMIT且没有order by的更新语句(update/delete with LIMIT clause without order by)

二、pt-table-checksum特性

    pt-table-checksum connects to the server you specify, and finds databases and tables that 
    match the filters you specify  (if any). It works one table at a time, so it does not accumulate
    large amounts of memory or do a lot of work before beginning to checksum. This makes it usable
    on very large servers. We have used it on servers with hundreds of thousands of databases and tables, 
    and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.

    One reason it can work on very large tables is that it divides each table into chunks of rows, 
    and checksums each chunk with a single REPLACE..SELECT query. It varies the chunk size to make 
    the checksum queries run in the desired amount of time. The goal of chunking the tables, instead of 
    doing each table with a single big query, is to ensure that checksums are unintrusive and don’t cause too
    much replication lag or load on the server. That’s why the target time for each chunk is 0.5 seconds by default.

    The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks 
    as it learns more about the server’s performance. It uses an exponentially decaying weighted average 
    to keep the chunk size stable, yet remain responsive if the server’s performance changes during checksumming 
    for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during 
    a trafficc spike or a background task, for example.

    After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all 
    detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to 
    see if they have the same data as the master, and then prints a line of output with the results. 

四、参数描述

–nocheck-replication-filters
  不检查复制过滤器,建议启用。后面可以用–databases来指定需要检查的数据库。
–no-check-binlog-format
  不检查复制的binlog模式,要是binlog模式是ROW,则会报错。
–replicate-check-only
  只显示不同步的信息。
–replicate=
  把checksum的信息写入到指定表中,建议直接写到被检查的数据库当中。
–databases=
  指定需要被检查的数据库,多个则用逗号隔开。
–tables=
  指定需要被检查的表,多个用逗号隔开
  h=127.0.0.1 :Master的地址
  u=root :用户名
  p=123456 :密码
  P=3306 :端口

五、常见问题

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql 
> --nocheck-replication-filters --replicate=test.checksums
Replica vdbsrv4 has binlog_format MIXED which could cause pt-table-checksum to break replication. 
Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.
  If you understand the risks, specify --no-check-binlog-format to disable this check.
上面描述的是关于使用mixed日志格式时的问题  

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql 
> --nocheck-replication-filters --no-check-binlog-format
DBD::mysql::db do failed: Access denied for user 'checksums'@'192.168.1.%' to database 'percona' 
[for Statement "CREATE DATABASE IF NOT EXISTS `percona` /* pt-table-checksum */"] 
at /usr/bin/pt-table-checksum line 10743.
07-29T08:42:03 --replicate database percona does not exist and it cannot be created automatically. 
You need to create the database.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt 
> --nocheck-replication-filters --no-check-binlog-format  --replicate=test.checksum
Cannot connect to P=3306,h=vdbsrv4,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:03:10      0      0        3      1      0  0.023 test.t

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt 
> --nocheck-replication-filters --no-check-binlog-format 
> --replicate=test.checksum --recursion-method=hosts
Cannot connect to P=3306,h=,p=...,u=checksums
Cannot connect to P=3307,h=,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:02:27      0      0        3      1      0  0.016 test.t

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                          |
+-----------+------+------+-----------+--------------------------------------+
|        1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
+-----------+------+------+-----------+--------------------------------------+

-- 增加参数report_host后重启从库
[root@vdbsrv4 ~]# grep report_host /etc/my.cnf
report_host='192.168.1.12'

master@localhost[(none)]> show slave hosts;
+-----------+--------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                          |
+-----------+--------------+------+-----------+--------------------------------------+
|        11 | 192.168.1.12 | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|        1 | 192.168.1.12 | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+--------------+------+-----------+--------------------------------------+

转自

(2条消息) 使用pt-table-checksum校验MySQL主从复制_乐沙弥的世界-CSDN博客 https://blog.csdn.net/leshami/article/details/78377444

原文地址:https://www.cnblogs.com/paul8339/p/13932562.html