How to check at the database level whether two tables have the same content

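Generally speaking, the need to check whether two tables hold the same content shows up mostly on replicas (slaves), where it is used to confirm data consistency. There are really only two ways to approach it: from the database side, or from the application side.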

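Here I list a few ways to solve this kind of problem at the database level.
The first step, of course, is to check whether the row counts match; if they don't, there is no need to try any other method.
I use two tables, t1_old and t1_new, for the demonstration.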

Table structure:
 CREATE TABLE t1_old (
  id int(11) NOT NULL,
  log_time timestamp DEFAULT NULL
) ;




 CREATE TABLE t1_new (
  id int(11) NOT NULL,
  log_time timestamp DEFAULT NULL
) ;


Both tables contain 100 rows.
mysql> select count(*) from t1_old;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.31 sec)


mysql> select count(*) from t1_new;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

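Method 1: add the tables together, then deduplicate.

Because UNION removes duplicate rows when it combines the two result sets, this check is very easy to run.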
mysql> select count(*) from (select * from t1_old union select * from t1_new) as T;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.06 sec)
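The count here is 100, which is preliminary evidence that the two tables have the same content. However, this method has a flaw: in some situations an unchanged count does not mean the two result sets are identical.

For example: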


mysql> create table t1_old1 (id int);
Query OK, 0 rows affected (0.27 sec)


mysql> create table t1_new1(id int);
Query OK, 0 rows affected (0.09 sec)


mysql> insert into t1_old1 values (1),(2),(3),(5);
Query OK, 4 rows affected (0.15 sec)
Records: 4  Duplicates: 0  Warnings: 0


mysql> insert into t1_new1 values (2),(2),(3),(5);    
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0


mysql> select * from t1_old1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|    5 |
+------+
4 rows in set (0.00 sec)


mysql> select * from t1_new1;
+------+
| id   |
+------+
|    2 |
|    2 |
|    3 |
|    5 |
+------+
4 rows in set (0.00 sec)


mysql> select count(*) from (select * from t1_old1 union select * from t1_new1) as T;
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)


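The union still contains 4 distinct values (1, 2, 3, 5), the same as each table's row count, even though the tables differ. So in a case like this, the method is effectively useless.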
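
One way to make the addition idea aware of duplicates is to tag each side with +1 or -1, combine them with UNION ALL, and keep only the rows whose tags do not cancel out. The sketch below is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the helper name `multiset_diff` is made up for this example. The query itself uses only standard SQL, so it should carry over to MySQL or PostgreSQL, but I have not verified that here.

```python
import sqlite3

def multiset_diff(conn, left, right, columns):
    """Return rows whose number of occurrences differs between the two tables.

    Table and column names are interpolated directly into the SQL,
    so they must come from a trusted source.
    """
    cols = ", ".join(columns)
    sql = f"""
        SELECT {cols}, SUM(side) AS surplus
        FROM (
            SELECT {cols}, 1 AS side FROM {left}
            UNION ALL
            SELECT {cols}, -1 AS side FROM {right}
        ) AS t
        GROUP BY {cols}
        HAVING SUM(side) <> 0
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

# Same data as the t1_old1 / t1_new1 example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_old1 (id INT);
    CREATE TABLE t1_new1 (id INT);
    INSERT INTO t1_old1 VALUES (1),(2),(3),(5);
    INSERT INTO t1_new1 VALUES (2),(2),(3),(5);
""")

diff = multiset_diff(conn, "t1_old1", "t1_new1", ["id"])
print(diff)  # [(1, 1), (2, -1)]
```

A positive surplus means the row occurs more often in the first table, a negative one means it occurs more often in the second. An empty result means both tables hold the same rows the same number of times.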

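Method 2: subtract and expect zero.

MySQL traditionally had no set-difference operator (EXCEPT only arrived in MySQL 8.0.31), so I switch to PostgreSQL for this check.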

t_girl=# select count(*) from (select * from t1_old except select * from t1_new) as T;
 count 
-------
     0
(1 row)


Time: 1.809 ms
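The result is 0: no row of t1_old is missing from t1_new, which together with the equal row counts indicates that the contents match. We can run the same check on the second case from Method 1: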
t_girl=# select count(*) from (select * from t1_old1 except select * from t1_new1) as T;
 count 
-------
     1
(1 row)


Time: 9.837 ms
OK, the result here is not 0, so we can directly conclude that the two tables are inconsistent.
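
One caveat: EXCEPT only looks in one direction, so it seems safer to run the subtraction both ways. The sketch below shows this with the same t1_old1 / t1_new1 data. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL, and `except_both_ways` is a hypothetical helper name.

```python
import sqlite3

def except_both_ways(conn, left, right):
    """Return (rows only in left, rows only in right), counted via EXCEPT.

    Table names are interpolated into the SQL, so they must be trusted.
    """
    count = "SELECT COUNT(*) FROM (SELECT * FROM {a} EXCEPT SELECT * FROM {b}) AS t"
    only_left = conn.execute(count.format(a=left, b=right)).fetchone()[0]
    only_right = conn.execute(count.format(a=right, b=left)).fetchone()[0]
    return only_left, only_right

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_old1 (id INT);
    CREATE TABLE t1_new1 (id INT);
    INSERT INTO t1_old1 VALUES (1),(2),(3),(5);
    INSERT INTO t1_new1 VALUES (2),(2),(3),(5);
""")

result = except_both_ways(conn, "t1_old1", "t1_new1")
print(result)  # (1, 0)
```

The second number is 0: had only "new minus old" been checked, the difference would have gone unnoticed. Note that EXCEPT is still set-based, so tables that differ only in how many times a row is repeated can pass both directions.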

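Method 3: a full-table JOIN. This is also the worst approach, at least when the tables hold a very large number of rows.

Again I use PostgreSQL for the demonstration.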
t_girl=# select count(*) from t1_old as a full outer join t1_new as b using (id,log_time) where a.id is null or b.id is null; 
 count 
-------
     0
(1 row)


Time: 5.002 ms
The result is 0, which shows that the contents match.
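
Where FULL OUTER JOIN is not available, the same idea can be approximated with two LEFT JOIN anti-joins, one per direction. The sketch below is an assumption-laden illustration: it runs on in-memory SQLite through Python's sqlite3 module, the sample rows are invented, and `unmatched_rows` is a hypothetical helper. It compares columns with SQLite's NULL-safe `IS`; the MySQL counterpart would be `<=>` and the PostgreSQL one `IS NOT DISTINCT FROM`. It also assumes that `key` names a NOT NULL column, as `id` is in t1_old and t1_new.

```python
import sqlite3

def unmatched_rows(conn, left, right, columns, key):
    """Count rows of `left` that have no equal row in `right` (anti-join).

    `key` must be a NOT NULL column, so that a NULL on the right side
    can only mean "no match". Identifiers must come from a trusted source.
    """
    on = " AND ".join(f"a.{c} IS b.{c}" for c in columns)
    sql = f"""
        SELECT COUNT(*)
        FROM {left} AS a
        LEFT JOIN {right} AS b ON {on}
        WHERE b.{key} IS NULL
    """
    return conn.execute(sql).fetchone()[0]

# Invented sample rows; only the table layout follows the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_old (id INT NOT NULL, log_time TIMESTAMP DEFAULT NULL);
    CREATE TABLE t1_new (id INT NOT NULL, log_time TIMESTAMP DEFAULT NULL);
    INSERT INTO t1_old VALUES (1, '2014-01-01 00:00:00'), (2, NULL), (3, NULL);
    INSERT INTO t1_new VALUES (1, '2014-01-01 00:00:00'), (2, NULL), (4, NULL);
""")

cols = ["id", "log_time"]
only_old = unmatched_rows(conn, "t1_old", "t1_new", cols, "id")
only_new = unmatched_rows(conn, "t1_new", "t1_old", cols, "id")
print(only_old, only_new)  # 1 1
```

The row (2, NULL) matches on both sides only because the comparison is NULL-safe; with a plain `=` it would be reported as a difference in each direction.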

Method 4: checksum verification.

For example, in MySQL: if the two tables have the same checksum, their contents are almost certainly the same as well.



mysql> checksum table t1_old;
+---------------+----------+
| Table         | Checksum |
+---------------+----------+
| t_girl.t1_old | 60614552 |
+---------------+----------+
1 row in set (0.00 sec)


mysql> checksum table t1_new;
+---------------+----------+
| Table         | Checksum |
+---------------+----------+
| t_girl.t1_new | 60614552 |
+---------------+----------+
1 row in set (0.00 sec)


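However, this method only works when the two tables have exactly the same structure. For example, if I change the type of a column in t1_old, its checksum changes too.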



mysql> alter table t1_old modify id bigint;
Query OK, 100 rows affected (0.23 sec)
Records: 100  Duplicates: 0  Warnings: 0


mysql> checksum table t1_old;
+---------------+------------+
| Table         | Checksum   |
+---------------+------------+
| t_girl.t1_old | 3211623989 |
+---------------+------------+
1 row in set (0.00 sec)


mysql> checksum table t1_new;
+---------------+----------+
| Table         | Checksum |
+---------------+----------+
| t_girl.t1_new | 60614552 |
+---------------+----------+
1 row in set (0.00 sec)
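
If a checksum that depends only on the values is needed, one option is to hash the rows yourself in a fixed order. The sketch below is not how MySQL's CHECKSUM TABLE works; it is a client-side illustration using Python's hashlib and an in-memory SQLite database, with invented sample rows and a hypothetical helper name `content_checksum`.

```python
import hashlib
import sqlite3

def content_checksum(conn, table, columns):
    """Hash the values of all rows in a deterministic order.

    Only the values fetched by the client are hashed, so the declared
    column types do not influence the result. Identifiers must be trusted.
    """
    cols = ", ".join(columns)
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT {cols} FROM {table} ORDER BY {cols}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

# Same values, but the id columns are declared with different types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_old (id BIGINT, log_time TIMESTAMP DEFAULT NULL);
    CREATE TABLE t1_new (id INT NOT NULL, log_time TIMESTAMP DEFAULT NULL);
    INSERT INTO t1_old VALUES (1, '2014-01-01 00:00:00'), (2, NULL);
    INSERT INTO t1_new VALUES (2, NULL), (1, '2014-01-01 00:00:00');
""")

cols = ["id", "log_time"]
same = content_checksum(conn, "t1_old", cols) == content_checksum(conn, "t1_new", cols)
print(same)  # True

conn.execute("INSERT INTO t1_new VALUES (3, NULL)")
changed = content_checksum(conn, "t1_old", cols) != content_checksum(conn, "t1_new", cols)
print(changed)  # True
```

This reads every row into the client, so it trades the convenience of a built-in checksum for independence from the table definition.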

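So, of the database-provided methods above, subtracting down to zero is comparatively the most reliable; the others are better suited to specific situations.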