gh-ost2 实现原理

2019/8/8

TOC


1.过程解析

1. 检查校验环境

  • 测试db是否可连通,并且验证database是否存在
  • 确认连接实例是否正确
  • 权限验证 show grants for current_user()
  • binlog验证,包括row格式验证和修改binlog格式后的重启replicate
  • 原表存储引擎,外键,触发器检查,行数预估等

2. 创建binlog streamer连接到主库或者从库,添加binlog的监听

3. 创建log表 _xxx_ghc 和ghost表 _xxx_gho并修改ghost表结构到最新

  • create table _xxx_ghc # 日志表
  • create table _xxx_gho # 临时表
  • alter table _xxx_gho .. # 表变更

4. row-copy & apply-binlog

  • 计算最大最小值
select `id` from darren`.`t4` order by id` asc limit 1;
select `id` from darren`.`t4` order by id` desc limit 1;
  • 计算trunk
# 第一个trunk 到倒数第二个
select `id` from `darren`.`t4` 
where `id` >= _binary'1' and `id` <= _binary'58594' order by `id` asc limit 1 offset 999

#最后一个chunk如果不足1000,那么上面sql查询为空,这时运行:
select `id` from (  
select `id` from `darren`.`t4` 
where `id` > _binary'58000' and `id` <= _binary'58594' order by `id` asc limit 1000 
) select_osc_chunk  order by `id` desc limit 1;
  • 循环插入数据:
insert ignore into `darren`.`_t4_gho` (`id`, `name`, `c1`) 
(select `id`, `name`, `c1` from `darren`.`t4` force index (`PRIMARY`) 
where `id` >= _binary'1' and `id` <= _binary'1000' lock in share mode )
  • row-copy & apply-binlog 映射操作
操作原表新表
row-copyselect insert ignore into
appy-binloginsertreplace into
updateupdate
deletedelete
  • row-copy & apply binlog 操作顺序数据冲突分析
    数据迁移过程,涉及三个操作:
    1. A:对原表进行rowcopy;
    2. B:应用程序的DML;
    3. C:应用binlog到新表,因为DML操作才会记录binglog,C一定会在B后

  • binlog同步数据何时结束?
    copy完数据向_xxx_ghc写入status:
    AllEventsUpToLockProcessed:1533533052229905040,当binlogsyncer过滤到该值表示所有event都已应用

5. cut-over 阶段

S1: C10:
create table _t1_del like t1;   # 创建magic表_xxx_del,目的为了防止过快的进行rename操作和意外情况rename
lock table t1 write,_t1_del write ; # 对源表和magic表_xxx_del加write锁
S2:C11-C19
mysql> select * from t1;  # C11...C19: 新的请求进来,关于原表的请求被blocked
S3:C20
rename table `t1` to `_t1_del` ,`_t1_gho` to `t1`;   #被阻塞,RENAME timeout as 3 seconds
S4: show processlist;  # 检查是否有blocked的rename请求

S5: C10:
drop table _t1_del;  #删除magic表
unlock tables;
S6:C20
mysql> rename table `t1` to `_t1_del` ,`_t1_gho` to `t1`;              
Query OK, 0 rows affected (4 min 4.36 sec)

mysql> select * from t1;       
Empty set (5 min 35.78 sec)

不同阶段失败后如何处理:

  1. S1 失败,退出程序,比如建表成功,加锁失败,退出程序,未加锁
  2. S3前,rename请求来的时候,会话C10死掉,lock会自动释放,同时因为_xxx_del的存在rename也会失败,所有请求恢复正常
  3. S3后, rename被blocked的时候,会话C10死掉,lock会自动释放,同样因为_xxx_del的存在,rename会失败,所有请求恢复正常
  4. C20死掉,gh-ost会捕获不到rename,会话C10继续运行,释放lock,所有请求恢复正常

6. 收尾

  1. 关闭binlogsyncer连接
  2. 删除源表和_t4_ghc表

2. 日志

 [root@iZbp1aihmppor0oe9epotuZ ~]# gh-ost 
> --ok-to-drop-table 
> --initially-drop-ghost-table 
> --initially-drop-socket-file 
> --host="rm-host1.mysql.rds.aliyuncs.com" 
> --port=3306 
> --user="" 
> --password=''
> --database="dba_test" 
> --table="t1" 
> --verbose 
> --alter="add column cc1 varchar(256) default '';" 
> --allow-on-master 
> --assume-rbr='true' 
> --assume-master-host='rm-host1.mysql.rds.aliyuncs.com:3306' 
> --aliyun-rds --execute
2019-06-27 09:32:07 INFO starting gh-ost 1.0.48
2019-06-27 09:32:07 INFO Migrating `dba_test`.`t1`
2019-06-27 09:32:07 INFO connection validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on `dba_test`.*
2019-06-27 09:32:07 INFO binary logs validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO Inspector initiated on rm-host1.mysql.rds.aliyuncs.com:3306, version 5.7.25-log
2019-06-27 09:32:07 INFO Table found. Engine=InnoDB
2019-06-27 09:32:07 INFO Estimated number of rows via EXPLAIN: 1
2019-06-27 09:32:07 INFO Master forced to be rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO log_slave_updates validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO connection validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO Connecting binlog streamer at mysql-bin.001413:27777608
[2019/06/27 09:32:07] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql rm-host1.mysql.rds.aliyuncs.com 3306 oper_super false false <nil> false UTC true 0 0s 0s 0 false}
[2019/06/27 09:32:07] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.001413, 27777608)
[2019/06/27 09:32:07] [info] binlogsyncer.go:203 register slave for master server rm-host1.mysql.rds.aliyuncs.com:3306
[2019/06/27 09:32:07] [info] binlogsyncer.go:723 rotate to (mysql-bin.001413, 27777608)
2019-06-27 09:32:07 INFO rotate to next log from mysql-bin.001413:0 to mysql-bin.001413
2019-06-27 09:32:07 INFO connection validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO connection validated on rm-host1.mysql.rds.aliyuncs.com:3306
2019-06-27 09:32:07 INFO will use time_zone='SYSTEM' on applier
2019-06-27 09:32:07 INFO Examining table structure on applier
2019-06-27 09:32:07 INFO Applier initiated on rm-host1.mysql.rds.aliyuncs.com:3306, version 5.7.25-log
2019-06-27 09:32:07 INFO Dropping table `dba_test`.`_t1_gho`
2019-06-27 09:32:07 INFO Table dropped
2019-06-27 09:32:07 INFO Dropping table `dba_test`.`_t1_ghc`
2019-06-27 09:32:07 INFO Table dropped
2019-06-27 09:32:07 INFO Creating changelog table `dba_test`.`_t1_ghc`
2019-06-27 09:32:07 INFO Changelog table created
2019-06-27 09:32:07 INFO Creating ghost table `dba_test`.`_t1_gho`
2019-06-27 09:32:07 INFO Ghost table created
2019-06-27 09:32:07 INFO Altering ghost table `dba_test`.`_t1_gho`
2019-06-27 09:32:07 INFO Ghost table altered
2019-06-27 09:32:07 INFO Intercepted changelog state GhostTableMigrated
2019-06-27 09:32:07 INFO Waiting for ghost table to be migrated. Current lag is 0s
2019-06-27 09:32:07 INFO Handled changelog state GhostTableMigrated
2019-06-27 09:32:07 INFO Chosen shared unique key is PRIMARY
2019-06-27 09:32:07 INFO Shared columns are id,uid,name,version
2019-06-27 09:32:07 INFO Listening on unix socket file: /tmp/gh-ost.dba_test.t1.sock
2019-06-27 09:32:07 INFO Migration min values: [1]
2019-06-27 09:32:07 INFO Migration max values: [1]
2019-06-27 09:32:07 INFO Waiting for first throttle metrics to be collected
2019-06-27 09:32:07 INFO First throttle metrics collected
# Migrating `dba_test`.`t1`; Ghost table is `dba_test`.`_t1_gho`
# Migrating rm-host1.mysql.rds.aliyuncs.com:3306; inspecting rm-host1.mysql.rds.aliyuncs.com:3306; executing on iZbp1aihmppor0oe9epotuZ
# Migration started at Thu Jun 27 09:32:07 +0800 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# Serving on unix socket: /tmp/gh-ost.dba_test.t1.sock
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.001413:27779913; State: migrating; ETA: N/A
2019-06-27 09:32:07 INFO Row copy complete
# Migrating `dba_test`.`t1`; Ghost table is `dba_test`.`_t1_gho`
# Migrating rm-host1.mysql.rds.aliyuncs.com:3306; inspecting rm-host1.mysql.rds.aliyuncs.com:3306; executing on iZbp1aihmppor0oe9epotuZ
# Migration started at Thu Jun 27 09:32:07 +0800 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# Serving on unix socket: /tmp/gh-ost.dba_test.t1.sock
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.001413:27780654; State: migrating; ETA: due
2019-06-27 09:32:07 INFO Grabbing voluntary lock: gh-ost.4639852.lock
2019-06-27 09:32:07 INFO Setting LOCK timeout as 6 seconds
2019-06-27 09:32:07 INFO Looking for magic cut-over table
2019-06-27 09:32:07 INFO Creating magic cut-over table `dba_test`.`_t1_del`
2019-06-27 09:32:07 INFO Magic cut-over table created
2019-06-27 09:32:07 INFO Locking `dba_test`.`t1`, `dba_test`.`_t1_del`
2019-06-27 09:32:07 INFO Tables locked
2019-06-27 09:32:07 INFO Session locking original & magic tables is 4639852
2019-06-27 09:32:07 INFO Writing changelog state: AllEventsUpToLockProcessed:1561599127457178334
2019-06-27 09:32:07 INFO Intercepted changelog state AllEventsUpToLockProcessed
2019-06-27 09:32:07 INFO Handled changelog state AllEventsUpToLockProcessed
2019-06-27 09:32:07 INFO Waiting for events up to lock
Copy: 1/1 100.0%; Applied: 0; Backlog: 1/1000; Time: 1s(total), 0s(copy); streamer: mysql-bin.001413:27786099; State: migrating; ETA: due
2019-06-27 09:32:08 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1561599127457178334
2019-06-27 09:32:08 INFO Done waiting for events up to lock; duration=987.966735ms
# Migrating `dba_test`.`t1`; Ghost table is `dba_test`.`_t1_gho`
# Migrating rm-host1.mysql.rds.aliyuncs.com:3306; inspecting rm-host1.mysql.rds.aliyuncs.com:3306; executing on iZbp1aihmppor0oe9epotuZ
# Migration started at Thu Jun 27 09:32:07 +0800 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# Serving on unix socket: /tmp/gh-ost.dba_test.t1.sock
Copy: 1/1 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 0s(copy); streamer: mysql-bin.001413:27786557; State: migrating; ETA: due
2019-06-27 09:32:08 INFO Setting RENAME timeout as 3 seconds
2019-06-27 09:32:08 INFO Session renaming tables is 4639848
2019-06-27 09:32:08 INFO Issuing and expecting this to block: rename /* gh-ost */ table `dba_test`.`t1` to `dba_test`.`_t1_del`, `dba_test`.`_t1_gho` to `dba_test`.`t1`
2019-06-27 09:32:08 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2019-06-27 09:32:08 INFO Checking session lock: gh-ost.4639852.lock
2019-06-27 09:32:08 INFO Connection holding lock on original table still exists
2019-06-27 09:32:08 INFO Will now proceed to drop magic table and unlock tables
2019-06-27 09:32:08 INFO Dropping magic cut-over table
2019-06-27 09:32:08 INFO Releasing lock from `dba_test`.`t1`, `dba_test`.`_t1_del`
2019-06-27 09:32:08 INFO Tables unlocked
2019-06-27 09:32:08 INFO Tables renamed
2019-06-27 09:32:08 INFO Lock & rename duration: 1.010443741s. During this time, queries on `t1` were blocked
2019-06-27 09:32:08 INFO Looking for magic cut-over table
[2019/06/27 09:32:08] [info] binlogsyncer.go:164 syncer is closing...
[2019/06/27 09:32:08] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2019-06-27 09:32:08 INFO Closed streamer connection. err=<nil>
2019-06-27 09:32:08 INFO Dropping table `dba_test`.`_t1_ghc`
[2019/06/27 09:32:08] [info] binlogsyncer.go:179 syncer is closed
2019-06-27 09:32:08 INFO Table dropped
2019-06-27 09:32:08 INFO Dropping table `dba_test`.`_t1_del`
2019-06-27 09:32:08 INFO Table dropped
2019-06-27 09:32:08 INFO Done migrating `dba_test`.`t1`
2019-06-27 09:32:08 INFO Removing socket file: /tmp/gh-ost.dba_test.t1.sock
2019-06-27 09:32:08 INFO Tearing down inspector
2019-06-27 09:32:08 INFO Tearing down applier
2019-06-27 09:32:08 INFO Tearing down streamer
2019-06-27 09:32:08 INFO Tearing down throttler
# Done

3. 其他文档

部分内容来自 https://www.cnblogs.com/mysql-dba/p/9901589.html

ghost 1.0.30版本改进解析

原文地址:https://www.cnblogs.com/jesper/p/179f4bb8aca0bf46bef8f85014f563ca.html