实时电商数仓(十)之数据采集(九)数据库数据采集(四)Maxwell入门与安装

1  Maxwell

maxwell 是由美国zendesk开源,用java编写的Mysql实时抓取软件。 其抓取的原理也是基于binlog

1.1  工具对比

1 Maxwell 没有 Canal那种server+client模式,只有一个server把数据发送到消息队列或redis。

2 Maxwell 有一个亮点功能,就是Canal只能抓取最新数据,对已存在的历史数据没有办法处理。而Maxwell有一个bootstrap功能,可以直接引导出完整的历史数据用于初始化,非常好用。

3 Maxwell不能直接支持HA,但是它支持断点还原,即错误解决后重启继续上次点儿读取数据。

4 Maxwell只支持json格式,而Canal如果用Server+client模式的话,可以自定义格式。

5 MaxwellCanal更加轻量级。

1.2  安装Maxwell

     解压缩maxwell-1.25.0.tar.gz 到某个目录下。

1.3    使用前准备工作

在数据库中建立一个maxwell库用于存储Maxwell的元数据。

CREATE DATABASE maxwell ;

并且分配一个账号可以操作该数据库

GRANT ALL PRIVILEGES ON *.* TO 'maxwell'@'%' IDENTIFIED BY '123123';

分配这个账号可以监控其他数据库的权限

GRANT  SELECT ,REPLICATION SLAVE , REPLICATION CLIENT  ON *.* TO maxwell@'%'

1.4   使用Maxwell监控抓取MySql数据

在任意位置建立maxwell.properties 文件

producer=kafka
kafka.bootstrap.servers=hadoop1:9092,hadoop2:9092,hadoop3:9092
kafka_topic=ODS_DB_GMALL2020_M

host=hadoop2
user=maxwell
password=123123

client_id=maxwell_1

启动程序

/opt/module/maxwell/bin/maxwell --config  /opt/module/maxwell/config.properties >/dev/null 2>&1 &

1.5   修改或插入mysql数据,并消费kafka进行观察

/ext/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --topic ODS_DB_GMALL2020_M --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181     --partitions 12 --replication-factor 1

执行测试语句

INSERT INTO z_user_info VALUES(30,'zhang3','13810001010'),(31,'li4','1389999999');

对比

canal

maxwell

{"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"},{"id":"31","user_name":"li4","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385314000,"id":2,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385314116,"type":"INSERT"}

{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"xoffset":0,"data":{"id":30,"user_name":"zhang3","tel":"13810001010"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"commit":true,"data":{"id":31,"user_name":"li4","tel":"1389999999"}}

执行update操作

UPDATE z_user_info SET user_name='wang55' WHERE id IN(30,31)

canal

maxwell

{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385508000,"id":3,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":[{"user_name":"zhang3"},{"user_name":"li4"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385508676,"type":"UPDATE"}

{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}

delete操作

DELETE  FROM z_user_info   WHERE id IN(30,31)

canal

maxwell

{"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385644000,"id":4,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385644829,"type":"DELETE"}

{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"}}

{"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"}}

总结数据特点:

日志结构

canal 每一条SQL会产生一条日志,如果该条Sql影响了多行数据,则已经会通过集合的方式归集在这条日志中。(即使是一条数据也会是数组结构)

maxwell 以影响的数据为单位产生日志,即每影响一条数据就会产生一条日志。如果想知道这些日志是否是通过某一条sql产生的可以通过xid进行判断,相同的xid的日志来自同一sql。

数字类型

   当原始数据是数字类型时,maxwell会尊重原始数据的类型不增加双引,变为字符串。

   canal一律转换为字符串。

带原始数据字段定义

canal数据中会带入表结构。maxwell更简洁。

本文来自博客园,作者:秋华,转载请注明原文链接:https://www.cnblogs.com/qiu-hua/p/13658815.html

原文地址:https://www.cnblogs.com/qiu-hua/p/13658815.html