OGG实时同步Oracle数据到Kafka实施文档(供flink流式计算)

OGG实时同步Oracle数据到Kafka实施文档(供flink流式计算)

 
 

GoldenGate12C For Bigdata+Kafka:通过OGG将Oracle数据以Json格式同步到Kafka提供给flink流式计算

注意:这篇文章告诉了大家怎么搭建OGG for bigdata做测试,但是实际生活中,因为这个文章中对于insert,delete,update均放到一个topic,在后期flink注册流表或则Kylin流式构建cube时候解析有问题(因为json结构不一致),现在给出本人实际flink开发过程中用到的oggfor bigdata配置文档OGG For Bigdata 12按操作类型同步Oracle数据到kafka不同topic
Oracle可以通过OGG for Bigdata将Oracle数据库数据实时增量同步至hadoop平台(kafka,hdfs等)进行消费,笔者搭建这个环境的目的是将Oracle数据库表通过OGG同步到kafka来提供给flink做流计算。这里介绍Oracle通过OGG for Bigdata将数据变更同步至kafka的详细实施过程,整个过程已经通过本人测试没问题。
主机规划与配置:
篇幅原因,Linux系统、源端Oracle数据库和源端OGG12C的软件安装在其他文档写了,这里不再赘述。

一、安装Zookeeper集群

1、安装JDK(目的端操作)

之前安装的jdk1.7,1.7版本jdk在启replicat 进程时由于jdk版本问题导致进程abend,OGG For Bigdata12.3不支持1.7。具体报错详见:四、安装过程遇到的错误
1.1、先看下当前环境是否有安装的jdk

1[root@zookeeper ~]# rpm -qa | grep java 2java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64 3tzdata-java-2016c-1.el6.noarch 4java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64 5[root@zookeeper ~]# rpm -qa | grep jdk 6java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64 7java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64 8[root@zookeeper ~]# rpm -qa | grep gcj 9 10

1.2、删除Linux自带的jdk

1[root@zookeeper ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64 2[root@zookeeper ~]# rpm -e --nodeps tzdata-java-2016c-1.el6.noarch 3[root@zookeeper ~]# rpm -e java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64 4[root@zookeeper ~]# rpm -e java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64 5 6

1.3、检查是否还存在linuk自带jdk

1[root@zookeeper ~]# rpm -qa | grep java 2[root@zookeeper ~]# rpm -qa | grep jdk 3[root@zookeeper ~]# rpm -qa | grep gcj 4已经不存在了 5 6

1.4、创建jdk目录

1[root@zookeeper ~]# mkdir -p /usr/java 2 3

1.5、上传并解压jdk到此目录

1[root@zookeeper ~]# cd /usr/java/ 2[root@zookeeper java]# ls 3jdk-8u151-linux-x64.tar.gz 4[root@zookeeper java]# tar -zxvf jdk-8u151-linux-x64.tar.gz 5[root@zookeeper java]# rm -rf jdk-8u151-linux-x64.tar.gz 6[root@zookeeper java]# ls 7jdk1.8.0_151 8 9

1.6、编辑/etc/profile

1[root@zookeeper java]# vim /etc/profile 2写入下面jdk环境变量,保存退出 3export JAVA_HOME=/usr/java/jdk1.8.0_151 4export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 5export PATH=$PATH:$JAVA_HOME/bin 6使环境变量生效 7[root@zookeeper java]# source /etc/profile 8 9

1.7、检查jdk是否配置成功

1[root@zookeeper java]# java -version 2java version "1.8.0_151" 3Java(TM) SE Runtime Environment (build 1.8.0_151-b12) 4Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode) 5配置没问题 6 7

2、安装Zookeeper集群 (目的端操作)

这里安装方式采用在一台机器上部署一套三节点(官方推荐最少三节点)的伪集群,后面的zookeeper、kafka、ogg for bigdata软件均放在/kafka目录下
2.1、创建软件存放目录

1[root@zookeeper java]# mkdir /kafka 2[root@zookeeper java]# chown -R oracle:oinstall /kafka/ 3[root@zookeeper java]# chmod -R 777 /kafka/ 4 5

2.2、创建3个Zk server的集群安装目录

1[root@zookeeper java]# mkdir -p /kafka/zookeeper/zookeeper1 2[root@zookeeper java]# mkdir -p /kafka/zookeeper/zookeeper2 3[root@zookeeper java]# mkdir -p /kafka/zookeeper/zookeeper3 4 5

2.3、上传解压zookeeper软件到三个目录

1先上传文件并解压到第一个目录,cd /kafka/zookeeper/zookeeper1/做下面操作 2[root@zookeeper zookeeper1]# tar -zxvf zookeeper-3.4.6.tar.gz 3[root@zookeeper zookeeper1]# rm -rf zookeeper-3.4.6.tar.gz 4[root@zookeeper zookeeper1]# mv zookeeper-3.4.6/* . 5[root@zookeeper zookeeper1]# ls 6bin CHANGES.txt contrib docs ivy.xml LICENSE.txt README_packaging.txt recipes zookeeper-3.4.6 zookeeper-3.4.6.jar.asc zookeeper-3.4.6.jar.sha1 7build.xml conf dist-maven ivysettings.xml lib NOTICE.txt README.txt src zookeeper-3.4.6.jar zookeeper-3.4.6.jar.md5 8另外两个目录的zookeeper软件可以通过第一个目录进行copy就可 9[root@zookeeper zookeeper1]# cp -rp * /kafka/zookeeper/zookeeper2/ 10[root@zookeeper zookeeper1]# cp -rp * /kafka/zookeeper/zookeeper3/ 11 12

2.4、创建日志目录

1创建快照日志存放目录: 2mkdir -p /kafka/zookeeper/zookeeper1/dataDir 3mkdir -p /kafka/zookeeper/zookeeper2/dataDir 4mkdir -p /kafka/zookeeper/zookeeper3/dataDir 5创建事务日志存放目录: 6mkdir -p /kafka/zookeeper/zookeeper1/dataLogDir 7mkdir -p /kafka/zookeeper/zookeeper2/dataLogDir 8mkdir -p /kafka/zookeeper/zookeeper3/dataLogDir 9【注意】:如果不配置dataLogDir,那么事务日志也会写在dataDir目录中。这样会严重影响zk的性能。因为在zk吞吐量很高的时候,产生的事务日志和快照日志太多。 10 11

2.5、修改/etc/hosts

1修改/etc/hosts内容如下 2127.0.0.1 localhost 3192.168.1.21 zookeeper 4192.168.1.21 zookeeper1 5192.168.1.21 zookeeper2 6192.168.1.21 zookeeper3 7 8

2.6、修改zookeeper配置文件

1server1配置如下: 2[root@zookeeper ~]# cd /kafka/zookeeper/zookeeper1/conf/ 3[root@zookeeper conf]# ls 4configuration.xsl log4j.properties zoo_sample.cfg 5[root@zookeeper conf]# mv zoo_sample.cfg zoo.cfg 6[root@zookeeper conf]# vim zoo.cfg 7配置内容如下: 8[root@zookeeper conf]# cat zoo.cfg |grep -v ^#|grep -v ^$ 9tickTime=2000 10initLimit=10 11syncLimit=5 12clientPort=2181 13dataDir=/kafka/zookeeper/zookeeper1/dataDir 14dataLogDir=/kafka/zookeeper/zookeeper1/dataLogDir 15server.1=zookeeper1:2887:3887 16server.2=zookeeper2:2888:3888 17server.3=zookeeper3:2889:3889 18server2配置文件内容如下: 19[root@zookeeper conf]# cat /kafka/zookeeper/zookeeper2/conf/zoo.cfg |grep -v ^#|grep -v ^$ 20tickTime=2000 21initLimit=10 22syncLimit=5 23clientPort=2182 24dataDir=/kafka/zookeeper/zookeeper2/dataDir 25dataLogDir=/kafka/zookeeper/zookeeper2/dataLogDir 26server.1=zookeeper1:2887:3887 27server.2=zookeeper2:2888:3888 28server.3=zookeeper3:2889:3889 29server3配置文件内容如下: 30[root@zookeeper conf]# cat /kafka/zookeeper/zookeeper3/conf/zoo.cfg |grep -v ^#|grep -v ^$ 31tickTime=2000 32initLimit=10 33syncLimit=5 34clientPort=2183 35dataDir=/kafka/zookeeper/zookeeper3/dataDir 36dataLogDir=/kafka/zookeeper/zookeeper3/dataLogDir 37server.1=zookeeper1:2887:3887 38server.2=zookeeper2:2888:3888 39server.3=zookeeper3:2889:3889 40在我们配置的dataDir指定的目录下面,创建一个myid文件,里面内容为一个数字,用来标识当前主机,conf/zoo.cfg文件中配置的server.X中X为什么数字,则myid文件中就输入这个数字: 41[root@zookeeper conf]# echo "1" > /kafka/zookeeper/zookeeper1/dataDir/myid 42[root@zookeeper conf]# echo "2" > /kafka/zookeeper/zookeeper2/dataDir/myid 43[root@zookeeper conf]# echo "3" > /kafka/zookeeper/zookeeper3/dataDir/myid 44 45

2.7、关闭防火墙

1关闭防火墙并禁止开机自启: 2service iptables stop 3sudo chkconfig iptables off 4 5

2.8、启动zookeeper集群

1[root@zookeeper bin]# /kafka/zookeeper/zookeeper1/bin/zkServer.sh start 2JMX enabled by default 3Using config: /kafka/zookeeper/zookeeper1/bin/../conf/zoo.cfg 4Starting zookeeper ... STARTED 5这里虽然显示启动了,但是来看一下启动日志: 6[root@zookeeper bin]# tail -f zookeeper.out 7 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) 8 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) 9 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) 10 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) 11 at java.net.Socket.connect(Socket.java:579) 12 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) 13 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) 14 at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) 15 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 162018-12-11 16:58:26,328 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200 172018-12-11 16:58:29,530 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zookeeper2/192.168.1.21:3888 18java.net.ConnectException: Connection refused 19 at java.net.PlainSocketImpl.socketConnect(Native Method) 20 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) 21 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) 22 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) 23 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) 24 at java.net.Socket.connect(Socket.java:579) 25 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) 26 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) 27 at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) 28 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 292018-12-11 16:58:29,531 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 3 at election address zookeeper3/192.168.1.21:3889 30java.net.ConnectException: Connection refused 31 at java.net.PlainSocketImpl.socketConnect(Native Method) 32 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) 33 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) 34 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) 35 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) 36 at java.net.Socket.connect(Socket.java:579) 37 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) 38 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) 39 at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) 40 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 412018-12-11 16:58:29,531 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 6400 42查看日志,发现日志报错,报错内容为myid=1的节点不能连接到2和3节点,这里不用管,只要启动集群另外其他节点就可正常,直接继续手动启动节点二和节点三: 43[root@zookeeper bin]# /kafka/zookeeper/zookeeper2/bin/zkServer.sh start 44JMX enabled by default 45Using config: /kafka/zookeeper/zookeeper2/bin/../conf/zoo.cfg 46Starting zookeeper ... STARTED 47[root@zookeeper bin]# /kafka/zookeeper/zookeeper3/bin/zkServer.sh start 48JMX enabled by default 49Using config: /kafka/zookeeper/zookeeper3/bin/../conf/zoo.cfg 50Starting zookeeper ... STARTED 51节点二和节点三都起来之后,再去看节点一的日志: 52[root@zookeeper bin]# tail -f zookeeper.out 532018-12-11 16:58:41,828 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:FileTxnSnapLog@240] - Snapshotting: 0x100000000 to /kafka/zookeeper/zookeeper3/dataDir/version-2/snapshot.1 54000000002018-12-11 16:58:41,717 [myid:2] - INFO [zookeeper2/192.168.1.21:3888:QuorumCnxManager$Listener@511] - Received connection request /192.168.1.21:38969 552018-12-11 16:58:41,729 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.st 56ate), 3 (n.sid), 0x0 (n.peerEpoch) LEADING (my state)2018-12-11 16:58:41,761 [myid:2] - INFO [LearnerHandler-/192.168.1.21:49909:LearnerHandler@330] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@7399ae 57792018-12-11 16:58:41,802 [myid:2] - INFO [LearnerHandler-/192.168.1.21:49909:LearnerHandler@385] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x 5802018-12-11 16:58:41,802 [myid:2] - INFO [LearnerHandler-/192.168.1.21:49909:LearnerHandler@462] - Sending SNAP 592018-12-11 16:58:41,802 [myid:2] - INFO [LearnerHandler-/192.168.1.21:49909:LearnerHandler@486] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x100000000sent zxid of db as 60 0x1000000002018-12-11 16:58:41,831 [myid:2] - INFO [LearnerHandler-/192.168.1.21:49909:LearnerHandler@522] - Received NEWLEADER-ACK message from 3 612018-12-11 16:58:41,707 [myid:1] - INFO [zookeeper1/192.168.1.21:3887:QuorumCnxManager$Listener@511] - Received connection request /192.168.1.21:57295 622018-12-11 16:58:41,716 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.st 63ate), 3 (n.sid), 0x0 (n.peerEpoch) FOLLOWING (my state) 64发现此时三个节点都通了,最后的状态为 FOLLOWING (my state),说明集群正常起来了。 65 66

2.9、Zookeeper集群状态查看方式

1可以通过命令jps查看Zookeeper进程: 2[root@zookeeper bin]# jps 328216 QuorumPeerMain 428286 QuorumPeerMain 528356 Jps 628245 QuorumPeerMain 7为了日后操作方便,决定手动编写zookeeper集群启动关闭脚本: 8[root@zookeeper zookeeper]# pwd 9/kafka/zookeeper 10[root@zookeeper zookeeper]# cat startzookeeper.sh 11#! /bin/bash 12/kafka/zookeeper/zookeeper1/bin/zkServer.sh start 13/kafka/zookeeper/zookeeper2/bin/zkServer.sh start 14/kafka/zookeeper/zookeeper3/bin/zkServer.sh start 15tail -f /kafka/zookeeper/zookeeper1/bin/zookeeper.out 16[root@zookeeper zookeeper]# cat stopzookeeper.sh 17#! /bin/bash 18/kafka/zookeeper/zookeeper1/bin/zkServer.sh stop 19/kafka/zookeeper/zookeeper2/bin/zkServer.sh stop 20/kafka/zookeeper/zookeeper3/bin/zkServer.sh stop 21tail -f /kafka/zookeeper/zookeeper1/bin/zookeeper.out 22[root@hadoop zookeeper]# cat statuszookeeper.sh 23#! /bin/bash 24/hadoop/zookeeper/zookeeper1/bin/zkServer.sh status 25/hadoop/zookeeper/zookeeper2/bin/zkServer.sh status 26/hadoop/zookeeper/zookeeper3/bin/zkServer.sh status 27 28

2.10、测试zookeeper集群健康状态

1可以通过ZooKeeper的脚本来查看启动状态,包括集群中各个结点的角色(或是Leader,或是Follower) 2[root@zookeeper zookeeper]# /kafka/zookeeper/zookeeper1/bin/zkServer.sh status 3JMX enabled by default 4Using config: /kafka/zookeeper/zookeeper1/bin/../conf/zoo.cfg 5Mode: follower 6[root@zookeeper zookeeper]# /kafka/zookeeper/zookeeper2/bin/zkServer.sh status 7JMX enabled by default 8Using config: /kafka/zookeeper/zookeeper2/bin/../conf/zoo.cfg 9Mode: leader 10[root@zookeeper zookeeper]# /kafka/zookeeper/zookeeper3/bin/zkServer.sh status 11JMX enabled by default 12Using config: /kafka/zookeeper/zookeeper3/bin/../conf/zoo.cfg 13Mode: follower 14通过上面状态查询结果可见,节点二是集群的Leader,其余的两个结点是Follower。 15另外,可以通过客户端脚本,连接到ZooKeeper集群上。对于客户端来说,ZooKeeper是一个整体,连接到ZooKeeper集群实际上感觉在独享整个集群的服务,所以,你可以在任何一个结点上建立到服务集群的连接。 16[root@zookeeper zookeeper]# /kafka/zookeeper/zookeeper1/bin/zkCli.sh -server localhost:2181 17Connecting to localhost:2181 182018-12-11 17:13:21,760 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 192018-12-11 17:13:21,764 [myid:] - INFO [main:Environment@100] - Client environment:host.name=<NA> 202018-12-11 17:13:21,765 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_80 212018-12-11 17:13:21,768 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 222018-12-11 17:13:21,769 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.7.0_80/jre 232018-12-11 17:13:21,769 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/kafka/zookeeper/zookeeper1/bin/../build/classes:/kafka/zookeeper/zookeeper1/bin/../build 24/lib/*.jar:/kafka/zookeeper/zookeeper1/bin/../lib/slf4j-log4j12-1.6.1.jar:/kafka/zookeeper/zookeeper1/bin/../lib/slf4j-api-1.6.1.jar:/kafka/zookeeper/zookeeper1/bin/../lib/netty-3.7.0.Final.jar:/kafka/zookeeper/zookeeper1/bin/../lib/log4j-1.2.16.jar:/kafka/zookeeper/zookeeper1/bin/../lib/jline-0.9.94.jar:/kafka/zookeeper/zookeeper1/bin/../zookeeper-3.4.6.jar:/kafka/zookeeper/zookeeper1/bin/../src/java/lib/*.jar:/kafka/zookeeper/zookeeper1/bin/../conf:.:/usr/java/jdk1.7.0_80/lib/dt.jar:/usr/java/jdk1.7.0_80/lib/tools.jar2018-12-11 17:13:21,770 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 252018-12-11 17:13:21,770 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 262018-12-11 17:13:21,770 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA> 272018-12-11 17:13:21,771 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux 282018-12-11 17:13:21,771 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64 292018-12-11 17:13:21,771 [myid:] - INFO [main:Environment@100] - Client environment:os.version=4.1.12-37.4.1.el6uek.x86_64 302018-12-11 17:13:21,772 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root 312018-12-11 17:13:21,772 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root 322018-12-11 17:13:21,773 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/kafka/zookeeper 332018-12-11 17:13:21,775 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyW 34atcher@49134043Welcome to ZooKeeper! 352018-12-11 17:13:21,821 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authe 36nticate using SASL (unknown error)2018-12-11 17:13:21,843 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/127.0.0.1:2181, initiating session 37JLine support is enabled 38[zk: localhost:2181(CONNECTING) 0] 2018-12-11 17:13:21,996 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/ 39127.0.0.1:2181, sessionid = 0x1679c86f3d50000, negotiated timeout = 30000 40WATCHER:: 41 42WatchedEvent state:SyncConnected type:None path:null 43 44[zk: localhost:2181(CONNECTED) 0] 45 46

二、安装Kafka

1.1、创建kafka安装目录上传并解压

1[root@zookeeper ~]# mkdir /kafka/kafka 2[root@zookeeper ~]# cd /kafka/kafka/ 3[root@zookeeper kafka]# ls 4kafka_2.11-1.1.1.tgz 5[root@zookeeper kafka]# tar zxf kafka_2.11-1.1.1.tgz 6[root@zookeeper kafka]# ls 7kafka_2.11-1.1.1 kafka_2.11-1.1.1.tgz 8[root@zookeeper kafka]# rm -rf kafka_2.11-1.1.1.tgz 9[root@zookeeper kafka]# ls 10kafka_2.11-1.1.1 11[root@zookeeper kafka]# mv kafka_2.11-1.1.1/* . 12[root@zookeeper kafka]# rm -rf kafka_2.11-1.1.1 13[root@zookeeper kafka]# ls 14bin config libs LICENSE NOTICE site-docs 15 16

1.2、修改配置文件

1[root@zookeeper config]# pwd 2/kafka/kafka/config 3[root@zookeeper config]# vim server.properties 4修改内容如如下: 5[root@zookeeper config]# cat server.properties |grep -v ^#|grep -v ^$ 6broker.id=0 7num.network.threads=3 8num.io.threads=8 9socket.send.buffer.bytes=102400 10socket.receive.buffer.bytes=102400 11socket.request.max.bytes=104857600 12log.dirs=/kafka/kafka/logs 13num.partitions=1 14num.recovery.threads.per.data.dir=1 15offsets.topic.replication.factor=1 16transaction.state.log.replication.factor=1 17transaction.state.log.min.isr=1 18log.retention.hours=168 19log.segment.bytes=1073741824 20log.retention.check.interval.ms=300000 21zookeeper.connect=localhost:2181,localhost:2182,localhost:2183 22zookeeper.connection.timeout.ms=6000 23group.initial.rebalance.delay.ms=0 24delete.topic.enble=true -----如果不指定这个参数,执行删除操作只是标记删除 25auto.create.topics.enable=true -----此参数可以考虑是否添加,如果不添加,在kafka.props中配置的topicname在启动应用进程前必须手动创建好,详细分析见:四、安装过程遇到的错误 26 27

1.3、启动kafka

1在保证zookeeper集群没问题的前提下启动: 2[root@zookeeper kafka]# nohup bin/kafka-server-start.sh config/server.properties& 3如果nohup.out 无报错则说明启动成功。 4 5

1.4、 功能测试

1创建一个topic 2[root@zookeeper bin]# pwd 3/kafka/kafka/bin 4[root@zookeeper bin]# ./kafka-topics.sh --create --zookeeper zookeeper1:2181 --replication-factor 1 --partitions 1 --topic tests 5Created topic "tests". 6查看创建的topic 7[root@zookeeper bin]# ./kafka-topics.sh -describe -zookeeper zookeeper1:2181 8Topic:tests PartitionCount:1 ReplicationFactor:1 Configs: 9 Topic: tests Partition: 0 Leader: 0 Replicas: 0 Isr: 0 10 11打开2个终端, 分别在Kafka目录执行以下命令 12会话1: 13[root@zookeeper bin]# ./kafka-console-producer.sh --broker-list localhost:9092 --topic tests 14> 15会话2: 16[root@zookeeper bin]# ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic tests --from-beginning 17Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper] 18. 19会话1输入消息: 20[root@zookeeper bin]# ./kafka-console-producer.sh --broker-list localhost:9092 --topic tests 21>zhaoyandong 22> 23会话2显示消息: 24[root@zookeeper bin]# ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic tests --from-beginning 25Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper] 26.zhaoyandong 27测试可以正常生产和消费。 28 29

三、安装配置OGG12C For bigdata

1、上传并解压安装OGG for bigdata 软件

1.1、创建ogg软件安装目录

1[root@zookeeper ~]# cd /kafka/ 2[root@zookeeper kafka]# mkdir ogg12 3解压缩: 4[root@zookeeper kafka]# ls 5kafka ogg12 zookeeper 6[root@zookeeper kafka]# cd ogg12/ 7[root@zookeeper ogg12]# unzip OGG_BigData_Linux_x64_12.3.2.1.1.zip 8Archive: OGG_BigData_Linux_x64_12.3.2.1.1.zip 9 inflating: OGGBD-12.3.2.1-README.txt 10 inflating: OGG_BigData_12.3.2.1.1_Release_Notes.pdf 11 inflating: OGG_BigData_Linux_x64_12.3.2.1.1.tar 12[root@zookeeper ogg12]# ls 13OGGBD-12.3.2.1-README.txt OGG_BigData_12.3.2.1.1_Release_Notes.pdf OGG_BigData_Linux_x64_12.3.2.1.1.tar OGG_BigData_Linux_x64_12.3.2.1.1.zip 14[root@zookeeper ogg12]# rm -rf OGG_BigData_Linux_x64_12.3.2.1.1.zip 15[root@zookeeper ogg12]# tar xf OGG_BigData_Linux_x64_12.3.2.1.1.tar 16 17

1.2、配置环境变量

1[root@zookeeper ~]# cat .bash_profile -----之前使用的jdk1.7后来启动应用进程时报错,现在配置为正确配置,之前错误详见后面实施过程中遇到的问题 2# .bash_profile 3 4# Get the aliases and functions 5if [ -f ~/.bashrc ]; then 6 . ~/.bashrc 7fi 8 9# User specific environment and startup programs 10 11PATH=$PATH:$HOME/bin 12 13export PATH 14export JAVA_HOME=/usr/java/jdk1.8.0_151 15export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 16export PATH=$PATH:$JAVA_HOME/bin 17export GGHOME=/kafka/ogg12 18export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib 19 20测试一下环境变量有没有问题: 21[root@zookeeper ~]# cd /kafka/ogg12/ 22[root@zookeeper ogg12]# ./ggsci 23 24Oracle GoldenGate for Big Data 25Version 12.3.2.1.1 (Build 005) 26 27Oracle GoldenGate Command Interpreter 28Version 12.3.0.1.2 OGGCORE_OGGADP.12.3.0.1.2_PLATFORMS_180712.2305 29Linux, x64, 64bit (optimized), Generic on Jul 13 2018 00:46:09 30Operating system character set identified as UTF-8. 31 32Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved. 33 34 35 36GGSCI (zookeeper) 1> info all 37 38Program Status Group Lag at Chkpt Time Since Chkpt 39 40MANAGER STOPPED 41没问题。 42 43

1.3、初始化目录

1GGSCI (zookeeper) 2> create subdirs 2 3Creating subdirectories under current directory /kafka/ogg12 4 5Parameter file /kafka/ogg12/dirprm: created. 6Report file /kafka/ogg12/dirrpt: created. 7Checkpoint file /kafka/ogg12/dirchk: created. 8Process status files /kafka/ogg12/dirpcs: created. 9SQL script files /kafka/ogg12/dirsql: created. 10Database definitions files /kafka/ogg12/dirdef: created. 11Extract data files /kafka/ogg12/dirdat: created. 12Temporary files /kafka/ogg12/dirtmp: created. 13Credential store files /kafka/ogg12/dircrd: created. 14Masterkey wallet files /kafka/ogg12/dirwlt: created. 15Dump files /kafka/ogg12/dirdmp: created. 16 17

1.4、配置MGR进程

1编辑MGR进程 2GGSCI (zookeeper) 5> edit params mgr 3写入下面内容 4PORT 7809 5DYNAMICPORTLIST 7810-7860 6AUTORESTART ER *, RETRIES 3, WAITMINUTES 5 7PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 30 8lagreporthours 1 9laginfominutes 30 10lagcriticalminutes 60 11启动mgr进程 12GGSCI (zookeeper) 7> start mgr 13Manager started. 14 15 16GGSCI (zookeeper) 8> info all 17 18Program Status Group Lag at Chkpt Time Since Chkpt 19 20MANAGER RUNNING 21 22

2、源端创建测试用表:

1在scott下创建测试表kafka: 2create table KAFKA 3( 4 empno NUMBER(4) not null, 5 ename VARCHAR2(10), 6 job VARCHAR2(9), 7 mgr NUMBER(4), 8 hiredate DATE, 9 sal NUMBER(7,2), 10 comm NUMBER(7,2), 11 deptno NUMBER(2) 12); 13alter table KAFKA 14 add constraint PK_KAFKA primary key (EMPNO); 15插入测试数据: 16insert into kafka select * from emp where sal is not null; 17select * from kafka; 18 19

在这里插入图片描述

3、源端增加配置管理、抽取、投递进程

3.1、添加kafka表附加日志

1GGSCI (source) 5> dblogin userid ogg password ogg 2Successfully logged into database. 3 4GGSCI (source as ogg@orcl) 6> add trandata scott.kafka 5 6TRANDATA for instantiation CSN has been added on table 'SCOTT.KAFKA'. 7GGSCI (source as ogg@orcl) 7> add trandata scott.kafka 8 9GGSCI (source as ogg@orcl) 8> info trandata scott.kafka 10 11Logging of supplemental redo log data is enabled for table SCOTT.KAFKA. 12 13Columns supplementally logged for table SCOTT.KAFKA: EMPNO. 14 15Prepared CSN for table SCOTT.KAFKA: 1432176 16 17

3.2、配置OGG的全局变量

1GGSCI (source as ogg@orcl) 6> edit param ./globals 2加入下面内容: 3oggschema ogg 4 5

3.3、配置MGR进程

1GGSCI (source) 1> edit params mgr 2PORT 7809 3DYNAMICPORTLIST 7810-7860 4AUTORESTART ER *, RETRIES 3, WAITMINUTES 5 5PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 30 6lagreporthours 1 7laginfominutes 30 8lagcriticalminutes 60 9启动mgr进程 10GGSCI (source as ogg@orcl) 37> start mgr 11 12

3.4、编辑抽取进程

1GGSCI (source) 2> edit params e_ka 2extract e_ka 3userid ogg,password ogg 4setenv(NLS_LANG=AMERICAN_AMERICA.AL32UTF8) 5setenv(ORACLE_SID="orcl") 6reportcount every 30 minutes,rate 7numfiles 5000 8discardfile ./dirrpt/e_ka.dsc,append,megabytes 1000 9warnlongtrans 2h,checkinterval 30m 10exttrail ./dirdat/ka 11dboptions allowunusedcolumn 12tranlogoptions archivedlogonly 13tranlogoptions altarchivelogdest primary /u01/arch 14dynamicresolution 15fetchoptions nousesnapshot 16ddl include mapped 17ddloptions addtrandata,report 18notcpsourcetimer 19NOCOMPRESSDELETES 20NOCOMPRESSUPDATES 21GETUPDATEBEFORES 22----------scott.kafka 23table SCOTT.KAFKA,tokens( 24TKN-CSN = @GETENV('TRANSACTION', 'CSN'), 25TKN-COMMIT-TS = @GETENV ('GGHEADER', 'COMMITTIMESTAMP'), 26TKN-OP-TYPE = @GETENV ('GGHEADER', 'OPTYPE') 27); 28 29

3.5、添加抽取进程

1GGSCI (source as ogg@orcl) 7> add extract e_ka,tranlog,begin now 2EXTRACT added. 3GGSCI (source as ogg@orcl) 9> add exttrail ./dirdat/ka,extract e_ka,megabytes 500 4EXTTRAIL added. 5启动抽取进程 6GGSCI (source as ogg@orcl) 10>start e_ka 7 8

3.6、投递进程配置

1GGSCI (source as ogg@orcl) 27> edit params d_ka 2加入下面配置 3extract d_ka 4rmthost 192.168.1.21,mgrport 7809,compress 5userid ogg,password ogg 6PASSTHRU 7numfiles 5000 8rmttrail ./dirdat/ka 9dynamicresolution 10table scott.kafka; 11 12

3.7、添加投递进程

1GGSCI (source as ogg@orcl) 29> add extract d_ka,exttrailsource ./dirdat/ka 2EXTRACT added. 3GGSCI (source as ogg@orcl) 30> add rmttrail ./dirdat/ka,extract d_ka,megabytes 500 4RMTTRAIL added. 5启动投递进程 6GGSCI (source as ogg@orcl) 38> start d_ka 7 8

4、表结构定义文件

4.1、defgen配置

1GGSCI (source) 3> edit params test_kafka 2 3defsfile /u01/app/oracle/ogg12/dirdef/kafka.def 4userid ogg,password ogg 5table scott.kafka; 6 7

4.2、生成表结构定义文件

1[oracle@source ogg12]$ pwd 2/u01/app/oracle/ogg12 3[oracle@source ogg12]$ ./defgen paramfile dirprm/test_kafka.prm 4 5*********************************************************************** 6 Oracle GoldenGate Table Definition Generator for Oracle 7 Version 12.2.0.2.2 OGGCORE_12.2.0.2.0_PLATFORMS_170630.0419 8 Linux, x64, 64bit (optimized), Oracle 11g on Jun 30 2017 11:35:56 9 10Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved. 11 12 13 Starting at 2018-12-14 13:44:52 14*********************************************************************** 15 16Operating System Version: 17Linux 18Version #2 SMP Tue May 17 07:23:38 PDT 2016, Release 4.1.12-37.4.1.el6uek.x86_64 19Node: source 20Machine: x86_64 21 soft limit hard limit 22Address Space Size : unlimited unlimited 23Heap Size : unlimited unlimited 24File Size : unlimited unlimited 25CPU Time : unlimited unlimited 26 27Process id: 5928 28 29*********************************************************************** 30** Running with the following parameters ** 31*********************************************************************** 32defsfile /u01/app/oracle/ogg12/dirdef/kafka.def 33userid ogg,password *** 34table scott.kafka; 35Retrieving definition for SCOTT.KAFKA. 36 37 38Definitions generated for 1 table in /u01/app/oracle/ogg12/dirdef/kafka.def. 39 40

4.3、将生成的表定义文件发送到目标端

1scp /u01/app/oracle/ogg12/dirdef/kafka.def root@192.168.1.21:/kafka/ogg12/dirdef/ 2 3

5、OGG for bigdata 端配置

5.1、添加checkpoint表

1在确保zookeeper集群和kafka正常的情况下做下面配置: 2checkpoint即复制可追溯的一个偏移量记录,在全局配置里添加checkpoint表即可。 3edit param ./GLOBALS 4CHECKPOINTTABLE ogg.checkpoint 5 6

5.2、配置replicate进程

1GGSCI (zookeeper) 5> edit params rkafka 2REPLICAT rkafka 3-- Trail file for this example is located in "AdapterExamples/trail" directory 4-- Command to add REPLICAT 5-- add replicat rkafka, exttrail AdapterExamples/trail/tr 6TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props 7REPORTCOUNT EVERY 1 MINUTES, RATE 8GROUPTRANSOPS 10000 9GETUPDATEBEFORES ----12.3版本要加此参数,若不加,在普通update时,即便抽取进程加了GETUPDATEBEFORES等参数,kafka表中的被修改字段修改前的值也不会被写入,11G版本不需要此参数亦可,详情看后面实施过程遇到的错误列表 10MAP SCOTT.*, TARGET SCOTT.*; 11说明:REPLICATE rkafka定义rep进程名称;sourcedefs即在4.6中在源服务器上做的表映射文件;TARGETDB LIBFILE即定义kafka一些适配性的库文件以及配置文件,配置文件位于OGG主目录下的dirprm/kafka.props;REPORTCOUNT即复制任务的报告生成频率;GROUPTRANSOPS为以事务传输时,事务合并的单位,减少IO操作;MAP即源端与目标端的映射关系 12 13

5.3、配置custom_kafka_producer.properties

1[root@zookeeper ogg12]# cd dirprm/ 2[root@zookeeper dirprm]# pwd 3/kafka/ogg12/dirprm 4[root@zookeeper dirprm]# vim custom_kafka_producer.properties 5bootstrap.servers=localhost:9092 6acks=1 7reconnect.backoff.ms=1000 8 9value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 10key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 11# 100KB per partition 12batch.size=102400 13linger.ms=10000 14 15

5.4、配置kafka.props

1[root@zookeeper dirprm]# pwd 2/kafka/ogg12/dirprm 3[root@zookeeper dirprm]# vim kafka.props 4写入下面内容 5gg.handlerlist = kafkahandler 6gg.handler.kafkahandler.type=kafka 7gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties 8#The following resolves the topic name using the short table name 9gg.handler.kafkahandler.topicMappingTemplate=kafka 10#The following selects the message key using the concatenated primary keys 11#gg.handler.kafkahandler.keyMappingTemplate=${primaryKeys} 12gg.handler.kafkahandler.format=json 13gg.handler.kafkahandler.SchemaTopicName=scott 14gg.handler.kafkahandler.BlockingSend =true 15gg.handler.kafkahandler.includeTokens=false 16gg.handler.kafkahandler.mode=op 17gg.handler.kafkahandler.format.includePrimaryKeys=true 18 19goldengate.userexit.writers=javawriter 20javawriter.stats.display=TRUE 21javawriter.stats.full=TRUE 22 23gg.log=log4j 24gg.log.level=INFO 25 26gg.report.time=30sec 27 28#Sample gg.classpath for Apache Kafka 29gg.classpath=dirprm/:/kafka/kafka/libs/*:/kafka/ogg12/:/kafka/ogg12/lib/* 30#Sample gg.classpath for HDP 31#gg.classpath=/etc/kafka/conf:/usr/hdp/current/kafka-broker/libs/* 32 33javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar 34 35

5.5、添加应用进程

1add replicat rkafka,exttrail ./dirdat/ka 2--默认是从ka00000开始读取, 如果需要修改应用进程读物位置可以执行:alter rkafka extseqno 3 extrba 1123 3启动应用进程: 4GGSCI (zookeeper) 4> start rkafka 5 6

6、验证数据同步

1源端对表kafka做insert、普通update、主键+普通列一起修改的PK Update操作并切换归档: 2SQL> insert into kafka(empno,ename)values(321,'aa'); 3 41 row created. 5 6SQL> commit; 7 8Commit complete. 9 10SQL> update kafka set ename='ggg' where empno=321; 11 121 row updated. 13 14SQL> commit; 15 16Commit complete. 17 18SQL> update kafka set ename='ggg',empno=123 where empno=321; 19 201 row updated. 21 22SQL> commit; 23 24Commit complete. 25 26SQL> delete from kafka where empno=123; 27 281 row deleted. 29 30SQL> commit; 31 32Commit complete. 33 34SQL> alter system switch logfile; 35 36System altered. 37 38目标端查看: 39[root@zookeeper ~]# cd /kafka/kafka/ 40[root@zookeeper kafka]# ls 41bin config console.sh libs LICENSE list.sh logs nohup.out NOTICE productcmd.sh site-docs startkafka.sh 42[root@zookeeper kafka]# ./console.sh 43input topic:kafka 44Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper] 45{"table":"SCOTT.KAFKA","op_type":"I","op_ts":"2018-12-12 00:40:05.640034","current_ts":"2018-12-12T00:41:06.144000","pos":"00000000040000004377","primary_keys":["EMPNO"],"after":{"EMPNO":32 461,"ENAME":"aa","JOB":null,"MGR":null,"HIREDATE":null,"SAL":null,"COMM":null,"DEPTNO":null}} 47{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 00:40:26.640034","current_ts":"2018-12-12T00:41:16.160000","pos":"00000000040000004889","primary_keys":["EMPNO"],"before":{"EMPNO":3 4821,"ENAME":"aa"},"after":{"EMPNO":321,"ENAME":"ggg"}} 49{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 00:40:45.640034","current_ts":"2018-12-12T00:41:26.180000","pos":"00000000040000005298","primary_keys":["EMPNO"],"before":{"EMPNO":3 5021,"ENAME":"ggg"},"after":{"EMPNO":123,"ENAME":"ggg"}} 51{"table":"SCOTT.KAFKA","op_type":"D","op_ts":"2018-12-12 00:40:55.640034","current_ts":"2018-12-12T00:41:36.193000","pos":"00000000040000005502","primary_keys":["EMPNO"],"before":{"EMPNO":1 5223,"ENAME":"ggg","JOB":null,"MGR":null,"HIREDATE":null,"SAL":null,"COMM":null,"DEPTNO":null}} 53发现这几种情况的before和after值都能正常捕获。 54 55

四、安装过程遇到的部分错误

1、设置topicName的属性问题

在启动replicat进程时报错如下:

原因:在kafka.props中配置topicName时使用了下面参数;
gg.handler.kafkahandler.topicName=oggtopic
翻阅官方文档,在OGG for bigdata中,配置topicName属性的参数换成了如下:
gg.handler.kafkahandler.topicMappingTemplate=kafka
解决方法:
将kafka.props中参数替换成上面第二个再次重启replicat进程。

2、自动创建topicname失败

replicat进程启动成功后,同步数据时报错如下:
原因:gg.handler.kafkahandler.topicMappingTemplate=kafka参数配置的topicname在进程启动后同步数据时,会自动在kafka创建一个名为kafka的topic来向此topic生产数据,
但是由于kafka本身参数文件server.properties并未启用当生产者或则消费者生产或则消费某个不存在的topic时会自动创建此topic的参数,所以OGG应用进程中定义的kafka主题名自动创建时失败报错。
解决方法:在kafka参数文件server.properties参数文件中加入下面参数:
auto.create.topics.enable=true
OGG for bigdata官方说明如下:
https://docs.oracle.com/goldengate/bd123110/gg-bd/GADBD/using-kafka-handler.htm\#GADBD451
说明如下:
To enable the automatic creation of topics, set the auto.create.topics.enable property to true in the Kafka Broker Configuration. The default value for this property is true.
If the auto.create.topics.enable property is set to false in Kafka Broker configuration, then all the required topics should be created manually before starting the Replicat process.

3、JDK版本不兼容

12018-12-11 22:25:33 WARNING OGG-15053 Java method main(([Ljava/lang/String;)V) is not found in class oracle/goldengate/datasource/UserExitMain. 2 32018-12-11 22:25:33 WARNING OGG-15053 Java method getDataSource(()Loracle/goldengate/datasource/UserExitDataSource;) is not found in class oracle/goldengate/datasource/UserExitMain. 4 52018-12-11 22:25:33 WARNING OGG-15053 Java method shutdown(()V) is not found in class oracle/goldengate/datasource/UserExitMain. 6Exception in thread "main" java.lang.UnsupportedClassVersionError: oracle/goldengate/datasource/UserExitMain : Unsupported major.minor version 52.0 7 at java.lang.ClassLoader.defineClass1(Native Method) 8 at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 9 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 10 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 11 at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 12 at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 13 at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 14 at java.security.AccessController.doPrivileged(Native Method) 15 at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 16 at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 17 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 18 at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 19 20Source Context : 21 SourceModule : [gglib.ggdal.adapter.java] 22 SourceID : [/scratch/aime/adestore/views/aime_adc4150560/oggcore/OpenSys/src/gglib/ggdal/Adapter/Java/JavaAdapter.cpp] 23 SourceMethod : [HandleJavaException] 24 SourceLine : [246] 25 ThreadBacktrace : [16] elements 26 : [/kafka/ogg12/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7fa29ea530ae]] 27 : [/kafka/ogg12/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x6ac) [0x7fa29ea439bc]] 28 : [/kafka/ogg12/libgglog.so(_MSG_String(CSourceContext*, int, char const*, CMessageFactory::MessageDisposition)+0x39) [0x7fa29ea31d19]] 29 : [/kafka/ogg12/libggjava.so(+0x2e9e7) [0x7fa295c9d9e7]] 30 : [/kafka/ogg12/libggjava.so(ggs::gglib::ggdal::CJavaAdapter::Open()+0x8b5) [0x7fa295c9fb05]] 31 : [/kafka/ogg12/replicat(ggs::gglib::ggdal::CDALAdapter::Open(ggs::gglib::ggunicode::UString const&)+0x20) [0x82ef60]] 32 : [/kafka/ogg12/replicat(GenericImpl::Open(ggs::gglib::ggunicode::UString const&)+0x2c) [0x8163fc]] 33 : [/kafka/ogg12/replicat(odbc_param(char*, char*)+0xb1) [0x808f41]] 34 : [/kafka/ogg12/replicat(get_infile_params(ggs::gglib::ggapp::ReplicationContextParams&, ggs::gglib::ggdatasource::DataSourceParams&, ggs::gglib::ggdatatarget::Dat 35aTargetParams&, ggs::gglib::ggmetadata::MetadataContext&)+0x9951) [0x5d5e11]] 36 : [/kafka/ogg12/replicat() [0x6e47bd]] 37 : [/kafka/ogg12/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x5e) [0x7e2d8e]] 38 : [/kafka/ogg12/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x173) [0x7e7153]] 39 : [/kafka/ogg12/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x140) [0x7e79c0]] 40 : [/kafka/ogg12/replicat(main+0x3b) [0x6e7e0b]] 41 : [/lib64/libc.so.6(__libc_start_main+0xfd) [0x323e21ed1d]] 42 : [/kafka/ogg12/replicat() [0x550831]] 43 442018-12-11 22:25:33 ERROR OGG-15051 Java or JNI exception: 45java.lang.UnsupportedClassVersionError: oracle/goldengate/datasource/UserExitMain : Unsupported major.minor version 52.0. 46 472018-12-11 22:25:33 ERROR OGG-01668 PROCESS ABENDING. 48 49

原因:根据上面错误日志来看,需要将Java的libjvm.so 和 libjsig.so库文件所在目录加入LD_LIBRARY_PATH环境变量,需要注意的是,LD_LIBRARY_PATH环境变量成效后,需要将MGR也重启一下,但是我这里环境变量已经配置了,我当前用的Java版本是1.7,OGG for Big Data要的版本是1.8,所以版本不对导致上面问题。
解决办法:jdk1.7换成1.8重新配置后从启即可。

4、Kafka中无update前数据
源端做下面操作:

1SQL> insert into kafka(empno,ename,job)values(222,'zyd','dba'); 21 row created. 3SQL> commit; 4Commit complete. 5SQL> update kafka set ename='zhaoyd' where empno=222; 61 row updated. 7SQL> commit; 8Commit complete. 9SQL> update kafka set ename='yd',empno=666 where empno=222; 101 row updated. 11SQL> commit; 12Commit complete. 13SQL> delete from kafka where empno=666; 141 row deleted. 15SQL> commit; 16Commit complete. 17SQL> alter system switch logfile; 18System altered. 19SQL> / 20System altered. 21 22

查看kafka消费情况:

1{"table":"SCOTT.KAFKA","op_type":"I","op_ts":"2018-12-12 04:33:36.707853","current_ts":"2018-12-12T04:35:01.911000","pos":"00000000040000006618","primary_keys":["EMPNO"],"after":{"EMPNO":22 22,"ENAME":"zyd","JOB":"dba","MGR":null,"HIREDATE":null,"SAL":null,"COMM":null,"DEPTNO":null}} 3{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 04:34:11.707853","current_ts":"2018-12-12T04:35:22.452000","pos":"00000000040000007136","primary_keys":["EMPNO"],"before":{},"after" 4:{"EMPNO":222,"ENAME":"zhaoyd"}} 5{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 04:34:29.707853","current_ts":"2018-12-12T04:35:32.567000","pos":"00000000040000007550","primary_keys":["EMPNO"],"before":{"EMPNO":2 622},"after":{"EMPNO":666,"ENAME":"yd"}} 7{"table":"SCOTT.KAFKA","op_type":"D","op_ts":"2018-12-12 04:34:42.707853","current_ts":"2018-12-12T04:35:42.577000","pos":"00000000040000007753","primary_keys":["EMPNO"],"before":{"EMPNO":6 866,"ENAME":"yd","JOB":"dba","MGR":null,"HIREDATE":null,"SAL":null,"COMM":null,"DEPTNO":null}} 9 10

这里总结一下上面消费出现的问题:
源端抽取进程已经添加了
NOCOMPRESSDELETES
NOCOMPRESSUPDATES
GETUPDATEBEFORES
三个参数,此时:
1、如果在源端做非PK Update ,抽取进程应该把主键值以及被修改字段的修改前和修改后的数据都写到trail文件,以供应用进程以json格式写before和after值及其他信息到kafka;
2、如果在源端做PK Update操作,抽取进程会把主键被修改前和修改后的值以及同时修改的其他的字段的修改前和后的值都记录到trail文件,以供应用进程以json格式写before和after值及其他信息到kafka;
3、如果在源端做delete操作,抽取进程会把所有被删除的字段值写到trail文件,以供应用进程以json格式写before信息到kafka。
但是根据看上面的操作发现,现在只有insert的数据是全的,update更新非主键字段before是没有数据的,只有after有数据,更新主键before只有主键的数据,delete正常,是因为虽然加了GETUPDATEBEFORES
参数,trail文件仍然只是写了after的值吗?接下来验证一下:

1源端做非PK Update操作: 2SQL> select empno,ename from kafka; 3 4 EMPNO ENAME 5---------- ---------- 6 7369 er 7 7499 ALLEN 8 7521 WARD 9 7566 JONES 10 7654 MARTIN 11 7698 BLAKE 12 7782 CLARK 13 7839 KING 14 7844 TURNER 15 7876 ADAMS 16 7900 JAMES 17 7902 FORD 18 7934 sdf 19 3233 dsdds 20 2114 rows selected. 22SQL> update kafka set ename='test'where empno=7369; 23 241 row updated. 25 26SQL> commit; 27 28Commit complete. 29 30SQL> alter system switch logfile; 31 32System altered. 33 34SQL> select empno,ename from kafka; 35 36 EMPNO ENAME 37---------- ---------- 38 7369 test 39 7499 ALLEN 40 7521 WARD 41 7566 JONES 42 7654 MARTIN 43 7698 BLAKE 44 7782 CLARK 45 7839 KING 46 7844 TURNER 47 7876 ADAMS 48 7900 JAMES 49 7902 FORD 50 7934 sdf 51 3233 dsdds 52 5314 rows selected. 54源端修改成功,去挖掘trail文件: 55Logdump 92 >n 56TokenID x47 'G' Record Header Info x01 Length 194 57TokenID x48 'H' GHDR Info x00 Length 36 58 450c 0041 001a 0fff 02f2 a32a d082 7880 0000 0000 | E..A.......*..x..... 59 0005 7c10 0000 0109 0252 0000 0001 0001 | ..|......R...... 60TokenID x44 'D' Data Info x00 Length 26 61TokenID x54 'T' GGS Tokens Info x00 Length 24 62TokenID x55 'U' User Tokens Info x00 Length 84 63TokenID x5a 'Z' Record Trailer Info x01 Length 194 64___________________________________________________________________ 65Hdr-Ind : E (x45) Partition : . (x0c) 66UndoFlag : . (x00) BeforeAfter: A (x41) 67RecLength : 26 (x001a) IO Time : 2018/12/14 16:38:42.000.000 68IOType : 15 (x0f) OrigNode : 255 (xff) 69TransInd : . (x02) FormatType : R (x52) 70SyskeyLen : 0 (x00) Incomplete : . (x00) 71AuditRBA : 265 AuditPos : 359440 72Continued : N (x00) RecCount : 1 (x01) 73 742018/12/14 16:38:42.000.000 FieldComp Len 26 RBA 3299 75Name: SCOTT.KAFKA (TDR Index: 1) 76After Image: Partition 12 GU e 77 0000 000a 0000 0000 0000 0000 1cc9 0001 0008 0000 | .................... 78 0004 7465 7374 | ..test 79Column 0 (x0000), Len 10 (x000a) 80 0000 0000 0000 0000 1cc9 | .......... 81Column 1 (x0001), Len 8 (x0008) 82 0000 0004 7465 7374 | ....test 83 84User tokens: 84 bytes 85TKN-CSN : 1462257 86TKN-COMMIT-TS : 2018-12-14 16:38:42.000000 87TKN-OP-TYPE : SQL COMPUPDATE 88 89GGS tokens: 90TokenID x52 'R' ORAROWID Info x00 Length 20 91 4141 4156 7458 4141 4541 4141 414a 6a41 4141 0001 | AAAVtXAAEAAAAJjAAA.. 92 93Logdump 93 >n 94TokenID x47 'G' Record Header Info x01 Length 216 95TokenID x48 'H' GHDR Info x00 Length 36 96 450c 0042 0018 0fff 02f2 a32a d082 7880 0000 0000 | E..B.......*..x..... 97 0005 7c10 0000 0109 0052 0000 0001 0001 | ..|......R...... 98TokenID x44 'D' Data Info x00 Length 24 99TokenID x54 'T' GGS Tokens Info x00 Length 48 100TokenID x55 'U' User Tokens Info x00 Length 84 101TokenID x5a 'Z' Record Trailer Info x01 Length 216 102___________________________________________________________________ 103Hdr-Ind : E (x45) Partition : . (x0c) 104UndoFlag : . (x00) BeforeAfter: B (x42) 105RecLength : 24 (x0018) IO Time : 2018/12/14 16:38:42.000.000 106IOType : 15 (x0f) OrigNode : 255 (xff) 107TransInd : . (x00) FormatType : R (x52) 108SyskeyLen : 0 (x00) Incomplete : . (x00) 109AuditRBA : 265 AuditPos : 359440 110Continued : N (x00) RecCount : 1 (x01) 111 1122018/12/14 16:38:42.000.000 FieldComp Len 24 RBA 3083 113Name: SCOTT.KAFKA (TDR Index: 1) 114Before Image: Partition 12 GU b 115 0000 000a 0000 0000 0000 0000 1cc9 0001 0006 0000 | .................... 116 0002 6572 | ..er 117Column 0 (x0000), Len 10 (x000a) 118 0000 0000 0000 0000 1cc9 | .......... 119Column 1 (x0001), Len 6 (x0006) 120 0000 0002 6572 | ....er 121 122User tokens: 84 bytes 123TKN-CSN : 1462257 124TKN-COMMIT-TS : 2018-12-14 16:38:42.000000 125TKN-OP-TYPE : SQL COMPUPDATE 126 127GGS tokens: 128TokenID x52 'R' ORAROWID Info x00 Length 20 129 4141 4156 7458 4141 4541 4141 414a 6a41 4141 0001 | AAAVtXAAEAAAAJjAAA.. 130TokenID x4c 'L' LOGCSN Info x00 Length 7 131 3134 3632 3235 37 | 1462257 132TokenID x36 '6' TRANID Info x00 Length 9 133 322e 3238 2e31 3134 31 | 2.28.1141 134通过挖掘trail文件发现,抽取进程已经把被update字段的before和after值都写到了trail文件,去kafka 查看消费情况: 135{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 00:23:10.639439","current_ts":"2018-12-12T00:23:19.143000","pos":"00000000040000003346","primary_keys":["EMPNO"],"before":{},"after" 136:{"EMPNO":7369,"ENAME":"test"}} 137发现还是只有after的值,没有before的值,接下来再看看做PK Update的操作: 138源端操作如下: 139SQL> update kafka set ename='test',empno=3333 where empno=3233; 140 1411 row updated. 142 143SQL> commit; 144 145Commit complete. 146 147SQL> alter system switch logfile; 148 149System altered. 150再去看消费情况: 151{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 00:29:02.639408","current_ts":"2018-12-12T00:29:11.125000","pos":"00000000040000003759","primary_keys":["EMPNO"],"before":{"EMPNO":3 152233},"after":{"EMPNO":3333,"ENAME":"test"}} 153 154

发现pkupdate也是只有主键列有update前值,为什么普通update前和后的值明明写到trail文件了,为什么应用进程写kafka,看kafka消费时只有after数据,而before为空呢,在11g版本的OGG for bigdata没有这类问题,于是去查看12c的官方文档这个参数的介绍:
https://docs.oracle.com/goldengate/c1221/gg-winux/GWURF/getupdatebefores-ignoreupdatebefores.htm\#GWURF515
这个参数现在也要在应用进程中添加,这样应用进程才会将before值写到kafka,接下来做修改:

1GGSCI (zookeeper) 54> edit params rkafka 2修改内容如下: 3REPLICAT rkafka 4-- Trail file for this example is located in "AdapterExamples/trail" directory 5-- Command to add REPLICAT 6-- add replicat rkafka, exttrail AdapterExamples/trail/tr 7TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props 8REPORTCOUNT EVERY 1 MINUTES, RATE 9GROUPTRANSOPS 10000 10GETUPDATEBEFORES 11MAP SCOTT.*, TARGET SCOTT.*; 12GGSCI (zookeeper) 55> stop rkafka 13 14Sending STOP request to REPLICAT RKAFKA ... 15Request processed. 16GGSCI (zookeeper) 56> start rkafka 17 18Sending START request to MANAGER ... 19REPLICAT RKAFKA starting 20 21GGSCI (zookeeper) 57> info all 22 23Program Status Group Lag at Chkpt Time Since Chkpt 24 25MANAGER RUNNING 26REPLICAT RUNNING RKAFKA 00:00:00 00:00:03 27 28再次在源端做普通update操作: 29SQL> update kafka set ename='zhaoyd' where empno=3333; 30 311 row updated. 32 33SQL> commit; 34 35Commit complete. 36 37SQL> alter system switch logfile; 38 39System altered. 40去kafka查看消费情况: 41{"table":"SCOTT.KAFKA","op_type":"U","op_ts":"2018-12-12 00:37:39.639361","current_ts":"2018-12-12T00:37:47.418000","pos":"00000000040000004181","primary_keys":["EMPNO"],"before":{"EMPNO":3 42333,"ENAME":"test"},"after":{"EMPNO":3333,"ENAME":"zhaoyd"}} 43 44

发现这时before也有了值,正常了。原来在11g的版本做,没有这个问题,12.3需要在应用进程也加入GETUPDATEBEFORES参数。

五、安装Appach Flink

这里只是为了测试,不再重新配置hadoop环境,只简单安装一个单机无hadoop环境版本的Flink用于测试

1、下载上传解压Flink 1.4软件:

下载连接:
https://archive.apache.org/dist/flink/flink-1.4.0/

2、启动和关闭flink

1[root@zookeeper kafka]# cd /kafka/flink/ 2[root@zookeeper flink]# ./bin/start-local.sh 3Warning: this file is deprecated and will be removed in 1.5. 4Starting cluster. 5Starting jobmanager daemon on host zookeeper. 6Starting taskmanager daemon on host zookeeper. 7[root@zookeeper flink]./bin/stop-local.sh 8Warning: this file is deprecated and will be removed in 1.5. 9Stopping taskmanager daemon (pid: 5641) on host zookeeper. 10Stopping jobmanager daemon (pid: 5323) on host zookeeper. 11 12

3、编写程序
参考1.4版本的官方文档案例写法:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/quickstart/run_example_quickstart.html\#writing-a-flink-program
https://github.com/wangshubing1/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/kafka/Kafka010Example.scala

1package com.learn.Flink.kafka 2 3import org.apache.flink.api.common.restartstrategy.RestartStrategies 4import org.apache.flink.api.common.serialization.SimpleStringSchema 5import org.apache.flink.api.java.utils.ParameterTool 6import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment 7import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer010, FlinkKafkaProducer010} 8import org.apache.flink.api.scala._ 9 10/** 11 * @Author: king 12 * @Datetime: 2018/10/16 13 * @Desc: TODO 14 * 15 */ 16object Kafka010Example { 17 def main(args: Array[String]): Unit = { 18 19 // 解析输入参数 20 val params = ParameterTool.fromArgs(args) 21 22 if (params.getNumberOfParameters < 4) { 23 println("Missing parameters!\n" 24 + "Usage: Kafka --input-topic <topic> --output-topic <topic> " 25 + "--bootstrap.servers <kafka brokers> " 26 + "--zookeeper.connect <zk quorum> --group.id <some id> [--prefix <prefix>]") 27 return 28 } 29 30 val prefix = params.get("prefix", "PREFIX:") 31 32 33 val env = StreamExecutionEnvironment.getExecutionEnvironment 34 env.getConfig.disableSysoutLogging 35 env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000)) 36 // 每隔5秒创建一个检查点 37 env.enableCheckpointing(5000) 38 // 在Web界面中提供参数 39 env.getConfig.setGlobalJobParameters(params) 40 41 // 为卡夫卡0.10 x创建一个卡夫卡流源用户 42 val kafkaConsumer = new FlinkKafkaConsumer010( 43 params.getRequired("input-topic"), 44 new SimpleStringSchema, 45 params.getProperties) 46 //消费kafka数据 47 /*val transaction = env 48 .addSource( 49 new FlinkKafkaConsumer010[String]( 50 params.getRequired("input-topic"), 51 new SimpleStringSchema, 52 params.getProperties)) 53 transaction.print()*/ 54 55 //消费kafka数据 56 val messageStream = env 57 .addSource(kafkaConsumer) 58 .map(in => prefix + in) 59 messageStream.print() 60 // 为卡夫卡0.10 X创建一个生产者 61 val kafkaProducer = new FlinkKafkaProducer010( 62 params.getRequired("output-topic"), 63 new SimpleStringSchema, 64 params.getProperties) 65 66 // 将数据写入kafka 67 messageStream.addSink(kafkaProducer) 68 69 env.execute("Kafka 0.10 Example") 70 } 71 72} 73 74

 

原文地址:https://www.cnblogs.com/yaoyangding/p/15473329.html