Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明与示例

一.官网说明

A datawarehousing configuration is a many-to-one configuration. Multiple sourcedatabases send data to one target warehouse database. Oracle GoldenGatesupports like-to-like or heterogeneous transfer of data, with capabilities forfiltering and conversion on any system in the configuration (support varies bydatabase platform).

--多个Source DB发送到一个Target warehouse DB。

1.1 Considerations for a data warehousing configuration

1.1.1 Isolation of data records

Thisconfiguration assumes that each source database contributes different recordsto the target system. If the same record exists in the same table on two ormore source systems and can be changed on any of those systems, conflictresolution routines are needed to resolve conflicts when changes to that recordare made on both sources at the same time and replicated to the target table.

这里假设每个source DB 发送给TargetDB的都是不同的record，如果是相同的record，可能就会在TargetDB 上产生冲突。

1.1.2 Data storage

You can dividethe data storage between the source systems and the target system to reduce theneed for massive amounts of disk space on the target system. This is accomplishedby using a data pump on each source, rather than sending data directly from eachExtract across the network to the target.

(1) A primary Extract writes to alocal trail on each source.

(2) A data-pump Extract on eachsource reads the local trail and sends it across TCP/IP to a dedicated Replicatgroup.

1.1.3 Filtering and conversion

If not all ofthe data from a source system will be sent to the data warehouse, you can use thedata pump to perform the filtering. This removes that processing overhead fromthe primary Extract group, and it reduces the amount of data that is sentacross the network.

1.1.3.1 To filter data,you can use:

(1) A FILTER or WHERE clause in a TABLEstatement (Extract) or in a MAP statement (Replicat).

(2) A SQL query or procedure

(3) User exits

1.1.3.2 To transformdata, you can use:

(1) Native Oracle GoldenGateconversion functions

(2) A user exit from the Extract orReplicat process that applies rules from an external

(3) transformation solution, thenreturns the manipulated data to Oracle GoldenGate.

(4) Replicat to deliver datadirectly to an ETL solution or other transformation engine.

1.1.4 Data volume

The standard configuration is sufficientif:

(1) The transaction load isconsistent and of moderate volume that is spread out more or less evenly amongall of the objects to be replicated.

(2) There are none of thefollowing: tables that are subject to long-running transactions, tables thathave a very large number of columns that change, or tables that contain columnsfor which Oracle GoldenGate must fetch from the database (generally columnswith LOBs, columns that are affected by SQL procedures executed by Oracle GoldenGate,and columns that are not logged to the transaction log).

If yourenvironment does not satisfy those conditions, consider adding one or more setsof parallel processes.

1.2 Creating a data warehousing configuration

Figure 13 Configuration for datawarehousing

1.2.1 Source systems

--To configure theManager process

1. On each source, configure the Managerprocess.

2. In each Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail on the local system.

--To configure theprimary Extract groups

3. On each source, use the ADD EXTRACT commandto create a primary Extract group. For documentation purposes, these groups arecalled ext_1 and ext_2.

Extract_1

ADD EXTRACT<ext_1>, TRANLOG, BEGIN <time> [, THREADS <n>]

Extract_2

ADD EXTRACT<ext_2>, TRANLOG, BEGIN <time> [, THREADS <n>]

注意： Use TRANLOG as the data source option. For DB2 on Z/OS, specify thebootstrap data set (BSDS) name following TRANLOG.

4. On each source, use the ADD EXTTRAIL commandto create a local trail.

Extract_1

ADD EXTTRAIL<local_trail_1>, EXTRACT <ext_1>

Extract_2

ADD EXTTRAIL<local_trail_2>, EXTRACT <ext_2>

注意： Use the EXTRACT argument to link each Extract group to the localtrail on the same system. The primary Extract writes to this trail, and thedata-pump reads it.

5. On each source, use the EDIT PARAMS commandto create a parameter file for the primary Extract. Include the followingparameters plus any others that apply to your database environment.

Extract_1

-- Identify the Extract group:

EXTRACT <ext_1>

-- Specify database login information asneeded for the database:

[SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]

-- Specify the local trail that thisExtract writes to:

EXTTRAIL <local_trail_1>

-- Specify tables to be captured:

TABLE <owner>.<table>;

Extract_2

-- Identify the Extract group:

EXTRACT <ext_2>

-- Specify database login information asneeded for the database:

[SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]

-- Specify the local trail that thisExtract writes to:

EXTTRAIL <local_trail_2>

-- Specify tables to be captured:

TABLE <owner>.<table>;

--To configure the datapumps

6. On each source, use the ADD EXTRACT commandto create a data pump Extract group. For documentation purposes, these pumpsare called pump_1 and pump_2.

Data pump_1

ADD EXTRACT <pump_1>, EXTTRAILSOURCE<local_trail_1>, BEGIN <time>

Data pump_2

ADD EXTRACT <pump_2>, EXTTRAILSOURCE<local_trail_2>, BEGIN <time>

注意：Use EXTTRAILSOURCE as the data source option, and specify the nameof the trail on the local system.

7. On each source, use the ADD RMTTRAIL commandto create a remote trail on the target.

Source_1

ADD RMTTRAIL<remote_trail_1>, EXTRACT <pump_1>

Source_2

ADD RMTTRAIL<remote_trail_2>, EXTRACT <pump_2>

注意：Use the EXTRACT argumentto link each remote trail to a different data pump. The data pump writes tothis trail over TCP/IP, and a Replicat reads from it.

8. On each source, use the EDIT PARAMS commandto create a parameter file for the data pump group. Include the followingparameters plus any others that apply to your

database environment.

Data pump_1

-- Identify the data pump group:

EXTRACT <pump_1>

-- Specify database login information asneeded for the database:

[SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]

-- Specify the name or IP address of thetarget system:

RMTHOST <target>, MGRPORT<portnumber>

-- Specify the remote trail on the target system:

RMTTRAIL <remote_trail_1>

-- Allow mapping, filtering, conversion orpass data through as-is:

[PASSTHRU | NOPASSTHRU]

-- Specify tables to be captured:

TABLE <owner>.<table>;

Data pump_2

-- Identify the data pump group:

EXTRACT <pump_2>

-- Specify database login information asneeded for the database:

[SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]

-- Specify the name or IP address of thetarget system:

RMTHOST <target>, MGRPORT<portnumber>

-- Specify the remote trail on the targetsystem:

RMTTRAIL <remote_trail_2>

-- Allow mapping, filtering, conversion orpass data through as-is:

[PASSTHRU | NOPASSTHRU]

-- Specify tables to be captured:

TABLE <owner>.<table>;

注意：Use NOPASSTHRU if the data pump will be filtering or convertingdata, and also use the SOURCEDB and USERID parameters as appropriate for thedatabase, to enable definitions lookups. If the data pump will not be filteringor converting data, use PASSTHRU to bypass the lookups.

NOTE：

To use PASSTHRU mode,the names of the source and target objects must be identical. No column mapping, filtering, SQLEXECfunctions, transformation, or other functions that require data manipulationcan be specified in the parameter file. You can combine normal processing withpass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLEstatements.

1.2.2 Target system

--To configure theManager process

9. Configure the Manager process.

10. In the Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail.

--To configure the Replicat groups

11. On the target, use the ADD REPLICAT commandto create a Replicat group for each remote trail that you created. Fordocumentation purposes, these groups are called rep_1 and rep_2.

Replicat_1

ADD REPLICAT <rep_1>, EXTTRAIL<remote_trail_1>, BEGIN <time>

Replicat_2

ADD REPLICAT <rep_2>, EXTTRAIL<remote_trail_2>, BEGIN <time>

注意： Use the EXTTRAIL argumentto link the Replicat group to the trail.

每个Data Pump 对应一个Replicat进程。

12. On the target, use the EDIT PARAMS commandto create a parameter file for each Replicat group. Include the followingparameters plus any others that apply to your database environment.

Replicat_1

-- Identify the Replicat group:

REPLICAT <rep_1>

-- State whether or not source and targetdefinitions are identical:

SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS

-- Specify database login information asneeded for the database:

[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]

-- Specify error handling rules:

REPERROR (<error>, <response>)

-- Specify tables for delivery:

MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];

Replicat_2

-- Identify the Replicat group:

REPLICAT <rep_2>

-- State whether or not source and targetdefinitions are identical:

SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS

-- Specify database login information asneeded for the database:

[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]

-- Specify error handling rules:

REPERROR (<error>, <response>)

-- Specify tables for delivery:

MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];

注意：You can use anynumber of MAP statements for any given Replicat group. All MAP statements for agiven Replicat group must specify the same objects that are contained in thetrail that is linked to the group.

二.示例

这里还是3个节点的GG，我们这里从节点2和节点3发送到节点1.

DB： oracle 11.2.0.3

OS: redhat 5.4

2.1 在所有节点上设置Manager

GGSCI (gg1) 2> edit params mgr

PORT 7809

2.2 分别在GG2和GG3上添加Extract进程

--GG2

GGSCI (gg2) 4> add extract ext1,tranlog,begin now

GGSCI (gg2) 5> add exttrail/u01/ggate/dirdat/lt, extract ext1

GGSCI (gg2) 7> view params ext1

extract ext1

userid ggate@gg2,password ggate

exttrail /u01/ggate/dirdat/lt

ddl include all objname dave.pdba;

table dave.pdba;

--GG3

GGSCI (gg3) 2> add extract ext1,tranlog,begin now

GGSCI (gg3) 4> add exttrail/u01/ggate/dirdat/lt, extract ext1

GGSCI (gg3) 6> view params ext1

extract ext1

userid ggate@gg3,password ggate

exttrail /u01/ggate/dirdat/lt

ddl include all objname dave.pdba;

table dave.pdba;

2.3 分别在GG2和GG3上添加DataPump 进程

--GG2

GGSCI (gg2) 8> add extractdpump,exttrailsource /u01/ggate/dirdat/lt

GGSCI (gg2) 9> add rmttrail/u01/ggate/dirdat/d1, extract dpump

--注意这里指定的Target端的位置，我们以d1 开头。

GGSCI (gg2) 11> view params dpump

extract dpump

userid ggate@gg2, password ggate

rmthost gg1, mgrport 7809,compress,compressthreshold0

rmttrail /u01/ggate/dirdat/d1

passthru

table dave.pdba;

--GG3

GGSCI (gg3) 7> add extractdpump,exttrailsource /u01/ggate/dirdat/lt

GGSCI (gg3) 8> add rmttrail/u01/ggate/dirdat/d2, extract dpump

GGSCI (gg3) 10> view params dpump

extract dpump

userid ggate@gg3, password ggate

rmthost gg1, mgrport 7809,compress,compressthreshold0

rmttrail /u01/ggate/dirdat/d2

passthru

table dave.pdba;

2.4 在GG1上配置2个Replicat进程，分别对应Data Pump进程

--配置checkpoint

GGSCI (gg1) 3> view params ./GLOBALS

GGSCHEMA ggate

CHECKPOINTTABLE ggate.checkpoint

GGSCI (gg1) 5> dblogin useridggate@gg1,password ggate

Successfully logged into database.

GGSCI (gg1) 7> add checkpointtableggate.checkpoint

Successfully created checkpoint tableGGATE.CHECKPOINT.

这部分，参考：

Oracle Golden Gate 系列十三 --配置GG进程检查点（checkpoint）说明

http://blog.csdn.net/tianlesoftware/article/details/6983928

--配置2个Replicat：

--rep1:

GGSCI (gg1) 8> add replicatrep1,exttrail /u01/ggate/dirdat/d1, checkpointtable ggate.checkpoint

REPLICAT added.

GGSCI (gg1) 11> view params rep1

replicat rep1

ASSUMETARGETDEFS

userid ggate@gg1,password ggate

discardfile/u01/ggate/dirdat/rep1_discard.txt, append, megabytes 10

--HANDLECOLLISIONS

ddl include all

ddlerror default ignore retryop

map dave.pdba, target dave.pdba;

--rep2:

GGSCI (gg1) 9> add replicatrep2,exttrail /u01/ggate/dirdat/d2, checkpointtable ggate.checkpoint

REPLICAT added.

GGSCI (gg1) 13> view params rep2

replicat rep2

ASSUMETARGETDEFS

userid ggate@gg1,password ggate

discardfile/u01/ggate/dirdat/rep2_discard.txt, append, megabytes 10

--HANDLECOLLISIONS

ddl include all

ddlerror default ignore retryop

map dave.pdba, target dave.pdba;

数据初始化的问题，具体参考：

Oracle Golden Gate 系列九--GG 数据初始化装载说明与示例

http://blog.csdn.net/tianlesoftware/article/details/6976551

OracleGoldenGate 系列十二--GG 数据初始化装载二基于SCN 的初始化说明与示例

http://blog.csdn.net/tianlesoftware/article/details/6982908

2.5 启动相关进程进行测试

--GG2：

GGSCI (gg2) 20> start ext1

确定ext1的trails 文件：

GGSCI (gg2) 25> view report ext1

…

2011-11-20 11:31:43 INFO OGG-01056 Recovery initializationcompleted for target file /u01/ggate

/dirdat/lt000018, at RBA 132350, CSN1525929.

2011-11-20 11:31:43 INFO OGG-01478 Output file/u01/ggate/dirdat/lt is using format RELEASE 10.

4/11.1.

2011-11-20 11:31:43 WARNING OGG-01438 Checkpoint marked as from graceful shutdown,but records fou

nd after checkpoint in trail/u01/ggate/dirdat/lt. Expected EOF Seqno0, RBA 0. Found Seqno 18, RB

A 132350.

2011-11-20 11:31:43 INFO OGG-01026 Rolling over remotefile /u01/ggate/dirdat/lt000018.

…

GGSCI (gg2) 24> info dpump

EXTRACT DPUMP Initialized 2011-11-20 11:11 Status STOPPED

Checkpoint Lag 00:00:00 (updated 00:20:16 ago)

Log Read Checkpoint File /u01/ggate/dirdat/lt000000

First Record RBA 0

GGSCI (gg2) 26> alter extractdpump,extseqno 18,extrba 0

EXTRACT altered.

GGSCI (gg2) 27> start dpump

Sending START request to MANAGER ...

EXTRACT DPUMP starting

GGSCI (gg2) 40> info all

Program Status Group Lag Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING DPUMP 00:00:00 00:02:05

EXTRACT RUNNING EXT1 00:00:00 00:00:02

REPLICAT ABENDED REP1 00:00:00 12:24:42

GGSCI (gg2) 41> info dpump

EXTRACT DPUMP Last Started 2011-11-2011:37 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:02:24 ago)

Log Read Checkpoint File /u01/ggate/dirdat/lt000018

First Record RBA 0

这里注意Data Pump 默认是从lt000000 开始读取trail 文件，如果是新搭建的GG 同步，那么都是从lt000000开始处理，所以没有问题，如果是后来修改称data pump，就需要根据extract 进程的错误提示信息，使用如下命令：

GGSCI(gg1) 82> alter extract dpump,extseqno 18,extrba 0

具体参考：

OracleGoldenGate 系列十 -- 配置DataPump process 说明与示例

http://blog.csdn.net/tianlesoftware/article/details/6978501

--GG3:

GGSCI (gg3) 14> start ext1

Sending START request to MANAGER ...

EXTRACT EXT1 starting

GGSCI (gg3) 18> view report ext1

…

2011-11-20 11:39:50 INFO OGG-01056 Recovery initializationcompleted for target file /u01/ggate

/dirdat/lt000016, at RBA 132672, CSN1525929.

2011-11-20 11:39:50 INFO OGG-01478 Output file/u01/ggate/dirdat/lt is using format RELEASE 10.

4/11.1.

2011-11-20 11:39:50 WARNING OGG-01438 Checkpoint marked as from graceful shutdown,but records fou

nd after checkpoint in trail/u01/ggate/dirdat/lt. Expected EOF Seqno0, RBA 0. Found Seqno 16, RB

A 132672.

2011-11-20 11:39:50 INFO OGG-01026 Rolling over remotefile /u01/ggate/dirdat/lt000016.

…

GGSCI (gg3) 19> alter extractdpump,extseqno 16,extrba 0

EXTRACT altered.

GGSCI (gg3) 20> info dpump

EXTRACT DPUMP Initialized 2011-11-20 11:41 Status STOPPED

Checkpoint Lag 00:00:00 (updated 00:00:06 ago)

Log Read Checkpoint File /u01/ggate/dirdat/lt000016

First Record RBA 0

GGSCI (gg3) 21> start dpump

Sending START request to MANAGER ...

EXTRACT DPUMP starting

GGSCI (gg3) 22> info all

Program Status Group Lag Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING DPUMP 00:00:00 00:00:13

EXTRACT RUNNING EXT1 00:00:00 00:00:05

REPLICAT ABENDED REP1 00:00:00 12:28:42

--GG1:

GGSCI (gg1) 18> start rep1

Sending START request to MANAGER ...

REPLICAT REP1 starting

GGSCI (gg1) 21> start rep2

Sending START request to MANAGER ...

REPLICAT REP2 starting

GGSCI (gg1) 22> info all

Program Status Group Lag Time Since Chkpt

MANAGER RUNNING

EXTRACT ABENDED DPUMP 00:00:00 12:34:55

EXTRACT ABENDED DPUMP2 00:00:00 12:34:49

EXTRACT ABENDED EXT1 00:00:00 12:34:51

REPLICAT RUNNING REP1 00:00:00 00:17:37

REPLICAT RUNNING REP2 00:00:00 00:17:15

2.6 验证同步

--GG2的pdba 表insert 一条记录：

SQL> conn dave/dave;

Connected.

SQL> insert into pdba values(2222,sysdate);

1 row created.

SQL> commit;

Commit complete.

--GG3的pdba 表insert 一条记录：

SQL> conn dave/dave;

Connected.

SQL> insert into pdba values(3333,sysdate);

1 row created.

SQL> commit;

Commit complete.

--在GG1 的pdba 表查询这条记录：

SQL> select * from pdba where idin(2222,3333);

no rows selected

没有同步过来。

GGSCI (gg1) 86> info rep1

REPLICAT REP1 Last Started 2011-11-2012:22 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:01 ago)

Log Read Checkpoint File /u01/ggate/dirdat/d1000000

First Record RBA 0

GGSCI (gg1) 87> stop rep1

Sending STOP request to REPLICAT REP1 ...

Request processed.

GGSCI (gg1) 88> stop rep2

Sending STOP request to REPLICAT REP2 ...

Request processed.

GGSCI (gg1) 89> alter replicat rep1extseqno 1 extrba 0

REPLICAT altered.

GGSCI (gg1) 91> alter replicat rep2 extseqno1 extrba 0

REPLICAT altered.

GGSCI (gg1) 93> start rep1

Sending START request to MANAGER ...

REPLICAT REP1 starting

GGSCI (gg1) 94> start rep2

Sending START request to MANAGER ...

REPLICAT REP2 starting

--最后确认一下：

SQL> select * from pdba where id<4444;

ID TIME

---------- ------------

2222 20-NOV-11

3333 20-NOV-11

多对一的测试同步成功。

注意这里有2个小技巧：

1. 如果同步有问题，但进程report又没有提供更多信息，可以使用Logdump工具查看trails里的内容，从而确认是哪一步出现了问题。关于logdump ，参考：

Oracle Golden Gate 系列十五 --GG Trails 说明

http://blog.csdn.net/tianlesoftware/article/details/6990611

2. 如果tail 搞不定，一个简单的解决方法：

把所有进程delete 掉，所有的tails 全部delete 掉，重新来过，这种方法比查看trail 要节省很多时间。

-------------------------------------------------------------------------------------------------------

Blog: http://blog.csdn.net/tianlesoftware

Weibo: http://weibo.com/tianlesoftware

Email: tianlesoftware@gmail.com

Skype: tianlesoftware

-------加群需要在备注说明Oracle表空间和数据文件的关系，否则拒绝申请----

DBA1 群：62697716(满); DBA2 群：62697977(满) DBA3 群：62697850(满)

DBA 超级群：63306533(满); DBA4 群：83829929(满) DBA5群： 142216823(满)

DBA6 群：158654907(满) DBA7 群：69087192(满) DBA8 群：172855474

DBA 超级群2：151508914 DBA9群：102954821 聊天群：40132017(满)