PLSQL_性能优化系列17_Oracle Merge Into和Update更新效率

2015-05-21 Created By BaoXinjian

一、摘要

以前只考虑 merge into 只是在特定场合下方便才使用的，今天才发现，merge into 竟然会比 update 在更新数据时有这么大的改进。

其实呢，merge into部分的update和update也没啥不同的，不同的地方在于使用merge into后执行计划变了。

merge方法是最简洁，效率最高的方式，在大数据量更新时优先使用这种方式。

1. 基本语法

merge into test1 using test2
on (test1.id = test2.id)
when matched then update
set test1.name = nvl2(test1.name,test2.name,test1.name);

update内联视图方式：使用这种方式必须在test2.id上有主键 (这里很好理解，必须保证每一个test1.id对应在test2里只有一条记录，如果test2中有多条对应的记录，怎么更新test1)

或者on (test1.id = test2.id, test1.name = test2.name ....)，通过多栏位对比，确认唯一记录，类似Unique Index

2. 使用并行，加快大量数据更新：

merge /*+parallel(test1,4)*/ into test1 using test2
on (test1.id = test2.id)
when matched then update
set test1.name = nvl2(test1.name,test2.name,test1.name);

二、测试案例 - Update / Merge Into

1. 创建测试数据

create table test1 as select * from dba_objects where rownum<=10000;--10000条记录

create table test2 as select * from dba_objects--73056条记录

2. 直接Update时间和效率

SQL> alter system flush shared_pool;

System altered.

SQL> alter system flush buffer_cache;

System altered.

SQL> set linesize 400 pagesize 400
SQL> set autot trace
SQL> set timing on
SQL> update test1 t1
  2     set t1.object_name = (select t2.object_name
  3                             from test2 t2
  4                            where t2.object_id = t1.object_id);

10000 rows updated.

Elapsed: 00:06:33.35

Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=2923252 Card=10011 Bytes=790869)
   1    0   UPDATE OF 'TEST1'
   2    1     TABLE ACCESS (FULL) OF 'TEST1' (TABLE) (Cost=40 Card=10011 Bytes=790869)
   3    1     TABLE ACCESS (FULL) OF 'TEST2' (TABLE) (Cost=292 Card=772 Bytes=60988)

Statistics
----------------------------------------------------------
        430  recursive calls
      11122  db block gets
   15275257  consistent gets
       1175  physical reads
    4058752  redo size
        520  bytes sent via SQL*Net to client
        668  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
      10000  rows processed

3. 通过Merge Into时间和效率

SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.45
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.71
SQL> merge into test1 t1
  2  using test2 t2
  3  on (t1.object_id = t2.object_id)
  4  when matched then
  5    update set t1.object_name = t2.object_name;

10000 rows merged.

Elapsed: 00:00:00.92

Execution Plan
----------------------------------------------------------
   0      MERGE STATEMENT Optimizer=ALL_ROWS (Cost=1243 Card=10011 Bytes=1321452)
   1    0   MERGE OF 'TEST1'
   2    1     VIEW
   3    2       HASH JOIN (Cost=1243 Card=10011 Bytes=4264686)
   4    3         TABLE ACCESS (FULL) OF 'TEST1' (TABLE) (Cost=40 Card=10011 Bytes=2192409)
   5    3         TABLE ACCESS (FULL) OF 'TEST2' (TABLE) (Cost=292 Card=77163 Bytes=15972741)

Statistics
----------------------------------------------------------
       1224  recursive calls
      10279  db block gets
       1586  consistent gets
       1191  physical reads
    2803872  redo size
        526  bytes sent via SQL*Net to client
        634  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
      10000  rows processed

三、解析计划

1. 通过Update的解析计划

SQL> set autot off
SQL> update /*+gather_plan_statistics*/ test1 t1
  2     set t1.object_name = (select t2.object_name
  3                             from test2 t2
  4                            where t2.object_id = t1.object_id);

10000 rows updated.

Elapsed: 00:04:32.81
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  c8qt9a54qgmqg, child number 0
-------------------------------------
update /*+gather_plan_statistics*/ test1 t1    set t1.object_name =
(select t2.object_name                            from test2 t2
                  where t2.object_id = t1.object_id)

Plan hash value: 3883393169

--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |      1 |        |      0 |00:04:32.73 |      10M|
|   1 |  UPDATE            | TEST1 |      1 |        |      0 |00:04:32.73 |      10M|
|   2 |   TABLE ACCESS FULL| TEST1 |      1 |  10011 |  10000 |00:00:00.17 |     133 |
|*  3 |   TABLE ACCESS FULL| TEST2 |  10000 |    772 |  10000 |00:04:31.51 |      10M|
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."OBJECT_ID"=:B1)

Note
-----
   - dynamic sampling used for this statement (level=2)


26 rows selected.

Elapsed: 00:00:01.38

2. 通过Merge Into的解析计划

SQL> merge /*+gather_plan_statistics*/
  2  into test1 t1
  3  using test2 t2
  4  on (t1.object_id = t2.object_id)
  5  when matched then
  6    update set t1.object_name = t2.object_name;

10000 rows merged.

Elapsed: 00:00:00.52
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID  9n4tc6tvwaj9c, child number 0
-------------------------------------
merge /*+gather_plan_statistics*/ into test1 t1 using test2 t2 on
(t1.object_id = t2.object_id) when matched then   update set
t1.object_name = t2.object_name

Plan hash value: 818823782

----------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |       |      1 |        |      0 |00:00:00.47 |   11458 |
|   1 |  MERGE               | TEST1 |      1 |        |      0 |00:00:00.47 |   11458 |
|   2 |   VIEW               |       |      1 |        |  10000 |00:00:00.33 |    1179 |
|*  3 |    HASH JOIN         |       |      1 |  10011 |  10000 |00:00:00.25 |    1179 |
|   4 |     TABLE ACCESS FULL| TEST1 |      1 |  10011 |  10000 |00:00:00.08 |     133 |
|   5 |     TABLE ACCESS FULL| TEST2 |      1 |  77163 |  73056 |00:00:00.26 |    1046 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)


28 rows selected.

Elapsed: 00:00:00.15

四、结果分析

1. 测试结果对比：update和merge into 都更新1w条记录，

update耗时6分钟，逻辑读消耗15275257；

merge into 耗时6秒钟，消耗逻辑读1586，相差太大了。

2. 其实看着执行计划，这个结果也很容易理解：

update采用的类似nested loop的方式，对更新的每一行，都会对查询的表扫描一次；

merge into这里选择的是hash join，则针对每张表都是做了一次 full table scan，对每张表都只是扫描一次。

3. Oracle官方建议，在大数据更新过程中，也是通过使用Merge Into代替Update

Thanks and Regards

参考： http://blog.csdn.net/xiexbb/article/details/4242063