Oracle大量数据更新策略

生产上要修改某个产品的产品代号, 而我们系统是以产品为中心的, 牵一发而动全身, 涉及表几乎覆盖全部, 有些表数据量是相当大的, 达到千万, 亿级别.

单纯的维护产品代号的 SQL 是不难的, 但是性能是最大的问题, 最后采用了 rowid+forall分批更新策略.

细节涉及:

游标(rowid)

dbms_sql.Urowid_Table

(异常声明;)

fetch v_rowid_cursor bulk collect into v_rowid_table limit v_once_commit; ...

forall i in 1 .. v_rowid_table.count save exceptions ...

完整代码:

PL/SQL存储过程分批更新↓

 1 --rowid+forall批量更新
 2 declare
 3   v_total_count integer default 500000; --待更新目标记录总计
 4   v_once_commit integer default 10000; --单次提交量
 5   v_pre_prod_code char(8) := '90310004';
 6   v_new_prod_code char(8) := '9031000X';
 7   v_dml_name varchar2(100) := 'update tb_cust_vol_list 20181215 01';
 8   v_suc_count integer := 0; --成功提交计数
 9   v_err_count integer := 0; --失败记录数
10   v_curr_batch integer := 0; --当前提交批次
11   v_start_time date := sysdate; --开始时间
12   --待更新目标记录rowids
13   cursor v_rowid_cursor is
14     select rowid from tb_cust_vol_list 
15      where prod_code = v_pre_prod_code
16      order by rowid; --rowid排序，提高效率
17   v_rowid_table dbms_sql.Uv_rowid_table; --临时单次rowid放置表
18   v_error exception; --异常声明
19   pragma exception_init(v_error, -24381); --指定ora-错误码
20   
21 begin
22   --操作日志
23   insert into tb_dml_log values (
24     v_dml_name,
25     v_total_count,
26     v_once_commit,
27     v_start_time,
28     v_start_time,
29     0, 0, 0, 0, 0
30   );
31   commit;
32   
33   open v_rowid_cursor; --打开rowids游标
34   loop 
35     exit when v_rowid_cursor%notfound;
36     --临时rowids表
37     fetch v_rowid_cursor bulk collect into v_rowid_table limit v_once_commit;
38     exit when v_rowid_table.count = 0;
39     
40     begin 
41       forall i in 1 .. v_rowid_table.count save exceptions
42         --rowid定位行更新
43         update tb_cust_vol_list set prod_code=v_new_prod_code where rowid=v_rowid_table(i); 
44     exception 
45       when v_error then --目标异常
46         dbms_output.put_line('ora-24381, error in array DML !');
47         dbms_output.put_line('exception count: ' || sql%bulk_exceptions.count);
48         v_err_count := v_err_count + sql%bulk_exceptions.count;
49       when others then 
50         dbms_output.put_line('ora-XXX error occurred !');
51         dbms_output.put_line('exception count: ' || sql%bulk_exceptions.count);
52         v_err_count := v_err_count + sql%bulk_exceptions.count;
53     end;
54     
55     v_suc_count := v_suc_count + v_rowid_table.count;
56     v_curr_batch := v_curr_batch + 1;
57     --更新log
58     update tb_dml_log a set 
59         a.curr_time=sysdate, 
60         a.curr_cost=ceil((sysdate-v_start_time)*24*60*60),
61         a.curr_batch=v_curr_batch,
62         a.process=v_suc_count/a.total_count,
63         a.suc_count=v_suc_count,
64         a.err_count=v_err_count 
65       where a.dml_name=v_dml_name;
66     commit;
67     
68   end loop;
69   
70   dbms_output.put_line('total error count: ' || v_err_count);
71 end;

有帮助的博客：
https://blog.csdn.net/leinuo180/article/details/23344647

其他考虑:

如果有的表目标数据实在太大, 就算上述优化依然很费时间, 可将此表的目标数据拆分, 比如按月, 按编号.. 分几个脚本, 维护时并行执行.

查询语句获取所有目标数据的 rowid 时, 如果占比很大(索引空间大), 可以尝试禁用索引查询, 使用全表并行查(网上看到的,本人还未尝试).

超大表要注意索引空间的维护.

@20190126更新

批量数据更新方式

1）表重建

即先新建复制表（create 新表(...) as select ... from 旧表 where ...），然后删除旧表（drop），最后修改表名（rename 新表名 to 旧表名），最后恢复表结构（约束，索引等）。

2）rowid+forall PL/SQL存储过程更新

即上面的办法

3）JDBC程序分批更新

优点是DB无关性，性能也不慢，只是数据量太大（千万级）时，力不从心。