Performance Test of Different Ways of Fetching and Committing Data (Repost)

This article is a repost. If any of it infringes your rights, please notify me promptly and I will stop the infringement immediately.

Source: http://www.itpub.net/thread-1840325-1-1.html

I just saw a post in the development forum asking about the use of BULK COLLECT and FORALL. Here is a simple set of comparisons I wrote a while back while optimizing the reporting stored procedures left behind by the project's previous team; I am sharing it so we can all improve together.
There are five scenarios in the comparison:
1. Use a cursor to fetch, process, and insert row by row, committing after every row (the version left by the previous team)
2. Use a cursor to fetch, process, and insert row by row, committing in batches based on a user-defined counter variable (see the sketch below)
3. Same as scenario 2, but using the cursor's %ROWCOUNT attribute instead of a counter
4. Fetch with BULK COLLECT ... LIMIT, process and insert row by row, then commit the fetched batch
5. Fetch with BULK COLLECT ... LIMIT, process row by row, then write the batch with FORALL and commit
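
For reference, here is a minimal sketch of the batch-commit pattern from scenarios 2 and 3. The names src_table, dst_table, id, and val, and the batch size of 500, are hypothetical placeholders, not taken from the attached test code:

DECLARE
  CURSOR c_src IS
    SELECT id, val FROM src_table;  -- hypothetical source table
  v_row     c_src%ROWTYPE;
  v_counter PLS_INTEGER := 0;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src INTO v_row;
    EXIT WHEN c_src%NOTFOUND;
    -- per-row processing would go here
    INSERT INTO dst_table (id, val) VALUES (v_row.id, v_row.val);
    v_counter := v_counter + 1;
    -- scenario 2: commit every 500 rows based on a counter variable;
    -- scenario 3 tests c_src%ROWCOUNT here instead of v_counter
    IF MOD(v_counter, 500) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;  -- commit the final partial batch
  CLOSE c_src;
END;
/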

It is worth mentioning that in 10g you must declare a separate array for each column of the cursor,
while in 11g you can simply declare a single array of the cursor's record type.
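
A minimal sketch of the two declaration styles the author contrasts, using the same hypothetical src_table/id/val names as above:

-- Per-column collections (the style required on 10g, per the author)
DECLARE
  CURSOR c_src IS
    SELECT id, val FROM src_table;
  TYPE t_id_tab  IS TABLE OF src_table.id%TYPE;
  TYPE t_val_tab IS TABLE OF src_table.val%TYPE;
  v_ids  t_id_tab;
  v_vals t_val_tab;
BEGIN
  OPEN c_src;
  FETCH c_src BULK COLLECT INTO v_ids, v_vals LIMIT 500;
  CLOSE c_src;
END;
/

-- A single collection of the cursor's record type (the 11g style)
DECLARE
  CURSOR c_src IS
    SELECT id, val FROM src_table;
  TYPE t_row_tab IS TABLE OF c_src%ROWTYPE;
  v_rows t_row_tab;
BEGIN
  OPEN c_src;
  FETCH c_src BULK COLLECT INTO v_rows LIMIT 500;
  CLOSE c_src;
END;
/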

If you have any questions, feel free to leave a comment.

Test code: bulk_test_by_myth.zip (16.96 KB)


(Note: once installed, simply run the stored procedure P_TEST_BY_MYTH to run the test. You can modify the variables in that procedure to change the number of test rows and iterations,
and use the view query_result_by_myth to check the results.)

---

I added two columns to the test table and used the DBMS_UTILITY.GET_TIME function to measure processing time, comparing the difference between the start and end readings (GET_TIME returns hundredths of a second, so dividing the difference by 100 converts it to seconds).
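
The timing idiom looks like this (a minimal sketch; the work being measured is elided):

DECLARE
  v_start NUMBER;
BEGIN
  v_start := DBMS_UTILITY.GET_TIME;  -- reading in hundredths of a second
  -- ... the work being measured goes here ...
  DBMS_OUTPUT.PUT_LINE('Elapsed: ' || ((DBMS_UTILITY.GET_TIME - v_start) / 100) || ' seconds');
END;
/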

Note: the test data volume is 100,000 rows.
BULK1: fetch from the cursor with BULK COLLECT INTO ... LIMIT count into collection variables, loop over the collection processing and inserting the rows into the target table one at a time, then commit the batch fetched by this bulk operation.
BULK2: fetch from the cursor with BULK COLLECT INTO ... LIMIT count into collection variables, loop over the collection processing the rows, then write the batch to the target table with FORALL and commit (see the sketch after this list).
CURSOR1: fetch rows from the cursor one at a time, process and insert them into the target table, committing in batches based on CURSOR%ROWCOUNT.
CURSOR2: fetch rows from the cursor one at a time, process and insert them into the target table, committing in batches based on a user-defined counter variable.
CURSOR3: fetch rows from the cursor one at a time, process and insert them into the target table, committing after every row.
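
Here is a minimal sketch of the BULK2 pattern, again with hypothetical src_table/dst_table names (dst_table is assumed to have the same column layout as the cursor rows):

DECLARE
  CURSOR c_src IS
    SELECT id, val FROM src_table;
  TYPE t_row_tab IS TABLE OF c_src%ROWTYPE;
  v_rows t_row_tab;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO v_rows LIMIT 500;
    EXIT WHEN v_rows.COUNT = 0;  -- not %NOTFOUND, or the last partial batch is skipped
    FOR i IN 1 .. v_rows.COUNT LOOP
      NULL;  -- per-row processing would go here
    END LOOP;
    -- one bulk statement for the whole batch instead of 500 single-row inserts
    FORALL i IN 1 .. v_rows.COUNT
      INSERT INTO dst_table VALUES v_rows(i);
    COMMIT;  -- commit this batch
  END LOOP;
  CLOSE c_src;
END;
/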

 Statistics query:

SELECT A.TYPE,
       MAX(DECODE(A.COMMIT_COUNT, 100, A.ELAPSED_TIME)) "100",
       MAX(DECODE(A.COMMIT_COUNT, 200, A.ELAPSED_TIME)) "200",
       MAX(DECODE(A.COMMIT_COUNT, 500, A.ELAPSED_TIME)) "500",
       MAX(DECODE(A.COMMIT_COUNT, 1000, A.ELAPSED_TIME)) "1000",
       MAX(DECODE(A.COMMIT_COUNT, 2000, A.ELAPSED_TIME)) "2000",
       MAX(DECODE(A.COMMIT_COUNT, 5000, A.ELAPSED_TIME)) "5000"
  FROM bulk_log_by_myth A
 GROUP BY A.TYPE
 ORDER BY A.TYPE

The resulting statistics are below:

[Results screenshot from the original post]

From the data above we can conclude that writing the data with FORALL is the fastest approach, and that when reading with BULK COLLECT INTO ... LIMIT count, a LIMIT of 500 rows is generally a good choice.

I have modified the stored procedure; see the attached download.

批量处理数据性能测试.rar (batch data processing performance test)

Tom Kyte's view on the LIMIT value

You Asked
Hi Tom, could you please tell me if there is any difference between LIMIT 1000 and, for example, LIMIT 100000? If I increase the LIMIT, could my process run faster?

Thanks!

and we said...
it might

a) run faster (doubtful)
b) the same (doubtful)
c) slower


somewhere between 100 and 500 is typically "the best"; getting 100k records INTO MEMORY and managing them takes "work"

What is the limit, or the upper end, of LIMIT?
At what threshold of LIMIT will we get optimum performance?

I.e., what would you recommend the LIMIT to be: 1000, 2000, 3000, 5000, or 10,000?



Followup   October 10, 2001 - 11am Central time zone:

I'd suggest hundreds, not thousands.  There are ram concerns, you want to keep the interaction
between you and database going back and forth -- not "ok database do tons of work, then I'll do
tons of work and then I'll give you tons of work again".  You want things to be flowing between you
and the database -- not "bursting" between you and the db.

Consider this -- if you array insert 10k rows - you might have to wait for LGWR to make room in the
redo buffer cache (sync writes to the file system).  If you send him 100 or 500 rows -- you won't
wait and LGWR will flush the buffer cache in the background whilst you are off doing the next
100/500 rows.

"Consider this -- if you array insert 10k rows - you might have to wait for LGWR to make room in
the redo buffer cache (sync writes to the file system).  If you send him 100 or 500 rows -- you
won't wait and LGWR will flush the buffer cache in the background whilst you are off doing the next
100/500 rows."

Firstly, YES, I will limit it to a figure between 100 and 500, but what can the maximum number be, out of academic interest?

Secondly, what if the redo buffer has a lot of memory? Then even if I set the LIMIT to 5000, it should not slow down the process; am I right?

Thirdly, does setting the LIMIT to a smaller number also mean faster execution than setting it to a higher number?

Kindly expand on the below:
"if you array insert 10k rows - you might have to wait for LGWR to make room in the redo buffer
cache (sync writes to the file system)."

See post #72 at http://www.itpub.net/thread-1560427-8-1.html

Original article: https://www.cnblogs.com/caroline/p/3516483.html