[Oracle 9i] Subquery Factoring in 9i (用With语句做公共子查询提取)

Subquery Factoring, 其实就是平常比较熟悉的With语句,Adrian Billington 在他的网站上写了一篇介绍Subquery Factoring很好的文章,见这里。这篇Blog同样是对他的这篇文章的笔记。

With语句一般有两种用途,一种就是用来把复杂的SQL语句简单化:复杂的SQL语句一般都会嵌套很多层次,无论是写起来还是读起来都很困难,通过用With语句,把子查询语句抽取出来,这样可以使得SQL语句“扁平化”,写起来会很方便,读起来也很容易。 With的这种用法其实就是相当于把复杂语句中的inline-view给提取出来作为一个单独的查询语句,并赋予一个名字,这样用起来就跟访问一个视图一样。

With的另外一种用处是可以用来优化SQL语句,如果一个复杂的SQL语句需要重复访问一张表(最好是数据量比较大的表),这个时候如果用With把这部分需要重复访问底层表的SQL语句提取出来(Oracle往往会把这部分数据“物化”到一个临时表中),之后就不用重复多次访问底层表,从而可以提升SQL语句的执行效率。

下面主要看看With在SQL优化方面的一个例子,(注意, 我得到的结果跟Adrian很不一样, 我用的是10g来run他的例子的!)

首先创建测试用例:

SQL> show user
USER is "FRANK"
SQL> CREATE TABLE sales
  2  NOLOGGING
  3  AS
  4     SELECT al.owner    AS region
  5     ,      al.object_type AS product
  6     ,      al.object_id   AS order_amt
  7     FROM   all_objects al
  8     ,      all_objects a2
  9     WHERE  ROWNUM <= 1000000;
 
Table created.
 
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'sales');
 
PL/SQL procedure successfully completed.

首先看看不用With语句的效果,

SQL> set autotrace on
SQL> set timing on
SQL>
SQL> SELECT region
  2  ,      total_sales
  3  FROM   (
  4          SELECT region
  5          ,      NVL(SUM(order_amt), 0) AS total_sales
  6          FROM   sales
  7          GROUP  BY
  8                 region
  9          ) ilv
 10  WHERE  total_sales > ( SELECT SUM(order_amt)/3 AS one_third_sales
 11                         FROM  sales);
 
REGION                         TOTAL_SALES
------------------------------ -----------
SYS                               27760438
 
Elapsed: 00:00:01.68
 
Execution Plan
----------------------------------------------------------
Plan hash value: 302444457
 
-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |     7 |   753  (26)| 00:00:10 |
|*  1 |  FILTER             |       |       |       |            |          |
|   2 |   HASH GROUP BY     |       |     1 |     7 |   753  (26)| 00:00:10 |
|   3 |    TABLE ACCESS FULL| SALES |  1006K|  6877K|   597   (7)| 00:00:08 |
|   4 |   SORT AGGREGATE    |       |     1 |     3 |            |          |
|   5 |    TABLE ACCESS FULL| SALES |  1006K|  2947K|   597   (7)| 00:00:08 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL(SUM("ORDER_AMT"),0)> (SELECT SUM("ORDER_AMT")/3 FROM
              "SALES" "SALES"))
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5034  consistent gets
         18  physical reads
          0  redo size
        477  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SQL>

从执行计划可以看到,确实出现了两次访问底层表sales!

这条SQL语句的执行时间是1.68秒,cost 是753, 物理I/O为18.  (这跟Adrian的测试结果差了很多,可能是由于数据量的问题)

然后再看看用With 语句把重复访问表的部分抽出来作为临时表的效果,

SQL> WITH region_sales AS
  2  (  SELECT region
  3     ,      NVL(SUM(order_amt), 0) AS total_sales
  4     FROM   sales
  5     GROUP  BY
  6            region
  7  )
  8  SELECT region
  9  ,      total_sales
 10  FROM   region_sales
 11  WHERE  total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales
 12                        FROM   region_sales);
 
REGION                         TOTAL_SALES
------------------------------ -----------
SYS                               27760438
 
Elapsed: 00:00:02.26
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1988045888
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |     1 |    30 |   757  (26)| 00:00:10 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   2 |   LOAD AS SELECT           |                           |       |       |            |          |
|   3 |    HASH GROUP BY           |                           |     1 |     7 |   753  (26)| 00:00:10 |
|   4 |     TABLE ACCESS FULL      | SALES                     |  1006K|  6877K|   597   (7)| 00:00:08 |
|*  5 |   VIEW                     |                           |     1 |    30 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6612_10B3A7 |     1 |     7 |     2   (0)| 00:00:01 |
|   7 |    SORT AGGREGATE          |                           |     1 |    13 |            |          |
|   8 |     VIEW                   |                           |     1 |    13 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6612_10B3A7 |     1 |     7 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("TOTAL_SALES"> (SELECT SUM("TOTAL_SALES")/3 FROM  (SELECT /*+ CACHE_TEMP_TABLE
              ("T1") */ "C0" "REGION","C1" "TOTAL_SALES" FROM "SYS"."SYS_TEMP_0FD9D6612_10B3A7" "T1")
              "REGION_SALES"))
 
 
Statistics
----------------------------------------------------------
        102  recursive calls
         11  db block gets
       2540  consistent gets
         43  physical reads
       1472  redo size
        477  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SQL>

从执行计划可以看出,这次是访问一次底层表(创建临时表),然后是两次对临时表的访问(代价很明显小多了)

时间花销是2.26秒, 代价是757,物理I/O为43,每项数据都比不用With要多! (这个跟Adrian的差得太远了!)这也说明了一点,With语句并不一定会提升性能,从With语句的执行计划中可以看出有一步叫Temp Table Transformation,Oracle会创建一个global temporary table, 并从底层表sales 装载数据到这个临时表中,这些从执行计划都可以看到。这一步带来的代价是很高的(753),所以如果直接访问底层多次的代价比创建临时表的代价还要小的话,用With语句就没有多大价值了。相反,如果底层表的数据量很大,多次访问地层表的代价就很可能超过一次创建临时表的代价,这样With语句的优势就显现出来了!

还有一点要注意的是,用With语句并不代表Oracle一定会创建临时表,因为查询优化器很“聪明”,它会分析值不值得这么做。这个时候,如果我们想明确告诉Oracle,我们需要它来创建这样一个临时表,我们可以用一个hint – materialize .  但是由于这是一个没有官方文档说明的hint, 所以很难保证这个hint会一直起作用,所以用这个hint也要很小心才行!

 

关于With语句用法的一些限制性,Adrian说得很清楚,可以参考他的文章

原文地址:https://www.cnblogs.com/fangwenyu/p/1639527.html