[Oracle 10g & 11g] Collect Function and String Aggregation Utility

Oracle 10g 增加了一个非常有用的 group 函数 ----- COLLECT, 并且在11g中得到了进一步加强。 

这个函数可以用来实现“String Aggregation” 作用, 即把同一个group中的多行数据转成一行(以collection形式出现)。记得曾在园子中回答过一个SQL问题,见这里, 当时用到的是sys_connect_by_path,但是这种方法性能很不好,在下文中会提到。如果用COLLECT来做,性能会得到很大提升!

 

首先来看下Oracle 10g中的COLLECT函数,

 

1. The Collect Function in 10g. (参见这里)

 

(1) 首先来看看一个例子。

[默认schema scott是被锁住的,需要首先解锁!]

 
SQL> alter user scott account unlock

SQL> conn scott/scott
Connected.

SQL> select deptno,
  2         COLLECT(ename) as emps
  3  from emp
  4  group by
  5        deptno;

    DEPTNO EMPS
---------- -------------------------------------------------------------------------------------
        10 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('CLARK', 'KING', 'MILLER')
        20 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
        30 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('ALLEN', 'BLAKE', 'MARTIN', 'TURNER', 'JAMES', 'WARD')

SQL>

 

 

注意到返回的结果透漏出Oracle创建了一个类型 “SYSTP1Nm4rC2OTHGbil8G7f3Adg==”.

在10g R1, 和 10g R2中这个类型存在的schema是不同的,如下所示,

 

-- Oracle 10.1

SQL> SELECT owner
  2  ,      typecode
  3  FROM   all_types
  4  WHERE  type_name = 'SYSTP1Nm4rC2OTHGbil8G7f3Adg==';

OWNER                          TYPECODE
------------------------------ ------------------------------
SYS                            COLLECTION

1 row selected.

-- Oracle 10.2

SQL> select owner,
  2         typecode
  3  from all_types
  4  where type_name = 'SYSTP1Nm4rC2OTHGbil8G7f3Adg==';

OWNER                TYPECODE
-------------------- ------------------------------
SCOTT                COLLECTION

 

 

正是因为10.1中这个类型是在SYS下面的,因此我们是没法直接用这个类型的,但是在10.2中,我们是可以直接用系统产生的这个类型的,如下,

 

SQL> select *
  2  from
  3  TABLE("SYSTP1Nm4rC2OTHGbil8G7f3Adg=="('A', 'B', 'C'));

COLUMN_VAL
----------
A
B
C

SQL>

 

 

(2) 可否用我们自己定义的类型呢?

答案是肯定的!

 

SQL> create or replace type varchar2_ntt as table of varchar2(4000);
  2  /

Type created.

 

SQL> select deptno,
  2    CAST(COLLECT(ename) AS varchar2_ntt) as emps
  3  from emp
  4  group by deptno;

    DEPTNO EMPS
---------- -----------------------------------------------------------------------------
        10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
        20 VARCHAR2_NTT('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
        30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'MARTIN', 'TURNER', 'JAMES', 'WARD')

 

 

得到的结果里显示的是我们自己定义的类型—VARCHAR2_NTT (nested table type)

 

但是要注意的是,如果需要处理的数据类型是NUMBER的,要求自定义的类型和待处理的数据类型的精度必须一样。 如下所示:

SQL> create or replace type number_ntt as table of number;
  2  /

Type created.

SQL> select deptno,
  2         CAST(COLLECT(sal) as number_ntt) as sals
  3  from emp
  4  group by deptno;
       CAST(COLLECT(sal) as number_ntt) as sals
                    *


       CAST(COLLECT(sal) as number_ntt) as sals
            *
ERROR at line 2:
ORA-22814: attribute or element value is larger than specified in type

SQL> desc emp;
Name
                                               Null?    Type
--------------------------------------------------------------------------------

EMPNO
                                               NOT NULL NUMBER(4)
ENAME
                                                        VARCHAR2(10)
JOB
                                                        VARCHAR2(9)
MGR
                                                        NUMBER(4)
HIREDATE
                                                        DATE
SAL
                                                        NUMBER(7,2)
COMM
                                                        NUMBER(7,2)
DEPTNO
                                                        NUMBER(2)

 

可以看到SAL的类型是NUMBER(7, 2) 而不是NUMBER, 所以出现问题。解决这个问题有两种方法,要么重新定义类型,要么把SAL转成NUMBER类型。

 

 SQL> select deptno,
  2         CAST(COLLECT(CAST(sal as NUMBER)) as number_ntt) as sals
  3  from   emp
  4  group by
  5         deptno;

     DEPTNO SALS
---------- ----------------------------------------------------------------------
        10 NUMBER_NTT(2450, 5000, 1300)
        20 NUMBER_NTT(800, 3000, 1100, 3000, 2975)
        30 NUMBER_NTT(1600, 2850, 1250, 1500, 950, 1250)

SQL>


 SQL> create type number_7_2_ntt as table of number(7, 2);
  2  /

Type created.

SQL> select deptno,
  2         CAST(COLLECT(sal) as number_7_2_ntt) as sals
  3  from emp
  4  group by deptno;

    DEPTNO SALS
---------- -------------------------------------------------------------
        10 NUMBER_7_2_NTT(2450, 5000, 1300)
        20 NUMBER_7_2_NTT(800, 3000, 1100, 3000, 2975)
        30 NUMBER_7_2_NTT(1600, 2850, 1250, 1500, 950, 1250)

SQL>

 

 

(3) String Aggregation using COLLECT

开头提到,可以用COLLECT来实现string aggregation, 那么如何做呢? 既然可以通过COLLECT函数将同一个GROUP的数据放在了一个自定义的集合中,剩下的只是我们对该集合做操作,把其中的数据取出来拼成字符串即可。

首先,需要定义一个FUNCTION – TO_STRING() 来处理自定义集合类型中的数据,

SQL>

  1  create or replace function to_string(nt_in in varchar2_ntt,
  2                                                           delimiter_in in varchar2 default ',')
  3                                            return varchar2 is
  4     v_idx pls_integer;
  5     v_str varchar2(32767);
  6     v_dlm varchar2(10);
  7  begin
  8     v_idx := nt_in.FIRST;
  9     while v_idx is not null
10     loop
11        v_str := v_str || v_dlm || nt_in(v_idx);
12        v_dlm := delimiter_in;
13        v_idx := nt_in.NEXT(v_idx);
14     end loop;
15     return v_str;
16* end to_string;
SQL> /

Function created.

 

接下来就可以在SQL语句中直接用了,

 

SQL> select deptno,
  2         to_string(CAST(collect(ename) as varchar2_ntt)) as emps
  3  from emp
  4  group by deptno;

    DEPTNO EMPS
---------- ---------------------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

 

 

 

(4) 几种String Aggregation方法性能比较

除了刚才提到的用COLLECT来做String Aggregation,还有个比较有名的方法就是Tom的STRAGG函数,如下, 这个其实就是个自定义的聚集函数,实现了Oracle Data Cartridge Interface中的几个 routines, 具体做法可以参见另一篇blog,

Tom Kyte’s STRAGG

scott@ORA9I.WORLD> create or replace type string_agg_type as object
  2  (
  3     total varchar2(4000),
  4  
  5     static function
  6          ODCIAggregateInitialize(sctx IN OUT string_agg_type )
  7          return number,
  8  
  9     member function
 10          ODCIAggregateIterate(self IN OUT string_agg_type ,
 11                               value IN varchar2 )
 12          return number,
 13  
 14     member function
 15          ODCIAggregateTerminate(self IN string_agg_type,
 16                                 returnValue OUT  varchar2,
 17                                 flags IN number)
 18          return number,
 19  
 20     member function
 21          ODCIAggregateMerge(self IN OUT string_agg_type,
 22                             ctx2 IN string_agg_type)
 23          return number
 24  );
 25  /
 
Type created.
 
scott@ORA9I.WORLD> 
scott@ORA9I.WORLD> create or replace type body string_agg_type
  2  is
  3  
  4  static function ODCIAggregateInitialize(sctx IN OUT string_agg_type)
  5  return number
  6  is
  7  begin
  8      sctx := string_agg_type( null );
  9      return ODCIConst.Success;
 10  end;
 11  
 12  member function ODCIAggregateIterate(self IN OUT string_agg_type,
 13                                       value IN varchar2 )
 14  return number
 15  is
 16  begin
 17      self.total := self.total || ',' || value;
 18      return ODCIConst.Success;
 19  end;
 20  
 21  member function ODCIAggregateTerminate(self IN string_agg_type,
 22                                         returnValue OUT varchar2,
 23                                         flags IN number)
 24  return number
 25  is
 26  begin
 27      returnValue := ltrim(self.total,',');
 28      return ODCIConst.Success;
 29  end;
 30  
 31  member function ODCIAggregateMerge(self IN OUT string_agg_type,
 32                                     ctx2 IN string_agg_type)
 33  return number
 34  is
 35  begin
 36      self.total := self.total || ctx2.total;
 37      return ODCIConst.Success;
 38  end;
 39  
 40  
 41  end;
 42  /
 
Type body created.
 
scott@ORA9I.WORLD> 
scott@ORA9I.WORLD> CREATE or replace
  2  FUNCTION stragg(input varchar2 )
  3  RETURN varchar2
  4  PARALLEL_ENABLE AGGREGATE USING string_agg_type;
  5  /
 
Function created.

另外还有一种方法就是用oracle 10g增加的sys_connect_by_path来做,但是性能不好。

首先来准备测试数据,创建一个数据量是3倍的DBA_OBJECTS的表T。

SQL> drop table t;

Table dropped.


SQL> create table t
  2  as select MOD(rownum, 100) as id,
  3            CAST('A' as VARCHAR2(1)) as val
  4     from dba_objects,
  5             TABLE(varchar2_ntt('A','B', 'C'));

Table created.


SQL> select count(*) from t;

  COUNT(*)
----------
    188148



SQL> exec dbms_stats.gather_table_stats(user, 'T');

PL/SQL procedure successfully completed.
SQL>

设置autotrace来获取性能信息,

SQL> set autotrace traceonly statistics

SQL> set timing on

 

 

-- 1. STRAGG
SQL> select id,
  2         STRAGG(val) as vals
  3  from t
  4  group by id;

100 rows selected.

Elapsed: 00:00:05.10

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        393  consistent gets
          0  physical reads
          0  redo size
      13379  bytes sent via SQL*Net to client
        462  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL>

时间大约是5秒钟

-- 2. COLLECT

SQL> select id,
  2      to_string(cast(collect(val) as varchar2_ntt)) as vals
  3  from t
  4  group by id;

100 rows selected.

Elapsed: 00:00:01.31

Statistics
----------------------------------------------------------
        436  recursive calls
        110  db block gets
        521  consistent gets
          1  physical reads
      20392  redo size
      13379  bytes sent via SQL*Net to client
        462  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
         17  sorts (memory)
          0  sorts (disk)
        100  rows processed

时间大约是1秒半,比STRAGG要好很多(虽然consistent gets, recursive calls多很多)。

之所以COLLECT比STRAGG性能好很多,主要在于前者的context switch的次数很好,只是在调用to_string的时候才发生(100次), 而STRAGG对每条记录都要发生一次context switch (188148次)。

-- 3. SYS_CONNECT_BY_PATH

SQL> select id,
  2         max(vals) as vals
  3  from
  4  (
  5    select id,
  6           ltrim(sys_connect_by_path(val, ','), ',') as vals
  7    from
  8      ( select id,
  9               val,
10               row_number() over (partition by id order by val) rn
11         from t
12      )
13    start with rn = 1
14    connect by prior id = id and prior rn + 1 = rn
15  )
16  group by id
17  ;

100 rows selected.

Elapsed: 00:24:08.54

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     547953  consistent gets
          0  physical reads
          0  redo size
      62507  bytes sent via SQL*Net to client
        462  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
       3767  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL>

时间超过24分钟!!! 而且consistent gets 非常之多。

所以,还是用COLLECT来做String Aggregation性能最好!

2. COLLECT enhancements in 11g R2. (参见这里)

 

(1)  Order collection elemetns

虽然SYS_CONNECT_BY_PATH性能很差,但是有一点还是好的嘛~ 那就是可以进行排序,最后拼出来的字符串可以是经过排序的。那么COLLECT支持排序吗?

答案是肯定的! 注意这里面讲的是Oracle 11.2对COLLECT的增强,但是其实关于这个排序10g也是支持的,但是其他的比如Distinct等只有在Oracle 11.2之后才有。

SQL> select * from v$version where rownum = 1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod

Elapsed: 00:00:00.00

SQL> conn scott/scott
Connected.


SQL> select deptno,
  2               CAST(COLLECT(ename ORDER BY ename)
  3                        AS varchar2_ntt) as ordered_emps
  4  from emp
  5  group by deptno;

 

    DEPTNO ORDERED_EMPS
---------- -----------------------------------------------------------------------------
        10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
        20 VARCHAR2_NTT('ADAMS', 'FORD', 'JONES', 'SCOTT', 'SMITH')
        30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'JAMES', 'MARTIN', 'TURNER', 'WARD')

Elapsed: 00:00:00.01
SQL>

当然也可以根据其他标准来排序,

SQL> select deptno,
  2               CAST(COLLECT(ename ORDER BY hiredate)
  3                         AS varchar2_ntt) as ordered_emps
  4  from emp
  5  group by deptno;

    DEPTNO ORDERED_EMPS
---------- -----------------------------------------------------------------------
        10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
        20 VARCHAR2_NTT('SMITH', 'JONES', 'FORD', 'SCOTT', 'ADAMS')
        30 VARCHAR2_NTT('ALLEN', 'WARD', 'BLAKE', 'TURNER', 'MARTIN', 'JAMES')

Elapsed: 00:00:00.02

 

(2) Distinct Collection elements

 

注意DISTICNT/UNIQUE只在Oracle11.2之后才起作用,

 

SQL> select * from v$version where rownum = 1;

BANNER
-----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production

 

SQL> select deptno,
  2         CAST(COLLECT(distinct job)
  3                  as varchar2_ntt) as distinct_jobs
  4  from emp
  5  group by
  6          deptno;

 

    DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------------------------------------
        10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')

        20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')

        30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN', 'CLERK', 'SALESMAN')

Elapsed: 00:00:00.03
SQL>

SQL> select deptno,
  2         CAST(COLLECT(UNIQUE job) as varchar2_ntt) as distinct_jobs
  3  from emp
  4  group by deptno;

    DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------------------------------------
        10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')

        20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')

        30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN', 'CLERK', 'SALESMAN')

Elapsed: 00:00:00.03

可以看出虽然用DISTINCT/UNIQUE,oracle并没有报错,但是结果显示并没有显示出distinct的结果, 因为当前的Oracle版本不是11.2。

那么对于11.2之前的版本,有没有什么workaround呢? 答案是肯定的! 可以用MULTISET函数和操作符。注意,这只对nested tables type (不是VARRAY) 起作用。

SQL> SELECT deptno,
  2         SET(
  3            CAST(
  4                COLLECT(job)
  5                   AS varchar2_ntt)) as distinct_jobs
  6  from emp
  7  GROUP BY deptno;

     DEPTNO DISTINCT_JOBS
---------- -------------------------------------------------------------------
        10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')
        20 VARCHAR2_NTT('CLERK', 'ANALYST', 'MANAGER')
        30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'CLERK')

SQL>

这里面用了SET函数,它用来祛除集合中的重复数据。注意SET必须放在CAST外面,因为SET不能作用于系统创建的类型。

原文地址:https://www.cnblogs.com/fangwenyu/p/1636902.html