SQL Techniques – Columns to Rows, Rows to Columns

Abstract: 行变列,列变行在生成报表的时候经常遇到,行变列叫做"Pivot”, 反之叫做"Unpivot”。 在Oracle11g之前,一般都是通过很多work around来实现,但是Oracle11g中直接支持PIVOT和UNPIVOT的操作。

关于PIVOT和UNPIVOT用法可以参见这里

1. Rows to Columns

Rows to Columns是比较常见的操作,比如说下面这个例子,有这样一张表,

 

SQL> desc dept_emp_distribute;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
DEPT                                                  NUMBER
REGION                                              VARCHAR2(20)
EMPLOYEE_COUNT                             NUMBER

 

SQL> select * from dept_emp_distribute;

      DEPT REGION               EMPLOYEE_COUNT
---------- -------------------- --------------
         1 SZ                               10
         1 SH                               15
         1 HZ                               25
         2 HZ                                5
         2 SZ                               15
         2 SH                               35

6 rows selected.

现在假设我们要生成如下这张报表,

      DEPT EMPLOYEE_IN_SZ EMPLOYEE_IN_SH EMPLOYEE_IN_HZ
---------- -------------- -------------- --------------
         1             10             15             25
         2             15             35              5

在11g之前,最容易想到的方法,就是采用case when 或 decode来做,如下所示,

SQL> select dept,
  2    max(case region when 'SZ' then employee_count else null end) employee_in_sz,
  3    max(case region when 'SH' then employee_count else null end) employee_in_sh,
  4    max(case region when 'HZ' then employee_count else null end) employee_in_hz
  5  from
  6    dept_emp_distribute
  7  group by
  8    dept;

 

11g中可以用PIVOT来写SQL,如下,

SQL>  select * from
  2  ( select dept, region, employee_count from dept_emp_distribute)
  3  pivot
  4  (
  5    max(employee_count) as employee_count
  6    for region in ('SZ' as SZ, 'SH' as SH, 'HZ' as HZ)
  7  );

      DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
---------- ----------------- ----------------- -----------------
         1                10                15                25
         2                15                35                 5

 

注意Pivot需要一个聚集函数,因此用了max(employee_count),如果去掉这个max(),会得到如下错误:

ORA-56902: expect aggregate function inside pivot operation

 

Execution Plan
----------------------------------------------------------
Plan hash value: 2443834872

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     6 |   228 |     4  (25)| 00:00:01 |
|   1 |  HASH GROUP BY PIVOT|                     |     6 |   228 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | DEPT_EMP_DISTRIBUTE |     6 |   228 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

 

对应的Pivot操作符的执行操作是HASH GROUP BY PIVOT.

 

 

这两种方法虽然能满足需求,但是显然不够灵活,因为Hard code了REGION在SQL语句中,如果REGION的信息不确定怎么办呢?

对于第一种方法来说,用一条静态的SQL语句是很难办到的,因为需要多少条case when 必须要动态来决定,而Pivot也不支持如下写法…

SQL> select * from
  2  ( select dept, region, employee_count from dept_emp_distribute)
  3  pivot
  4  ( max(employee_count) as employee_count
  5    for region in ( select distinct region from dept_emp_distribute)
  6  );


  for region in ( select distinct region from dept_emp_distribute)
                  *
ERROR at line 5:
ORA-00936: missing expression

 

虽然如此,但是11g支持Pivot XML来满足这种需求,如下

SQL> select * from
  2  ( select dept, region, employee_count from dept_emp_distribute)
  3  pivot xml
  4  ( max(employee_count) as employee_count
  5    for region in (select distinct region from dept_emp_distribute)
  6  );

 

      DEPT REGION_XML
---------- ----------------------------------------------------------------------------------
         1 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN
         2 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN

 

 

 

2. Columns to Rows.

 

列转行叫“Unpivot”,相对于“Pivot”来说稍微有点复杂,而且不是那么直观。现在来把上面得到的结果返回去得到原表的结构,该怎么办呢?

先来看看11g中的UNPIVOT是怎么用的。

为方便演示,首先用CTAS(create table as select) 来创建一个中间表staging_table, 如下

SQL> create table staging_table as
  2  select * from
  3  ( select dept, region, employee_count from dept_emp_distribute)
  4  pivot
  5  ( max(employee_count) as employee_count
  6    for region in ('SZ' as SZ,
  7  'SH' as SH, 'HZ' as HZ)
  8  );

Table created.

SQL> desc staging_table;
Name                                      Null?    Type
----------------------------------------- -------- ------------------------
DEPT                                                             NUMBER
SZ_EMPLOYEE_COUNT                                  NUMBER
SH_EMPLOYEE_COUNT                                  NUMBER
HZ_EMPLOYEE_COUNT                                  NUMBER

SQL> select * from staging_table;

      DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
---------- ----------------- ----------------- -----------------
         1                10                15                25
         2                15                35                 5

看看如何用UNPIVOT来做,

  1  select * from
  2  (
  3      (   select dept,
  4               sz_employee_count   SZ,
  5               sh_employee_count   SH,
  6               hz_employee_count   HZ
  7          from  staging_table
  8      )
  9     unpivot
10    (
11       employee_count
12       for region in (SZ, SH, HZ)
13     )
14* )
SQL> /

      DEPT RE EMPLOYEE_COUNT
---------- -- --------------
         1 SZ             10
         1 SH             15
         1 HZ             25
         2 SZ             15
         2 SH             35
         2 HZ              5

6 rows selected.

看看UNPIVOT部分, employee_count 和 region可以随便命名,对应于最后的列名。 for region in (SZ, SH, HZ)中的SZ, SH, HZ 对应于select部分中列的别名。

unpivot
(
       employee_count
       for region in (SZ, SH, HZ)
)

现在回到11g之前,没有数据库对UNPIVOT的直接支持,该怎么办呢?

很容易想到的可以用UNION ALL来实现,虽然有点 cumbersome, 不是很灵活,

SQL> select * from
  2  (
  3     select dept, 'SZ' as region, sz_employee_count as employee_count from staging_table
  4     union all
  5     select dept, 'SH' as region, sh_employee_count as employee_count from staging_table
  6     union all
  7     select dept, 'HZ' as region, hz_employee_count as employee_count from staging_table
  8  )
  9  order by dept;

      DEPT RE EMPLOYEE_COUNT
---------- -- --------------
         1 SZ             10
         1 SH             15
         1 HZ             25
         2 SH             35
         2 HZ              5
         2 SZ             15

6 rows selected.

另外一种稍微复杂点但是更灵活的做法是借助collection, 如下

SQL> create type name_value_pair as object
  2  ( name varchar2(10),
  3    value number
  4  );
  5  /

Type created.

SQL> create type name_value_varray as
  2  varray(10) of name_value_pair;
  3  /

Type created.

 

 

SQL> select dept,
  2   vals.name as region,
  3   vals.value as employee_count
  4  from
  5   staging_table t,
  6   table
  7   ( name_value_array
  8      (name_value_pair('SZ', t.sz_employee_count),
  9       name_value_pair('SH', t.sh_employee_count),
10       name_value_pair('HZ', t.hz_employee_count))
11   ) vals
12  order by
13    dept, region;

 

      DEPT REGION     EMPLOYEE_COUNT
---------- ---------- --------------
         1 HZ                     25
         1 SH                     15
         1 SZ                     10
         2 HZ                      5
         2 SH                     35
         2 SZ                     15

6 rows selected.

这里面的亮点部分我觉得在于

staging_table t,

table
    ( name_value_array
        (name_value_pair('SZ', t.sz_employee_count),
         name_value_pair('SH', t.sh_employee_count),
        name_value_pair('HZ', t.hz_employee_count))
    ) vals

对于staging_table每一行(每个dept) 来动态生成一个表(name, value), 然后再取出最后得到的结果。猜测:每个dept跟它相对应的“动态表”做笛卡尔乘积。但是从对应的执行计划可以看出,没有看到笛卡尔乘积,而只是nested loops join。

执行计划如下,

Execution Plan
----------------------------------------------------------
Plan hash value: 402397393

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |               | 16336 |   861K|       |   279   (1)| 00:00:04 |
|   1 |  SORT ORDER BY                          |               | 16336 |   861K|  1944K|   279   (1)| 00:00:04 |
|   2 |   NESTED LOOPS                          |               | 16336 |   861K|       |    59   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                    | STAGING_TABLE |     2 |   104 |       |     3   (0)| 00:00:01 |
|   4 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|               |       |       |       |            |          |
-----------------------------------------------------------------------------------------------------------------

 

注意操作--- COLLECTION ITERATOR CONSTRUCTOR FETCH

 

原文地址:https://www.cnblogs.com/fangwenyu/p/1633316.html