SQL Techniques – Columns to Rows, Rows to Columns

Abstract: 行变列，列变行在生成报表的时候经常遇到，行变列叫做"Pivot”, 反之叫做"Unpivot”。在Oracle11g之前，一般都是通过很多work around来实现，但是Oracle11g中直接支持PIVOT和UNPIVOT的操作。

关于PIVOT和UNPIVOT用法可以参见这里

1. Rows to Columns

Rows to Columns是比较常见的操作，比如说下面这个例子，有这样一张表，

SQL> desc dept_emp_distribute;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
DEPT                                                  NUMBER
REGION                                              VARCHAR2(20)
EMPLOYEE_COUNT                             NUMBER

SQL> select * from dept_emp_distribute;

      DEPT REGION               EMPLOYEE_COUNT
---------- -------------------- --------------
         1 SZ                               10
         1 SH                               15
         1 HZ                               25
         2 HZ                                5
         2 SZ                               15
         2 SH                               35

6 rows selected.

现在假设我们要生成如下这张报表，

      DEPT EMPLOYEE_IN_SZ EMPLOYEE_IN_SH EMPLOYEE_IN_HZ
---------- -------------- -------------- --------------
         1             10             15             25
         2             15             35              5

在11g之前，最容易想到的方法，就是采用case when 或 decode来做，如下所示，

SQL> select dept,
2    max(case region when 'SZ' then employee_count else null end) employee_in_sz,
3    max(case region when 'SH' then employee_count else null end) employee_in_sh,
4    max(case region when 'HZ' then employee_count else null end) employee_in_hz
5 from
6    dept_emp_distribute
7 group by
8    dept;

11g中可以用PIVOT来写SQL，如下，

SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot
4 (
5 max(employee_count) as employee_count
6 for region in ('SZ' as SZ, 'SH' as SH, 'HZ' as HZ)
7 );

      DEPT SZ_EMPLOYEE_COUNT SH_EMPLOYEE_COUNT HZ_EMPLOYEE_COUNT
---------- ----------------- ----------------- -----------------
         1                10                15                25
         2                15                35                 5

注意Pivot需要一个聚集函数，因此用了max(employee_count),如果去掉这个max(),会得到如下错误：

ORA-56902: expect aggregate function inside pivot operation

Execution Plan
----------------------------------------------------------
Plan hash value: 2443834872

-------------------------------------------------------------------------------------------
| Id | Operation           | Name                | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     6 |   228 |     4 (25)| 00:00:01 |
|   1 | HASH GROUP BY PIVOT|                     |     6 |   228 |     4 (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | DEPT_EMP_DISTRIBUTE |     6 |   228 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

对应的Pivot操作符的执行操作是HASH GROUP BY PIVOT.

这两种方法虽然能满足需求，但是显然不够灵活，因为Hard code了REGION在SQL语句中，如果REGION的信息不确定怎么办呢？

对于第一种方法来说，用一条静态的SQL语句是很难办到的，因为需要多少条case when 必须要动态来决定，而Pivot也不支持如下写法…

SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot
4 ( max(employee_count) as employee_count
5 for region in ( select distinct region from dept_emp_distribute)
6 );

for region in ( select distinct region from dept_emp_distribute)
*
ERROR at line 5:
ORA-00936: missing expression

虽然如此，但是11g支持Pivot XML来满足这种需求，如下

SQL> select * from
2 ( select dept, region, employee_count from dept_emp_distribute)
3 pivot xml
4 ( max(employee_count) as employee_count
5 for region in (select distinct region from dept_emp_distribute)
6 );

      DEPT REGION_XML
---------- ----------------------------------------------------------------------------------
         1 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN
         2 <PivotSet><item><column name = "REGION">HZ</column><column name = "EMPLOYEE_COUN

2. Columns to Rows.

列转行叫“Unpivot”，相对于“Pivot”来说稍微有点复杂，而且不是那么直观。现在来把上面得到的结果返回去得到原表的结构，该怎么办呢？

先来看看11g中的UNPIVOT是怎么用的。

为方便演示，首先用CTAS（create table as select) 来创建一个中间表staging_table, 如下

SQL> create table staging_table as
2 select * from
3 ( select dept, region, employee_count from dept_emp_distribute)
4 pivot
5 ( max(employee_count) as employee_count
6 for region in ('SZ' as SZ,
7 'SH' as SH, 'HZ' as HZ)
8 );

Table created.

SQL> desc staging_table;
Name                                      Null?    Type
----------------------------------------- -------- ------------------------
DEPT                                                             NUMBER
SZ_EMPLOYEE_COUNT                                  NUMBER
SH_EMPLOYEE_COUNT                                  NUMBER
HZ_EMPLOYEE_COUNT                                  NUMBER

SQL> select * from staging_table;

看看如何用UNPIVOT来做，

1 select * from
2 (
3      (   select dept,
4               sz_employee_count   SZ,
5               sh_employee_count   SH,
6               hz_employee_count   HZ
7          from staging_table
8      )
9     unpivot
10    (
11       employee_count
12       for region in (SZ, SH, HZ)
13     )
14* )
SQL> /

      DEPT RE EMPLOYEE_COUNT
---------- -- --------------
         1 SZ             10
         1 SH             15
         1 HZ             25
         2 SZ             15
         2 SH             35
         2 HZ              5

6 rows selected.

看看UNPIVOT部分， employee_count 和 region可以随便命名，对应于最后的列名。 for region in (SZ, SH, HZ)中的SZ， SH， HZ 对应于select部分中列的别名。

unpivot
(
employee_count
for region in (SZ, SH, HZ)
)

现在回到11g之前，没有数据库对UNPIVOT的直接支持，该怎么办呢？

很容易想到的可以用UNION ALL来实现，虽然有点 cumbersome，不是很灵活，

SQL> select * from
2 (
3     select dept, 'SZ' as region, sz_employee_count as employee_count from staging_table
4     union all
5     select dept, 'SH' as region, sh_employee_count as employee_count from staging_table
6     union all
7     select dept, 'HZ' as region, hz_employee_count as employee_count from staging_table
8 )
9 order by dept;

      DEPT RE EMPLOYEE_COUNT
---------- -- --------------
         1 SZ             10
         1 SH             15
         1 HZ             25
         2 SH             35
         2 HZ              5
         2 SZ             15

6 rows selected.

另外一种稍微复杂点但是更灵活的做法是借助collection，如下

SQL> create type name_value_pair as object
2 ( name varchar2(10),
3 value number
4 );
5 /

Type created.

SQL> create type name_value_varray as
2 varray(10) of name_value_pair;
3 /

Type created.

SQL> select dept,
2   vals.name as region,
3   vals.value as employee_count
4 from
5   staging_table t,
6   table
7   ( name_value_array
8      (name_value_pair('SZ', t.sz_employee_count),
9       name_value_pair('SH', t.sh_employee_count),
10       name_value_pair('HZ', t.hz_employee_count))
11   ) vals
12 order by
13    dept, region;

      DEPT REGION     EMPLOYEE_COUNT
---------- ---------- --------------
         1 HZ                     25
         1 SH                     15
         1 SZ                     10
         2 HZ                      5
         2 SH                     35
         2 SZ                     15

6 rows selected.

这里面的亮点部分我觉得在于

staging_table t,

table
    ( name_value_array
        (name_value_pair('SZ', t.sz_employee_count),
         name_value_pair('SH', t.sh_employee_count),
        name_value_pair('HZ', t.hz_employee_count))
    ) vals

对于staging_table每一行(每个dept) 来动态生成一个表（name, value), 然后再取出最后得到的结果。猜测：每个dept跟它相对应的“动态表”做笛卡尔乘积。但是从对应的执行计划可以看出，没有看到笛卡尔乘积，而只是nested loops join。

执行计划如下，

Execution Plan
----------------------------------------------------------
Plan hash value: 402397393

-----------------------------------------------------------------------------------------------------------------
| Id | Operation                               | Name          | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |               | 16336 |   861K|       |   279   (1)| 00:00:04 |
|   1 | SORT ORDER BY                          |               | 16336 |   861K| 1944K|   279   (1)| 00:00:04 |
|   2 |   NESTED LOOPS                          |               | 16336 |   861K|       |    59   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                    | STAGING_TABLE |     2 |   104 |       |     3   (0)| 00:00:01 |
|   4 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|               |       |       |       |            |          |
-----------------------------------------------------------------------------------------------------------------

注意操作--- COLLECTION ITERATOR CONSTRUCTOR FETCH