Hive高级聚合GROUPING SETS,ROLLUP以及CUBE


scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hcon=new HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
hcon: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@dd102ea

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex").show
+---+---+--------+
|age|sex|count(1)|
+---+---+--------+
| 56| 0| 7|
| 32| 1| 7|
| 20| 1| 7|
| 50| 1| 7|
| 5| 1| 4|
| 47| 0| 7|
| 85| 1| 7|
|100| 0| 5|
+---+---+--------+

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,())").show
+----+----+--------+
| age| sex|count(1)|
+----+----+--------+
| 56| 0| 7|
|null| 1| 32|
| 20| 1| 7|
|null|null| 51|
| 32| 1| 7|
| 5| 1| 4|
| 85| 1| 7|
| 47| 0| 7|
| 100| 0| 5|
|null| 0| 19|
| 50| 1| 7|
+----+----+--------+

GROUPING SETS

在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL,SETS的子句中如果包含()数据集,则表示整体聚合

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,()) order by age,sex").show
+----+----+--------+
| age| sex|count(1)|
+----+----+--------+
|null|null| 51|
|null| 0| 19|
|null| 1| 32|
| 5| 1| 4|
| 20| 1| 7|
| 32| 1| 7|
| 47| 0| 7|
| 50| 1| 7|
| 56| 0| 7|
| 85| 1| 7|
| 100| 0| 5|
+----+----+--------+

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex grouping sets((age,sex),sex,age,()) order by age,sex").show
+----+----+--------+
| age| sex|count(1)|
+----+----+--------+
|null|null| 51|
|null| 0| 19|
|null| 1| 32|
| 5|null| 4|
| 5| 1| 4|
| 20|null| 7|
| 20| 1| 7|
| 32|null| 7|
| 32| 1| 7|
| 47|null| 7|
| 47| 0| 7|
| 50|null| 7|
| 50| 1| 7|
| 56|null| 7|
| 56| 0| 7|
| 85|null| 7|
| 85| 1| 7|
| 100|null| 5|
| 100| 0| 5|
+----+----+--------+

CUBE

根据GROUP BY的维度的所有组合进行聚合。

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex with cube order by age,sex").show
+----+----+--------+
| age| sex|count(1)|
+----+----+--------+
|null|null| 51|
|null| 0| 19|
|null| 1| 32|
| 5|null| 4|
| 5| 1| 4|
| 20|null| 7|
| 20| 1| 7|
| 32|null| 7|
| 32| 1| 7|
| 47|null| 7|
| 47| 0| 7|
| 50|null| 7|
| 50| 1| 7|
| 56|null| 7|
| 56| 0| 7|
| 85|null| 7|
| 85| 1| 7|
| 100|null| 5|
| 100| 0| 5|
+----+----+--------+

ROLLUP

是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。

scala> hcon.sql("select age,sex,count(1) from gamedw.customers group by age,sex with rollup order by age,sex").show
+----+----+--------+
| age| sex|count(1)|
+----+----+--------+
|null|null| 51|
| 5|null| 4|
| 5| 1| 4|
| 20|null| 7|
| 20| 1| 7|
| 32|null| 7|
| 32| 1| 7|
| 47|null| 7|
| 47| 0| 7|
| 50|null| 7|
| 50| 1| 7|
| 56|null| 7|
| 56| 0| 7|
| 85|null| 7|
| 85| 1| 7|
| 100|null| 5|
| 100| 0| 5|
+----+----+--------+

原文地址:https://www.cnblogs.com/playforever/p/9336445.html