hive行转列和列转行

1.hive行转列

good01表内容
hive> select * from good01;
OK
1    2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
2    11,12,13,14,15,16
Time taken: 0.163 seconds, Fetched: 2 row(s)
hive> 


good01 列结构
hive> desc good01;
OK
id_one                  string                                      
new_id                  string                                      
Time taken: 0.046 seconds, Fetched: 2 row(s)
hive> 


行转列语句
select id_one,result from good01 lateral view explode(split(new_id,',')) new_id as result;
执行 结果为
OK
1    2
1    3
1    4
1    5
1    6
1    7
1    8
1    9
1    10
1    11
1    12
1    13
1    14
1    15
1    16
2    11
2    12
2    13
2    14
2    15
2    16
View Code

2.hive列转行

第一种,k  -   单 v形式 

表内容
hive> select * from good;
OK
1    2
1    3
1    4
1    5
1    6
1    7
1    8
1    9
1    10
1    11
1    12
1    13
1    14
1    15
1    16
2    11
2    12
2    13
2    14
2    15
2    16

表结构
ive> desc good;
OK
id_one                  string                                      
id_two                  string   

列转行语句
hive> select id_one,concat_ws(',',collect_list(id_two)) as value01 
    > from good group by id_one;



MapReduce Total cumulative CPU time: 4 seconds 420 msec
Ended Job = job_1572198119580_0017
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.42 sec HDFS Read: 9610 HDFS Write: 170 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 420 msec
OK
1 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
2 11,12,13,14,15,16
Time taken: 23.857 seconds, Fetched: 2 row(s)
hive>

第二种,列转行,k-n v

列转行
输出格式  k- nv

表内容:
hive> select * from tmp_dianpingt;
OK
344920    one    10
344920    two    30
344920    three    30
344920    four    30
Time taken: 0.135 seconds, Fetched: 4 row(s)

表结构形式:
ive> desc tmp_dianpingt;
OK
productid               string                                      
tagtype                 string                                      
highlight               string 

分隔符是   ,
建表语句是
hive> show create table tmp_dianpingt;
OK
CREATE TABLE `tmp_dianpingt`(
  `productid` string, 
  `tagtype` string, 
  `highlight` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'=',', 
  'serialization.format'=',') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://min01:9000/user/hive/warehouse/dw_htl.db/tmp_dianpingt'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'transient_lastDdlTime'='1572200512')
Time taken: 0.058 seconds, Fetched: 18 row(s)




执行 列转行  k  -nv 语句为
select productid
      ,concat_ws('',collect_set(fuwu))     `one`
      ,concat_ws('',collect_set(weizhi))   `two`
      ,concat_ws('',collect_set(sheshi))   `three`
      ,concat_ws('',collect_set(weisheng)) `four`
from
  (  select productid
           ,case when tagtype='one' then highlight else '' end as  weizhi
           ,case when tagtype='two' then highlight else '' end as  fuwu 
           ,case when tagtype='three' then highlight else '' end as  weisheng
           ,case when tagtype='four' then highlight else '' end as  sheshi
      from dw_htl.tmp_dianpingt
   ) a 
 group by productid





结果为:

apReduce Total cumulative CPU time: 4 seconds 270 msec
Ended Job = job_1572198119580_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.27 sec HDFS Read: 12689 HDFS Write: 118 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 270 msec
OK
344920 30 10 30 30

RUSH B
原文地址:https://www.cnblogs.com/tangsonghuai/p/11748549.html