hive中，行转列，json组解析

hive中常规处理json数据，array类型json用get_json_object(#,"$.#")这个方法足够了，map类型复合型json就需要通过数据处理才能解析。

explode：字段行转列，处理map结构的字段，将数组转换成多行

select explode(split(字段,',')) as abc from explode_lateral_view;

select explode(split(字段,',')) as abc from explode_lateral_view;

LATERAL VIEW：

1.Lateral View 用于和UDTF函数【explode,split】结合来使用。

2.首先通过UDTF函数将数据拆分成多行，再将多行结果组合成一个支持别名的虚拟表。

3..主要解决在select使用UDTF做查询的过程中查询只能包含单个UDTF，不能包含其它字段以及多个UDTF的情况。

4.语法：LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

侧视图的意义是配合explode（或者其他的UDTF），一个语句生成把单行数据拆解成多行后的数据结果集。

select get_json_object(concat('{',sale_info_r,'}'),'$.monthSales') as monthSales from explode_lateral_view 
LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\[\{',''),'}]',''),'},\{'))sale_info as sale_info_r;

统一版

通过下面的句子，把这个json格式的一行数据，完全转换成二维表的方式展现

select t1.id ,get_json_object(col,'$.key') as value ,get_json_object(col,'$.key') as value
from 
(select id,s.col as col from table_a
lateral view explode(split(regexp_replace(regexp_extract(json,'^\[(.+)\]$',1),'\}\,|[, ]{0,1}\{', '\}\|\|\{'),'\|\|')) s as col ) t1

或者另一版本

select get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,
     get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,
     get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as monthSales,
     get_json_object(concat('{',sale_info_1,'}'),'$.score') as monthSales
  from explode_lateral_view 

LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\[\{',''),'}]',''),'},\{'))sale_info as sale_info_1

hive 多行转多列，并且一一对应。

如下数据，主要包括三列，分别是班级、姓名以及成绩，数据表名是default.classinfo。

select 
    class,
    student_name,
    student_score
from
    default.classinfo

    lateral view posexplode(split(student,',')) sn as
        student_index_sn,student_name
    lateral view posexplode(split(score,',')) sc as 
        student_index_sc,student_score
where 
    student_index_sn = student_index_sc

行转列：collect_list(不去重) collect_set(去重)

合并多列：concat_ws 与 concat

CONCAT（）函数用于将多个字符串连接成一个字符串

指定参数之间的分隔符CONCAT_WS（）

GROUP_CONCAT函数返回一个字符串结果，该结果由分组中的值连接组合而成

hive 数据转成json数据组

concat('{"name":"',name,'","cus_nam":"',NVL(t2.cus_nam, ''),
'","orderNo":"',
NVL(orderNo, ''),
'","ord_no":"',
NVL(t1.ord_no, ''),
'","trigger":"',
NVL(trigger, ''),
'","assignmentOfClaims":"',
NVL(assignmentOfClaims, ''),
'"}') as value

通过get_json_object函数解析，测试无误

hive 正则匹配

regexp_extract（字段，正则表达式，序号）

匹配样例

select regexp_extract('honey123moon', 'hon([0-9]+)(moon)', 0)
select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','x=([0-9]+)([a-z]+)',1)

其他：

hive高阶函数工具：窗口函数

天才是百分之一的灵感，加百分之九十九的汗水，但那百分之一的灵感往往比百分之九十九的汗水来的重要