hive函数总结及应用场景

1. COALESCE

返回参数中的第一个非空值;如果所有值都为NULL,那么返回NULL

用途:

如果该表为商品表(维度表),以flink解析mysql binlog同步数据过来,需要将昨日的存量数据tb_sku_dim_all,今日增量tb_sku_dim_incre结合起来。

1)首先会对增量表tb_sku_dim_incre按skuOID进行分组,取最近时间的那条数据;

2)然后再与历史表tb_sku_dim_all,进行full outer join,如果增量表中有对应数据,则先使用增量表中的字段,没有则使用全量表中的字段,即COALESCE(t2.updateDT, t1.updateDT),举例如下:

SELECT 
COALESCE( t2.oid, t1.oid ) AS oid,
COALESCE( t2.skuName, t1.skuName ) AS skuName,
COALESCE( t2.stock, t1.stock ) AS stock,
COALESCE( t2.cstatus, t1.cstatus ) AS cstatus,
COALESCE( t2.updateDT, t1.updateDT ) AS updateDT
FROM
ods.tb_sku_dim_all t1
FULL OUTER JOIN (
    SELECT
    oid,
    skuName,
    stock,
    cstatus,
    updateDT
    FROM
    (
        SELECT
        oid,
        skuName,
        stock,
        cstatus,
        updateDT,
        row_number ( ) over ( PARTITION BY id ORDER BY event_time DESC ) AS rank 
        FROM
        tb_sku_dim_incre
        WHERE dt = '20211209' 
    ) temp 
    WHERE rank = 1 
) t2 
ON t1.oid = t2.oid;
原文地址:https://www.cnblogs.com/654wangzai321/p/15667379.html