Calling a Python script from Hive

The Python script is as follows:

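#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys

# Users whose rows should be kept
d_user = {
    "user1": "true",
    "user2": "true"
}

# Hive streams each input row to stdin as one tab-separated line
for line in sys.stdin:
    line = line.strip()
    userid = line.split('\t')[0]
    if d_user.get(userid, "false") == "true":
        # Emit the output columns (userid, cnt), separated by a tab
        print("\t".join([userid, "1"]))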
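
Before wiring the script into Hive, the filter logic can be checked locally. The sketch below is only an illustration: the function name `transform` and the sample rows are hypothetical and not part of the original script. It applies the same filter to tab-separated lines of the kind Hive would send through stdin.

```python
# Local check of the filter logic. `transform` and `sample` are
# illustrative names/data, not part of the original test.py.
D_USER = {"user1": "true", "user2": "true"}

def transform(lines, d_user=D_USER):
    """Yield 'userid<TAB>1' for each row whose first column is a known user."""
    for line in lines:
        userid = line.rstrip("\n").split("\t")[0]
        if d_user.get(userid, "false") == "true":
            yield "\t".join([userid, "1"])

# Rows as they would arrive on stdin: one line per row, columns split by tabs
sample = ["user1\n", "user3\n", "user2\textra\n", "user1\n"]
result = list(transform(sample))
print(result)
```

Keeping the logic in a function like this makes it easy to test without a Hive session; the real script only needs to loop over `sys.stdin` and print each yielded line.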

Add the file to the Hive session with the following command:

$ hive
hive> add file /home/user/test.py;

The HQL query is as follows:

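select userid, sum(1)
from (
    select
        -- after "add file", the script is shipped to each task's working
        -- directory, so it is referenced by file name
        TRANSFORM (user_pin)
        USING 'python test.py'
        AS userid, cnt
    from hive_table
    where dt = "2021-03-01"
) a
group by userid;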
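
To make the data flow of the query concrete, here is a rough Python sketch of what happens to the rows. It is a simulation under simplifying assumptions, not how Hive is implemented: the function `simulate_query` and the sample `user_pin` values are hypothetical, and `ALLOWED` mirrors the keys of `d_user` in the script.

```python
from collections import Counter

ALLOWED = {"user1", "user2"}  # same users as the keys of d_user in test.py

def simulate_query(user_pins):
    """Mimic TRANSFORM(user_pin) ... AS userid, cnt followed by GROUP BY userid."""
    # Hive hands each selected row to the script as one tab-separated text line.
    stdin_lines = [pin + "\n" for pin in user_pins]
    # The script keeps known users and emits 'userid<TAB>1'.
    stdout_lines = []
    for line in stdin_lines:
        userid = line.rstrip("\n").split("\t")[0]
        if userid in ALLOWED:
            stdout_lines.append(userid + "\t1")
    # Hive splits each output line on tabs into the AS columns (userid, cnt).
    rows = [line.split("\t") for line in stdout_lines]
    # sum(1) with GROUP BY userid counts the surviving rows per user.
    return dict(Counter(userid for userid, _cnt in rows))

counts = simulate_query(["user1", "user3", "user2", "user1"])
print(counts)
```

Here `user3` is dropped by the script, so only `user1` and `user2` reach the outer aggregation.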

Original article: https://www.cnblogs.com/cfox/p/14677832.html