Hive UDF example

The UDF in this example is taken from Programming Hive (《Hive编程指南》). It is one of the custom-function examples in Chapter 13.

Step one: create a project and add a class named GenericUDFNvl.

```java
package hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

/**
 * NVL(value, default_value): returns default_value if value is NULL,
 * otherwise returns value.
 */
@Description(
    name = "nvl",
    value = "_FUNC_(value,default_value) - Returns default value if value is null else returns value",
    extended = "Example:\n> SELECT _FUNC_(NULL, 'bla') FROM src LIMIT 1;")
public class GenericUDFNvl extends GenericUDF {
    // Resolves a common return type for both arguments and converts values to it.
    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
    // ObjectInspectors of the two arguments, kept for use in evaluate().
    private ObjectInspector[] argumentOIs;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        argumentOIs = arguments;
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException("The operator 'NVL' accepts 2 arguments.");
        }
        // true = allow conversion between compatible argument types.
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1]))) {
            throw new UDFArgumentTypeException(2,
                "The 1st and 2nd args of function NVL should have the same type, "
                + "but they are different: \"" + arguments[0].getTypeName()
                + "\" and \"" + arguments[1].getTypeName() + "\"");
        }
        return returnOIResolver.get();
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // DeferredObject.get() evaluates the argument and returns its actual value.
        Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
        if (retVal == null) {
            // The first argument is NULL, so fall back to the default value.
            retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
        }
        return retVal;
    }

    @Override
    public String getDisplayString(String[] children) {
        // Text used when this expression is displayed, e.g. in EXPLAIN output.
        StringBuilder sb = new StringBuilder();
        sb.append("if ");
        sb.append(children[0]);
        sb.append(" is null ");
        sb.append("returns ");
        sb.append(children[1]);
        return sb.toString();
    }
}
```
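Stripped of Hive's ObjectInspector machinery, the logic in evaluate() is simply "return the first argument unless it is NULL". The following is a minimal, Hive-free sketch of that behaviour so it can be checked on a plain JVM; the class NvlSemantics and its methods are hypothetical illustration names, not part of Hive.

```java
public class NvlSemantics {
    // Hypothetical helper mirroring GenericUDFNvl.evaluate():
    // return value unless it is null, otherwise return defaultValue.
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }

    // Hypothetical mirror of GenericUDFNvl.getDisplayString().
    static String displayString(String first, String second) {
        return "if " + first + " is null returns " + second;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "bla"));                // bla
        System.out.println(nvl("x", "bla"));                 // x
        System.out.println(displayString("col1", "'bla'"));  // if col1 is null returns 'bla'
    }
}
```

java.util.Objects.requireNonNullElse (Java 9+) looks similar but is not an exact substitute: it throws a NullPointerException when both arguments are null, whereas NVL(NULL, NULL) simply returns NULL.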

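Once the class is written, right-click the project and choose Export -> JAR file. On the next page, select the class you just created and export it as a .jar file.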
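Because the class lives in package hive.udf, the exported jar must contain the entry hive/udf/GenericUDFNvl.class; otherwise Hive cannot find the class named in create temporary function. A small sketch for checking an exported jar with java.util.jar, assuming the jar path is passed on the command line (JarCheck, entryNameFor and containsClass are hypothetical helper names):

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarCheck {
    // Map a fully qualified class name to its entry path inside a jar,
    // e.g. "hive.udf.GenericUDFNvl" -> "hive/udf/GenericUDFNvl.class".
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Return true if the jar at jarPath contains the compiled class.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java JarCheck <path-to-jar>");
            return;
        }
        System.out.println(containsClass(args[0], "hive.udf.GenericUDFNvl"));
    }
}
```

For example, running it against /home/user/udfnvl.jar should print true before the jar is handed to add jar.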

Next, start the Hive CLI and run:

```
hive> add jar /home/user/udfnvl.jar;
hive> create temporary function nvl as "hive.udf.GenericUDFNvl";
hive> desc function nvl;
OK
nvl(value,default_value) - Returns default value if value is null else returns value
Time taken: 0.169 seconds
hive> desc function extended nvl;
OK
nvl(value,default_value) - Returns default value if value is null else returns value
Example:
> SELECT nvl(NULL, 'bla') FROM src LIMIT 1;

Time taken: 0.051 seconds
```
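The desc function output is the text of the @Description annotation, with the _FUNC_ placeholder replaced by the name given in create temporary function (nvl here). The sketch below imitates that with reflection; FuncDescription is a hypothetical stand-in for Hive's Description annotation so the example runs without Hive on the classpath, and it illustrates the idea rather than Hive's actual code path.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DescribeSketch {
    // Hypothetical stand-in for Hive's @Description; RUNTIME retention
    // is required so it can be read via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FuncDescription {
        String name();
        String value();
        String extended() default "";
    }

    @FuncDescription(
        name = "nvl",
        value = "_FUNC_(value,default_value) - Returns default value if value is null else returns value",
        extended = "Example:\n> SELECT _FUNC_(NULL, 'bla') FROM src LIMIT 1;")
    static class Nvl { }

    // Build "desc function [extended]"-style text for a class registered as funcName.
    static String describe(Class<?> udfClass, String funcName, boolean extended) {
        FuncDescription d = udfClass.getAnnotation(FuncDescription.class);
        if (d == null) {
            return "(no description for " + funcName + ")";  // hypothetical fallback text
        }
        String text = d.value().replace("_FUNC_", funcName);
        if (extended && !d.extended().isEmpty()) {
            text += "\n" + d.extended().replace("_FUNC_", funcName);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe(Nvl.class, "nvl", true));
    }
}
```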

The whole process is fairly simple. Many more UDF examples can be found on GitHub in the Hive source tree, for instance https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEncode.java

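One thing does need attention: when exporting the jar, check the JDK version. The classes must be compiled for a Java version no newer than the JVM that runs Hive; otherwise you will get an error such as Unsupported major.minor version 52.0. Class-file major version 52 corresponds to Java 8, so this message means classes compiled for Java 8 were loaded by an older JVM.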
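The "52.0" in that message is the major.minor version stored in the header of every .class file. A minimal sketch that reads it, assuming the path of a compiled class (for example GenericUDFNvl.class taken from the project output or the jar) is passed as the only argument; ClassVersion and its methods are hypothetical names:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersion {
    // Class-file header layout: magic (u4), minor_version (u2), major_version (u2).
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();         // minor_version, not needed here
        return data.readUnsignedShort();  // major_version
    }

    // From Java 5 (major 49) onward, Java release = major - 44, so 52 -> Java 8.
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("pre-Java 5 class file: " + major);
        }
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ClassVersion <path-to-class-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            int major = majorVersion(in);
            System.out.println("major " + major + " = Java " + javaRelease(major));
        }
    }
}
```

Comparing this number with the output of java -version on the machine running Hive shows whether the jar will load; javap -v on the class prints the same "major version" field.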

Original article: https://www.cnblogs.com/gromit409/p/7688458.html