elephant-bird Study Notes

elephant-bird is an open-source project from Twitter, hosted at https://github.com/twitter/elephant-bird

It is Twitter's library of Hadoop InputFormats, OutputFormats, Writables, Pig load functions, Hive SerDes, HBase secondary indexing and similar utilities for working with LZO, Thrift and Protocol Buffers.

mvn clean install -U -Dprotobuf.version=2.5.0 -DskipTests=true

Running mvn package requires artifact signing, so generate a GPG key first:

gpg --gen-key

Apache Thrift and Protocol Buffers must also be installed.
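Before starting the build, it can help to confirm that the tools mentioned above are actually on the PATH. The following is only a minimal sketch: the helper name missing_tools is made up for illustration, and it assumes the executables are installed under their usual names (mvn, gpg, thrift, protoc).

```shell
#!/bin/sh
# Hypothetical helper (not part of elephant-bird): print the name of
# every tool passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools the build steps rely on: Maven, GPG, the Thrift compiler
# and the Protocol Buffers compiler (protoc).
missing=$(missing_tools mvn gpg thrift protoc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing build prerequisites:" $missing
else
    echo "All build prerequisites found."
    # Compare this with the -Dprotobuf.version value passed to Maven.
    protoc --version
fi
```

If protoc reports a version other than the one passed via -Dprotobuf.version, the build will likely need either a matching protoc or a different property value.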

Type mapping when creating a Hive table with elephant-bird (note that the set field is declared as a Hive array):

CREATE EXTERNAL TABLE `xxxx`(
	  `ts` string COMMENT 'from deserializer', 
	  `schema` string COMMENT 'from deserializer', 
	  `test_string` string COMMENT 'from deserializer', 
	  `test_long` bigint COMMENT 'from deserializer', 
	  `test_int` int COMMENT 'from deserializer', 
	  `test_short` smallint COMMENT 'from deserializer', 
	  `test_double` double COMMENT 'from deserializer', 
	  `test_byte` tinyint COMMENT 'from deserializer', 
	  `test_bool` boolean COMMENT 'from deserializer', 
	  `test_list` array<string> COMMENT 'from deserializer', 
	  `test_set` array<bigint> COMMENT 'from deserializer', 
	  `test_map` map<string,int> COMMENT 'from deserializer')
	COMMENT 'test_all_type'
	PARTITIONED BY ( 
	  `ds` string COMMENT 'date partition')
	ROW FORMAT SERDE 
	  'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' 
	WITH SERDEPROPERTIES ( 
	  'serialization.class'='com.xxx.xxx.xxx', 
	  'serialization.format'='org.apache.thrift.protocol.TCompactProtocol') 
	STORED AS INPUTFORMAT 
	  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
	OUTPUTFORMAT 
	  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
	LOCATION
	  'hdfs://xxxxxxx'
	TBLPROPERTIES (
Original post: https://www.cnblogs.com/tonglin0325/p/9636641.html