Installing LZO on Hadoop: an experiment

Reference: http://blog.csdn.net/lalaguozhe/article/details/10912527

Environment: Hadoop 2.3 (CDH 5.0.2)

Hive 1.2.1

Goal: install LZO, test that jobs run with it, and create a Hive table stored in LZO format.

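When I tried out snappy earlier, I found that the native directory of the unpacked CDH tarball already contains native libraries such as libsnappy, but nothing for LZO.

So to use LZO you need to install the lzo library itself and also build and install hadoop-lzo.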

1. Install lzo. You can install it with yum, or download the source and build it yourself by following the link above.
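A minimal sketch of the yum route, assuming a RHEL/CentOS host where the lzo and lzo-devel packages are available (the package names and the has_lzo_headers helper are not from the original notes). hadoop-lzo compiles against the LZO headers, so it is worth checking that they are present before building:

```shell
# Assumption: run as root on a RHEL/CentOS host.
#   yum install -y lzo lzo-devel

# Hypothetical helper: succeed if the LZO headers exist under the given prefix.
has_lzo_headers() {
    prefix="${1:-/usr}"
    [ -f "$prefix/include/lzo/lzo1x.h" ]
}

if has_lzo_headers /usr; then
    echo "lzo headers found"
else
    echo "lzo headers missing; install lzo-devel (or build lzo from source) first"
fi
```

If lzo was built from source into another prefix, pass that prefix to the helper instead of /usr.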

2. Fetch hadoop-lzo with git, then build and install it.

git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
export CFLAGS=-m64
export CXXFLAGS=-m64
mvn clean package -Dmaven.test.skip=true

# The build writes the native libraries and the jar under target/
cp target/native/Linux-amd64-64/lib/* /app/cdh23502/lib/native/
cp target/hadoop-lzo-0.4.20-SNAPSHOT.jar /app/cdh23502/share/hadoop/common/

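The only problem I hit during the build was that DNS resolution for the Maven repository URL failed two or three times. I simply retried; the build itself generally goes through without trouble.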

Put the native libraries and the jar in the proper locations and distribute them to every node in the cluster.
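The distribution step could be scripted roughly as follows. This is only a sketch: it assumes passwordless ssh, the same /app/cdh23502 install path on every node, and that the native library built by hadoop-lzo is named libgplcompression. The hostnames node1 and node2 and the run and distribute helpers are hypothetical. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Assumptions: passwordless ssh to each worker, identical install path on all nodes.
HADOOP_HOME=/app/cdh23502
JAR=hadoop-lzo-0.4.20-SNAPSHOT.jar

# Print the command when DRY_RUN=1 (the default here), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the native libraries and the jar to each host given as an argument.
distribute() {
    for host in "$@"; do
        run scp "$HADOOP_HOME"/lib/native/libgplcompression* "$host:$HADOOP_HOME/lib/native/"
        run scp "$HADOOP_HOME/share/hadoop/common/$JAR" "$host:$HADOOP_HOME/share/hadoop/common/"
    done
}

# node1 and node2 are hypothetical hostnames.
distribute node1 node2
```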

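Since snappy was already configured earlier, only two changes are needed.

First, in core-site.xml, append these codecs to io.compression.codecs: org.apache.hadoop.io.compress.Lz4Codec,com.hadoop.compression.lzo.LzopCodec (LzopCodec is the one provided by hadoop-lzo; Lz4Codec is Hadoop's built-in LZ4 codec, not LZO).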

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec,com.hadoop.compression.lzo.LzopCodec</value>
    <description>A comma-separated list of the compression codec classes that can
    be used for compression/decompression. In addition to any classes specified
    with this property (which take precedence), codec classes on the classpath
    are discovered using a Java ServiceLoader.</description>
</property>
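One way to confirm that the edited core-site.xml is the one actually being picked up is to read the property back with hdfs getconf. A small sketch, where has_codec is a hypothetical helper and the check is skipped when the hdfs command is not on the PATH:

```shell
# Hypothetical helper: succeed if class $2 appears in the comma-separated codec list $1.
has_codec() {
    echo "$1" | tr ',' '\n' | grep -qxF "$2"
}

# hdfs getconf reads the value from the client-side configuration files.
if command -v hdfs >/dev/null 2>&1; then
    codecs=$(hdfs getconf -confKey io.compression.codecs)
    if has_codec "$codecs" com.hadoop.compression.lzo.LzopCodec; then
        echo "LzopCodec is registered"
    else
        echo "LzopCodec is missing from io.compression.codecs"
    fi
fi
```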

Second, in mapred-site.xml, replace the snappy settings with:

<property>  
    <name>mapred.compress.map.output</name>  
    <value>true</value>  
</property>  
<property>  
    <name>mapred.map.output.compression.codec</name>  
    <value>com.hadoop.compression.lzo.LzoCodec</value>  
</property>

Create the Hive tables:

seq 1 100 > nums.txt

hive -e "create table nums(num int) row format delimited stored as textfile;"

hive -e "load data local inpath '/yourpath/nums.txt' overwrite into table nums;"

Then:

CREATE TABLE lzo_test(  
 col String  
)  
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

insert into table lzo_test(col) select num from nums;

select count(*) from lzo_test;
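Note that an INSERT only writes LZO-compressed files if output compression is enabled for the Hive session, and the statements here do not set it. The sketch below is therefore an assumption about the session, not part of the original steps. It uses the property names hive.exec.compress.output and mapred.output.compression.codec, assumes the default warehouse path /user/hive/warehouse, and is meant to be run in place of the plain INSERT rather than after it (otherwise the rows are inserted twice).

```shell
# Session settings (assumed, not in the original steps) so the INSERT writes LZO-compressed output.
HQL="
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
INSERT INTO TABLE lzo_test SELECT num FROM nums;
SELECT count(*) FROM lzo_test;
"

if command -v hive >/dev/null 2>&1; then
    hive -e "$HQL"
    # Assumes the default warehouse location; the data files should end in .lzo.
    hadoop fs -ls /user/hive/warehouse/lzo_test
fi
```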

Original article: https://www.cnblogs.com/huaxiaoyao/p/5152818.html