Hive + Hadoop Environment Setup

Machine plan:

Host      IP              Role
hadoop1   10.183.225.158  Hive server
hadoop2   10.183.225.166  Hive client

Prerequisites:

Kerberos deployment: http://www.cnblogs.com/kisf/p/7473193.html

Hadoop HA + Kerberos deployment: http://www.cnblogs.com/kisf/p/7477440.html

MySQL installation: omitted here.

Create the hive user and the hive database in MySQL, then verify the connection:

mysql -uhive -h10.112.28.179 -phive123456
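The user and database can be created along these lines (a sketch, assuming MySQL 5.x GRANT syntax; the hive / hive123456 credentials match the javax.jdo.option.Connection* values used in hive-site.xml later):

```sql
-- Run as the MySQL root user; adjust the allowed host pattern to taste.
CREATE DATABASE hive DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive123456';
FLUSH PRIVILEGES;
```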

Hive 2.3.0 is used:

wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz

Add environment variables:

export HIVE_HOME=/letv/soft/apache-hive-2.3.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin

Sync this to master2 as well, then run source /etc/profile.

Extract:

tar zxvf apache-hive-2.3.0-bin.tar.gz

  

Generate keytabs with Kerberos (run these in kadmin on the KDC):

addprinc -randkey hive/hadoop1@JENKIN.COM
addprinc -randkey hive/hadoop2@JENKIN.COM

xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM
xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop2@JENKIN.COM

  

Copy the keytab to hadoop1 and hadoop2:

scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop1:/var/kerberos/krb5kdc/keytab/
scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop2:/var/kerberos/krb5kdc/keytab/

(kinit against this keytab is required before use.)

Hive server configuration:

Add to hive-env.sh on the Hive server:

HADOOP_HOME=/xxx/soft/hadoop-2.7.3
export HIVE_CONF_DIR=/xxx/soft/apache-hive-2.3.0-bin/conf
export HIVE_AUX_JARS_PATH=/xxx/soft/apache-hive-2.3.0-bin/lib

  

Add hive-site.xml on the Hive server:

<configuration>
    <property>
           <name>hive.metastore.schema.verification</name>
           <value>false</value>
           <description>
              Enforce metastore schema version consistency.
                  True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
                        schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                        proper metastore schema migration. (Default)
                  False: Warn if the version information stored in metastore doesn't match with one from Hive jars.
            </description>
    </property>
    <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>location of default database for the warehouse</description>
    </property>
    <property>
            <name>hive.querylog.location</name>
            <value>/xxx/soft/apache-hive-2.3.0-bin/log</value>
            <description>Location of Hive run time structured log file</description>
    </property>
    <property>
            <name>hive.downloaded.resources.dir</name>
            <value>/xxx/soft/apache-hive-2.3.0-bin/tmp</value>
            <description>Temporary local directory for added resources in the remote file system.</description>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://10.112.28.179:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false</value>
            <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
            <description>username to use against metastore database</description>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive123456</value>
            <description>password to use against metastore database</description>
    </property>
<!-- kerberos config -->
    <property>
        <name>hive.server2.authentication</name>
        <value>KERBEROS</value>
    </property>
    <property>
        <name>hive.server2.authentication.kerberos.principal</name>
        <value>hive/_HOST@JENKIN.COM</value>
    </property>
    <property>
        <name>hive.server2.authentication.kerberos.keytab</name>
        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
        <!-- value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value -->
    </property>

    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.kerberos.keytab.file</name>
        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
    </property>
    <property>
        <name>hive.metastore.kerberos.principal</name>
        <value>hive/_HOST@JENKIN.COM</value>
    </property>
</configuration>

  

Add the following to core-site.xml on the Hadoop NameNode:

<!-- hive config -->
        <property>
                <name>hadoop.proxyuser.hive.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.hive.groups</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.hdfs.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.hdfs.groups</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.HTTP.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.HTTP.groups</name>
                <value>*</value>
         </property>

Sync it to the other machines:

scp etc/hadoop/core-site.xml master2:/xxx/soft/hadoop-2.7.3/etc/hadoop/
scp etc/hadoop/core-site.xml slave2:/xxx/soft/hadoop-2.7.3/etc/hadoop/
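A malformed config file (for example an unescaped & in the metastore JDBC URL in hive-site.xml) only surfaces as an opaque parser stack trace at service startup, so it is worth checking well-formedness before syncing. A minimal sketch using Python's stdlib XML parser; the /tmp sample path is illustrative:

```shell
# Write a sample snippet escaped the way hive-site.xml requires: & as &amp;
cat > /tmp/hive-site-check.xml <<'EOF'
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.112.28.179:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
</configuration>
EOF

# Parse it; a stray & or a missing close tag makes this exit non-zero.
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print("well-formed")' /tmp/hive-site-check.xml
```

Run the same one-liner against the real hive-site.xml and core-site.xml before copying them out.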

  

Download the MySQL JDBC driver:

wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz
tar zxvf mysql-connector-java-5.1.44.tar.gz 

Copy it into Hive's lib directory:

cp mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar apache-hive-2.3.0-bin/lib/

Client configuration:

Copy the Hive installation to hadoop2:

scp -r apache-hive-2.3.0-bin/ hadoop2:/xxx/soft/

  

On hadoop2 (the client):

hive-site.xml

<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop1:9083</value>
    </property>
    <property>
         <name>hive.metastore.local</name>
         <value>false</value>
    </property>
    <!-- kerberos config -->
    <property>
        <name>hive.server2.authentication</name>
        <value>KERBEROS</value>
    </property>
    <property>
        <name>hive.server2.authentication.kerberos.principal</name>
        <value>hive/_HOST@JENKIN.COM</value>
    </property>
    <property>
        <name>hive.server2.authentication.kerberos.keytab</name>
        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
        <!-- value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value -->
    </property>

    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.kerberos.keytab.file</name>
        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
    </property>
    <property>
        <name>hive.metastore.kerberos.principal</name>
        <value>hive/_HOST@JENKIN.COM</value>
    </property>

</configuration>

  

Start Hive:

Initialize the metastore schema:

./bin/schematool -dbType mysql -initSchema

Obtain a Kerberos ticket:

kinit -k -t /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM

Start the metastore server:

hive --service metastore &  

Verify:

[root@hadoop1 conf]# netstat -nl | grep 9083
tcp        0      0 0.0.0.0:9083                0.0.0.0:*                   LISTEN  
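Instead of eyeballing netstat, a small polling helper can block until the port accepts connections, which is handy in startup scripts (a sketch; wait_for_port and its arguments are my own naming, and it relies on bash's /dev/tcp redirection):

```shell
#!/bin/bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS] - poll until a TCP connect
# to HOST:PORT succeeds, or give up after TIMEOUT_SECONDS (default 30).
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30} i=0
    while [ "$i" -lt "$timeout" ]; do
        # bash opens a TCP connection via the /dev/tcp pseudo-device;
        # the subshell closes fd 3 again as soon as the probe is done
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            echo "$host:$port is up"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "timed out waiting for $host:$port" >&2
    return 1
}

# e.g. wait for the metastore before starting HiveServer2:
# wait_for_port hadoop1 9083 60 && hive --service hiveserver2 &
```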

  

ps -ef | grep metastore

Running hive should now drop you into the CLI:

hive>

Start the Thrift service (HiveServer2):

hive --service hiveserver2 &

 

Verify that HiveServer2 is up:

[root@hadoop1 conf]# netstat -nl | grep 10000
tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN  

  

HQL operations from the Hive client:

DDL reference: https://cwiki.apache.org//confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase

DML reference: https://cwiki.apache.org//confluence/display/Hive/LanguageManual+DML

Databases and tables created through Hive are visible on HDFS, under the warehouse location configured in hive-site.xml (hive.metastore.warehouse.dir):

hadoop fs -ls /user/hive/warehouse

  

Connect to Hive with the beeline client:

beeline -u "jdbc:hive2://hadoop1:10000/;principal=hive/_HOST@JENKIN.COM"

Run SQL:

0: jdbc:hive2://hadoop1:10000/> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| hivetest       |
+----------------+
2 rows selected (0.318 seconds)

  

hive> create database jenkintest;
OK
Time taken: 0.968 seconds
hive> show databases;
OK
default
hivetest
jenkintest
Time taken: 0.033 seconds, Fetched: 3 row(s)
hive> use jenkintest
    > ;
OK
Time taken: 0.108 seconds
hive> create table test1(columna int, columnb string);
OK
Time taken: 0.646 seconds
hive> show tables;
OK
test1
Time taken: 0.084 seconds, Fetched: 1 row(s)

  

Importing data into Hive (from a local file, with columns separated by the Tab key):

[root@hadoop2 ~]# vim jenkindb.txt
1       jenkin
2       jenkin.k
3       anne
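Pasting into vim can silently turn the tabs into spaces, so it is safer to generate the file with printf and check the delimiters (a sketch; the file name and rows match the example above):

```shell
# Emit three rows with an explicit tab between the two columns.
printf '%s\t%s\n' \
    1 jenkin \
    2 jenkin.k \
    3 anne > jenkindb.txt

# cat -A (GNU coreutils) renders each tab as ^I, making the delimiter visible.
cat -A jenkindb.txt

# Every line should have exactly 2 tab-separated fields.
awk -F'\t' '{ print NF }' jenkindb.txt
```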


[root@hadoop2 ~]# hive

hive> create table jenkintb (id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

hive> load data local inpath 'jenkindb.txt' into table jenkintb;

hive> select * from jenkintb;
OK
1       jenkin
2       jenkin.k
3       anne

  

Inspect the generated DDL:

show create table jenkintb;

  

  

Original article: https://www.cnblogs.com/kisf/p/7497261.html