Running Hive on a three-node Linux Hadoop cluster


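I. Preparation

Start with three Linux machines and build a Hadoop cluster on them.

eddy01: runs the HDFS master (NameNode) and the YARN master (ResourceManager). Its processes are NodeManager, ResourceManager, NameNode, and SecondaryNameNode.

eddy02: DataNode, NodeManager

eddy03: DataNode, NodeManager

Configuration: Hadoop is installed on all three machines, and the author edits the files below on eddy01 only. On eddy01, map each node's IP address to its hostname in the hosts file and list the nodes in Hadoop's slaves file; when YARN (the ResourceManager) is started on eddy01, the start script logs in to eddy02 and eddy03 over SSH to start their daemons. The DataNode and NodeManager on eddy02 and eddy03 read the same files to locate the NameNode and ResourceManager, so keep their copies in sync with eddy01.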
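The hosts mapping and the SSH login described above could be prepared roughly as follows. This is a minimal sketch, not the author's exact procedure: the IP addresses are placeholders, the function names `hosts_entries` and `setup_ssh` are made up for illustration, and the root account is assumed from the shell prompts later in this post.

```shell
# Print /etc/hosts lines for the three nodes.
# The IP addresses are placeholders (assumption); substitute your own.
hosts_entries() {
    printf '%s\t%s\n' \
        192.168.1.101 eddy01 \
        192.168.1.102 eddy02 \
        192.168.1.103 eddy03
}

# Enable passwordless SSH from this machine to every host given as an argument,
# so that start-dfs.sh and start-yarn.sh can log in without a password prompt.
setup_ssh() {
    local host
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
    for host in "$@"; do
        ssh-copy-id "root@$host"
    done
}

# Usage on eddy01 (as root):
#   hosts_entries >> /etc/hosts
#   setup_ssh eddy01 eddy02 eddy03
```

Both functions are only defined here; nothing is changed until you run the two usage lines yourself.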

Configuration files:

core-site.xml

<!--                                                                                             
  Licensed under the Apache License, Version 2.0 (the "License");                                
  you may not use this file except in compliance with the License.                               
  You may obtain a copy of the License at                                                        
                                                                                                 
    http://www.apache.org/licenses/LICENSE-2.0                                                   
                                                                                                 
  Unless required by applicable law or agreed to in writing, software                            
  distributed under the License is distributed on an "AS IS" BASIS,                              
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                       
  See the License for the specific language governing permissions and                            
  limitations under the License. See accompanying LICENSE file.                                  
-->                                                                                              
                                                                                                 
<!-- Put site-specific property overrides in this file. -->                                      
                                                                                                 
<configuration>                                                                                  
        <!-- Address of the HDFS master (NameNode) -->
        <property>                                                                               
                <name>fs.defaultFS</name>                                                        
                <!-- <value>hdfs://ns1</value> -->                                               
                <value>hdfs://eddy01:9000</value>                                                
        </property>                                                                              
        <!-- Directory for files Hadoop generates at runtime -->
        <property>                                                                               
                <name>hadoop.tmp.dir</name>                                                      
                <value>/usr/local/eddy/hadoop-2.4.1/tmp</value>                                  
        </property>                                                                              
<!-- ZooKeeper address (disabled)
        <property>                                                                               
                <name>ha.zookeeper.quorum</name>                                                 
                <value>eddy01:2181,eddy02:2181,eddy03:2181</value>                               
        </property>                                                                              
-->    
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>                                                           
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>                                      
<!--                                                                                             
  Licensed under the Apache License, Version 2.0 (the "License");                                
  you may not use this file except in compliance with the License.                               
  You may obtain a copy of the License at                                                        
                                                                                                 
    http://www.apache.org/licenses/LICENSE-2.0                                                   
                                                                                                 
  Unless required by applicable law or agreed to in writing, software                            
  distributed under the License is distributed on an "AS IS" BASIS,                              
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                       
  See the License for the specific language governing permissions and                            
  limitations under the License. See accompanying LICENSE file.                                  
-->                                                                                              
                                                                                                 
<!-- Put site-specific property overrides in this file. -->                                      
                                                                                                 
<configuration>                                                                                  
<!-- Number of HDFS replicas -->
                        <property>                                                               
                                <name>dfs.replication</name>                                     
                                <value>2</value>                                                 
                        </property>                                                              
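<!-- Where NameNode metadata is stored (dfs.name.dir is the deprecated name of dfs.namenode.name.dir in Hadoop 2.x) -->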
                        <property>                                                               
                                <name>dfs.name.dir</name>                                        
                                <value>/usr/local/eddy/hadoop-2.4.1/tmp/name1/</value>           
                        </property>                                                              
</configuration>                                                                                 
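Once core-site.xml and hdfs-site.xml are in place, one way to confirm that Hadoop actually picks the values up is `hdfs getconf -confKey`. The helper below is only a sketch: the name `expect_conf` is made up for illustration, and the expected values in the usage lines are simply the ones from the two files above.

```shell
# Compare the effective value of a Hadoop configuration key with an expected value.
# $1: configuration key, $2: expected value. expect_conf is a hypothetical helper name.
expect_conf() {
    local key=$1 expected=$2 actual
    actual=$(hdfs getconf -confKey "$key") || return 1
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key=$actual"
    else
        echo "DIFF $key=$actual (expected $expected)"
        return 1
    fi
}

# Usage on eddy01:
#   expect_conf fs.defaultFS hdfs://eddy01:9000
#   expect_conf dfs.replication 2
```

Running the same two checks on eddy02 and eddy03 shows whether those nodes see the same settings as eddy01.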

mapred-site.xml

<?xml version="1.0"?>                                                                            
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>                                      
<!--                                                                                             
  Licensed under the Apache License, Version 2.0 (the "License");                                
  you may not use this file except in compliance with the License.                               
  You may obtain a copy of the License at                                                        
                                                                                                 
    http://www.apache.org/licenses/LICENSE-2.0                                                   
                                                                                                 
  Unless required by applicable law or agreed to in writing, software                            
  distributed under the License is distributed on an "AS IS" BASIS,                              
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                       
  See the License for the specific language governing permissions and                            
  limitations under the License. See accompanying LICENSE file.                                  
-->                                                                                              
                                                                                                 
<!-- Put site-specific property overrides in this file. -->                                      
                                                                                                 
<configuration>                                                                                  
<!-- Tell the MapReduce framework to run on YARN -->
                        <property>                                                               
                                        <name>mapreduce.framework.name</name>                    
                                        <value>yarn</value>                                      
                        </property>                                                              
</configuration>   

yarn-site.xml

<?xml version="1.0"?>                                                                            
<!--                                                                                             
  Licensed under the Apache License, Version 2.0 (the "License");                                
  you may not use this file except in compliance with the License.                               
  You may obtain a copy of the License at                                                        
                                                                                                 
    http://www.apache.org/licenses/LICENSE-2.0                                                   
                                                                                                 
  Unless required by applicable law or agreed to in writing, software                            
  distributed under the License is distributed on an "AS IS" BASIS,                              
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                       
  See the License for the specific language governing permissions and                            
  limitations under the License. See accompanying LICENSE file.                                  
-->                                                                                              
<configuration>                                                                                  
                                                                                                 
<!-- Site specific YARN configuration properties -->                                             
<!-- Address of the YARN master (ResourceManager) -->
                        <property>                                                               
                <name>yarn.resourcemanager.hostname</name>                                       
                <value>eddy01</value>                                                            
                        </property>                                                              
                                                                                                 
                        <!-- Reducers fetch map output through mapreduce_shuffle -->
                        <property>                                                               
                                <name>yarn.nodemanager.aux-services</name>                       
                                <value>mapreduce_shuffle</value>                                 
                        </property>                                                              
</configuration>     

slaves file (for a detailed explanation of the slaves file, see http://www.tuicool.com/articles/zINvYbf):

eddy01
eddy02
eddy03

On eddy01:

Start HDFS:

1. Format HDFS

hdfs namenode -format    (older, deprecated form: hadoop namenode -format)

2. Start HDFS

start-dfs.sh

If it succeeds, jps shows the following processes:

[root@eddy01 sbin]# jps
6110 Jps
5576 NameNode
5839 SecondaryNameNode

3. Start YARN

start-yarn.sh

Processes:

[root@eddy01 sbin]# jps
6654 Jps
6467 NodeManager
6187 ResourceManager
5576 NameNode
5839 SecondaryNameNode

On eddy02 and eddy03, respectively:

Start the DataNode and NodeManager

hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
[root@eddy02 eddy]# jps
3216 DataNode
7011 NodeManager
7240 Jps
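Comparing jps listings by eye gets tedious across three machines. The sketch below checks a jps listing against the daemons each node is expected to run, as listed in the preparation section; the function name `missing_daemons` is made up for illustration.

```shell
# Report which of the expected daemons are missing from `jps` output.
# Reads jps output on stdin; the expected daemon names are passed as arguments.
# Prints one missing daemon per line, or nothing if all are running.
missing_daemons() {
    local running name
    running=$(awk '{print $2}')          # second column of jps output is the process name
    for name in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
    done
}

# Usage on eddy01:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager NodeManager
# Usage on eddy02 / eddy03:
#   jps | missing_daemons DataNode NodeManager
```

An empty result means every expected daemon is up on that node.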

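II. Installing Hive

1. Download Hive, upload it to the Linux machine, and extract the archive. The author's install lives under /usr/local/eddy/ (the Hive startup log further down shows /usr/local/eddy/hive-1.2.1); the command below comes from older notes, so substitute your own archive name and target directory:

tar -zxvf hive-0.9.0.tar.gz -C /cloud/

2. Install MySQL. See:

http://www.cnblogs.com/Eddyer/p/4993990.html

To fix garbled characters (encoding problems) in MySQL:

http://www.cnblogs.com/Eddyer/p/4995056.html

After installation, set the root password:

[root@eddy01 etc]# mysqladmin -u root password "root"

Grant privileges to the root user:

Once Hive and MySQL are installed, copy the MySQL JDBC connector jar into $HIVE_HOME/lib.
    If you get a permission error, grant access in MySQL (run this on the machine where MySQL is installed):
    mysql -uroot -p
    # (run the statements below; *.* means all tables in all databases, % means any IP address or host may connect)
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
    FLUSH PRIVILEGES;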
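The jar-copy step can be wrapped in a small function so that a wrong path or an unset HIVE_HOME is caught before Hive fails at startup. This is a sketch only: `install_jdbc_driver` is a made-up name, and the connector jar's file name depends on the version you downloaded, so it is passed in as an argument rather than assumed.

```shell
# Copy a MySQL JDBC connector jar into Hive's lib directory.
# $1: path to the connector jar (file name varies by connector version).
install_jdbc_driver() {
    local jar=$1
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$jar" ]; then
        echo "no such file: $jar" >&2
        return 1
    fi
    cp "$jar" "$HIVE_HOME/lib/"
}

# Usage (the jar path is whatever you downloaded):
#   install_jdbc_driver /path/to/mysql-connector-java-<version>-bin.jar
```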

Using Hive then failed with this error:

create table years (year string, event string) row format delimited fields terminated by '\t';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

Fix:

1. Open up permissions in HDFS:

hdfs dfs -chmod -R 777 /tmp
hdfs dfs -chmod -R 777 /user/hive/warehouse

2. Create the hive database manually in MySQL:

create database hive;

Change its character set:

mysql> alter database hive character set latin1;
Query OK, 1 row affected (0.00 sec)

mysql> exit

Then start Hadoop and Hive again.

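3. Configure Hive:

    (a) Set the HIVE_HOME environment variable, then edit conf/hive-env.sh (vi conf/hive-env.sh) and set HADOOP_HOME in it.

    
    (b) Configure the metastore database: edit hive-site.xml (vi hive-site.xml)
    and add the following: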
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
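For step (a), the two variables might look like the lines below. The paths are assumptions read off this post (the hadoop.tmp.dir value in core-site.xml and the Hive startup log), so adjust them to your own layout.

```shell
# In conf/hive-env.sh: tell Hive where Hadoop lives.
# Path assumed from the hadoop.tmp.dir value in core-site.xml.
export HADOOP_HOME=/usr/local/eddy/hadoop-2.4.1

# In a login profile such as /etc/profile or ~/.bashrc: make the hive command available.
# Path assumed from the Hive startup log in this post.
export HIVE_HOME=/usr/local/eddy/hive-1.2.1
export PATH="$PATH:$HIVE_HOME/bin"
```

After reloading the profile, `hive` can be started from any directory instead of only from $HIVE_HOME/bin.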

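4. Jar conflict:

Hive and Hadoop ship different jline versions. Copy jline-2.12.jar from Hive's lib directory into Hadoop's YARN lib directory to replace the old jar there; in the notes this step came from, the old jar was
/home/hadoop/app/hadoop-2.6.4/share/hadoop/yarn/lib/jline-0.9.94.jar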
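The replacement can be scripted against HADOOP_HOME and HIVE_HOME instead of a hard-coded path. The sketch below assumes both variables are set and that the jar names match the ones mentioned above; `replace_jline` is a made-up name. It renames the old jar rather than deleting it, so the change is easy to undo.

```shell
# Replace Hadoop's old jline jar with the one shipped in Hive's lib directory.
# Assumes HADOOP_HOME and HIVE_HOME are set and the jar versions are 0.9.94 and 2.12.
replace_jline() {
    local yarn_lib="$HADOOP_HOME/share/hadoop/yarn/lib"
    local old="$yarn_lib/jline-0.9.94.jar"
    local new="$HIVE_HOME/lib/jline-2.12.jar"
    if [ ! -f "$new" ]; then
        echo "missing $new" >&2
        return 1
    fi
    # Keep a backup; a name that no longer ends in .jar should not be matched
    # by a lib/* classpath wildcard.
    if [ -f "$old" ]; then
        mv "$old" "$old.bak"
    fi
    cp "$new" "$yarn_lib/"
}

# Usage on the node where Hive runs:
#   replace_jline
```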

Start Hive:

[root@eddy01 bin]# ./hive

Logging initialized using configuration in jar:file:/usr/local/eddy/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/eddy/hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/eddy/hadoop-2.4.1/share/hadoop/mapreduce/hadoop.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> 

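Appendix: notes

Hive only needs to be installed on one node.

1. Upload the tar archive

2. Extract it
    tar -zxvf hive-0.9.0.tar.gz -C /cloud/
3. Install the MySQL database (switch to the root user). It can go on any machine, as long as that machine can reach the nodes of the Hadoop cluster.
    The MySQL installation below is for reference only; each MySQL version has its own install procedure.
        rpm -qa | grep mysql
        rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
        rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm 
        rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm 
    Change the MySQL password
    /usr/bin/mysql_secure_installation
    (Note: remove the anonymous users and allow remote connections)
    Log in to MySQL
    mysql -u root -p

4. Configure Hive
    (a) Set the HIVE_HOME environment variable, then edit conf/hive-env.sh (vi conf/hive-env.sh) and set HADOOP_HOME in it.

    
    (b) Configure the metastore database: edit hive-site.xml (vi hive-site.xml)
    and add the following: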
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
    
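5. Once Hive and MySQL are installed, copy the MySQL JDBC connector jar into $HIVE_HOME/lib.
    If you get a permission error, grant access in MySQL (run this on the machine where MySQL is installed):
    mysql -uroot -p
    # (run the statements below; *.* means all tables in all databases, % means any IP address or host may connect)
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
    FLUSH PRIVILEGES;

6. jline version mismatch: copy jline-2.12.jar from Hive's lib directory to replace the old jar in Hadoop:
/home/hadoop/app/hadoop-2.6.4/share/hadoop/yarn/lib/jline-0.9.94.jar


Start Hive
bin/hive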

----------------------------------------------------------------------------------------------------
    
7. Create a table (a managed, i.e. internal, table by default)
    create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
    Create a partitioned table
    create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
    Create an external table
    create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
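To try the tables out you need a tab-separated file whose columns line up with the schema. The sketch below writes a tiny sample file for trade_detail; the rows are made-up illustration data and `make_sample_trades` is a hypothetical helper name. The practical difference between the table kinds shows up on drop: dropping a managed table such as trade_detail deletes its data from the warehouse directory, while dropping the external table td_ext leaves the files under /td_ext in place.

```shell
# Write a small tab-separated sample file matching trade_detail's five columns:
# id, account, income, expenses, time. The rows are made-up illustration data.
# $1: path of the file to create.
make_sample_trades() {
    printf '%s\t%s\t%s\t%s\t%s\n' \
        1 alice@example.com 2000.0 150.5 2015-11-01 \
        2 bob@example.com   1800.0 300.0 2015-11-01 \
        3 alice@example.com 500.0  20.0  2015-11-02 > "$1"
}

# Usage:
#   make_sample_trades /root/trade_detail.txt
#   hive -e "load data local inpath '/root/trade_detail.txt' into table trade_detail;"
#   hive -e "select account, sum(income) from trade_detail group by account;"
```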

8. Create a partitioned table
    Ordinary vs. partitioned tables: use a partitioned table when large volumes of data keep being added.
    create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';

    Load data into a partition
    load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
    
    load data local inpath '/root/data.am' into table beauty partition (nation="USA");

    
    select nation, avg(size) from beauties group by nation order by avg(size);
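Each partition of a managed table is stored as its own subdirectory named key=value, which is why queries that filter on the partition column can skip the other partitions' files. The sketch below builds that path for a table in the default database, assuming the default warehouse location /user/hive/warehouse used earlier in this post; `partition_dir` is a made-up helper name.

```shell
# Print the HDFS directory that holds one partition of a managed table
# in the default database, assuming the default warehouse /user/hive/warehouse.
# $1: table name, $2: partition column, $3: partition value.
partition_dir() {
    printf '/user/hive/warehouse/%s/%s=%s\n' "$1" "$2" "$3"
}

# Usage after the load into book above:
#   hdfs dfs -ls "$(partition_dir book pubdate 2010-08-22)"
#   hive -e "show partitions book;"
#   hive -e "select * from book where pubdate='2010-08-22';"
```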


Original post: https://www.cnblogs.com/Eddyer/p/6522388.html