Installing and Configuring Hadoop 2.5.1 + HBase 1.1.2

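Today I was reading up on HBase and Hadoop and learned that HBase 1.1.x is a stable release line, that Hadoop 2.5.x is the newest Hadoop line supporting this HBase version, and that JDK 7 is required. (This paragraph is based on the official Apache documentation.)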

1. The following table maps JDK versions to HBase versions:

 

2. The following table maps HBase versions to Hadoop versions.
Hadoop version support matrix
  • "S" = supported

  • "X" = not supported

  • "NT" = Not tested

The official documentation strongly recommends installing Hadoop 2.x:

 
Hadoop 2.x is recommended.

Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase 0.98 drops support for Hadoop 1.0, deprecates use of Hadoop 1.1+, and HBase 1.0 will not support Hadoop 1.x.

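I wanted to build this environment myself but could not find spare machines, so I looked up an article dedicated to installing and configuring it. I am posting it below for now and will build my own cluster when I get the chance.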

 

The detailed installation and configuration guide below is reposted from: http://blog.csdn.net/yuansen1999/article/details/50542018

=================================== Full text follows:

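Installing and Configuring Hadoop 2.5.1 + HBase 1.1.2

Copyright notice: this is the blogger's original article and may not be reposted without the blogger's permission.

[Note]

The release of HBase 1.0 marked the point at which HBase could be used in enterprise production. Further 1.x releases followed, and 1.1.2, used here, is one of the stable ones.

HBase depends on the Hadoop libraries, and HBase 1.1.2 expects the Hadoop 2.5.1 libraries, so Hadoop 2.5.1 is used as the base environment. If you run a different Hadoop version, you also need to replace the Hadoop jar files under HBase's lib directory with the ones from your Hadoop version; otherwise you will get errors about native libraries not being found. The actual installation steps follow.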
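To see which Hadoop version an HBase installation was packaged with, it can help to look at the version suffix of the hadoop-*.jar files in its lib directory. The following is only a sketch: the function name bundled_hadoop_versions is made up for illustration, and the path in the usage note assumes HBase is unpacked to /hadoop/hbase-1.1.2 as later in this guide.

```shell
# Print the distinct versions of the hadoop-*.jar files in an HBase lib directory.
# (Hypothetical helper; not part of HBase or Hadoop.)
bundled_hadoop_versions() {
    local lib_dir="$1"
    local jar
    for jar in "$lib_dir"/hadoop-*.jar; do
        [ -e "$jar" ] || continue    # the glob matched nothing
        # hadoop-common-2.5.1.jar -> 2.5.1
        basename "$jar" .jar | sed -E 's/^hadoop-[a-z-]*-([0-9][0-9.]*).*$/\1/'
    done | sort -u
}
```

Calling `bundled_hadoop_versions /hadoop/hbase-1.1.2/lib` and comparing the result with the first line of `hadoop version` shows whether the bundled jars match the installed Hadoop. More than one line of output would suggest a mixed set of jars.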

 

1. Software versions

Component          Version                       Remarks
Operating system   CentOS release 6.4 (Final)    64-bit
JDK                jdk-7u80-linux-x64.gz
Hadoop             hadoop-2.5.1.tar.gz
ZooKeeper          zookeeper-3.4.6.tar.gz
HBase              hbase-1.1.2-bin.tar.gz

 

 

2. Host plan

IP              Host      Deployed processes
192.168.8.127   master    QuorumPeerMain, DataNode, ResourceManager, HRegionServer, NodeManager, SecondaryNameNode, NameNode, HMaster
192.168.8.128   slave01   DataNode, QuorumPeerMain, HRegionServer, NodeManager
192.168.8.129   slave02   QuorumPeerMain, HRegionServer, NodeManager, DataNode

 

3. Directory plan

Each host has three mount points:

IP              Purpose            Device       Mount point
192.168.8.127   Root directory     /dev/sda1    /
                Swap directory     tmpfs        /dev/shm
                Hadoop directory   /dev/sda3    /hadoop
192.168.8.128   Root directory     /dev/sda1    /
                Swap directory     tmpfs        /dev/shm
                Hadoop directory   /dev/sda3    /hadoop
192.168.8.129   Root directory     /dev/sda1    /
                Swap directory     tmpfs        /dev/shm
                Hadoop directory   /dev/sda3    /hadoop

 

         [root@master ~]# df -h

 

 

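4. Create the user hadoop, belonging to the group hadoop, on every host

3.1 Create the group hadoop:

[root@localhost ~]# groupadd hadoop

3.2 Create the user hadoop and add it to the hadoop group:

[root@localhost ~]# useradd hadoop -g hadoop

3.3 Set the hadoop user's password to hadoop:

[root@localhost ~]# passwd hadoop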

5. Edit and configure the host names

[root@localhost ~]# vi /etc/hosts

127.0.0.1     localhost

192.168.8.127 master

192.168.8.128 slave01

192.168.8.129 slave02
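A missing space between the address and the host name is an easy mistake to make in /etc/hosts, and it only shows up later when a node cannot be reached by name. The sketch below lists the cluster names that have no usable entry in a hosts-format file. The function name missing_hosts is an illustrative assumption, not an existing tool.

```shell
# Print every given host name that has no "<ip> <name>" entry in a hosts-format file.
# (Hypothetical helper for checking the file edited above.)
missing_hosts() {
    local hosts_file="$1"; shift
    local name
    for name in "$@"; do
        # Field 1 is the address; any later field may be the name. Comment lines are skipped.
        awk -v h="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}
```

For this cluster the call would be `missing_hosts /etc/hosts master slave01 slave02`; no output means all three names are present.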

[root@localhost ~]# vi /etc/sysconfig/network

 

Reboot:

[root@localhost ~]# reboot

Check the host name:

 

 

Change the owner of the /hadoop directory:

[root@master ~]# chown hadoop:hadoop -R /hadoop

[root@master ~]# ls -l /

 

6. Upload the installation packages to the hadoop user's home directory

 

 

7. Install the JDK

6.1 Install the JDK

[root@master ~]# cd /usr/local/

[root@master local]# tar -zxvf jdk-7u80-linux-x64.gz

6.2 Configure the JDK environment variables (in the hadoop user's .bashrc)

export JAVA_HOME=/usr/local/jdk1.7.0_80

export JRE_HOME=/usr/local/jdk1.7.0_80/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

6.3 Make the environment variables take effect

[hadoop@master ~]$ source .bashrc

6.4 Check that the JDK was installed successfully:

 

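8. Configure passwordless SSH between the nodes:

7.1 Create the directory:

[hadoop@master ~]$ mkdir .ssh

7.2 Change into the .ssh directory:

[hadoop@master ~]$ cd .ssh/

7.3 Generate the key pair:

[hadoop@master .ssh]$ ssh-keygen -t rsa

Note: just press Enter at every prompt.

7.4 Append the generated public key to the authorized_keys file:

[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys

7.5 Set the permissions of the .ssh directory to 700:

[hadoop@master .ssh]$ chmod 700 ~/.ssh

This is required on some machines and optional on others.

7.5 Set the permissions of the authorized_keys file to 600:

[hadoop@master .ssh]$ chmod 600 authorized_keys

It must be 600, otherwise passwordless login will not work.

7.6 Test passwordless SSH login:

[hadoop@master hadoop]$ ssh master

Last login: Tue Jan 19 13:58:27 2016 from 192.168.8.1

7.7 Set up passwordless SSH login on each of the other nodes in the same way.

7.8 Append the master node's public key to the other nodes (shown here for master to slave01)

7.8.1 Copy the master's public key id_rsa.pub to the .ssh directory on slave01, renaming it master.pub:

[hadoop@master .ssh]$ scp id_rsa.pub slave01:/home/hadoop/.ssh/master.pub

Take care in this step not to overwrite slave01's own id_rsa.pub.

7.8.2 Switch to slave01 and append master.pub to the authorized_keys file:

[hadoop@slave01 .ssh]$ cat master.pub >> authorized_keys

7.8.3 Repeat the same steps for slave02.

Note: a password has to be entered on the first login.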
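As an alternative to copying and appending the key by hand, the standard ssh-copy-id utility can append the master's public key to a remote authorized_keys file in one step. The loop below is only a sketch of that idea: the function name push_key and the RUN dry-run variable are assumptions made for illustration, while the user and host names are the ones used in this guide.

```shell
# Push the local public key to each given host with ssh-copy-id.
# Set RUN=echo to print the commands instead of running them. (Hypothetical helper.)
push_key() {
    local run="${RUN:-}"
    local host
    for host in "$@"; do
        # ssh-copy-id asks for the remote password once, then appends the key
        # to ~/.ssh/authorized_keys of the hadoop user on that host.
        $run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop@$host"
    done
}
```

Running `RUN=echo push_key slave01 slave02` first shows what would be executed; dropping RUN performs the copy.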

9. Install Hadoop:

8.1 Extract the installation package:

[hadoop@master ~]$ cd /hadoop

[hadoop@master hadoop]$ tar -zxvf hadoop-2.5.1.tar.gz

8.2 Configure the Hadoop environment variables:

[hadoop@master ~]$ vi .bashrc

export HADOOP_HOME=/hadoop/hadoop-2.5.1

export HADOOP_CONF_DIR=/hadoop/hadoop-2.5.1/etc/hadoop

export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

8.3 Make the environment variables take effect:

[hadoop@master ~]$ source .bashrc

8.4 Change into the Hadoop configuration directory and edit the files as listed below:

Note: first copy the fairscheduler.xml file from the attachment into /hadoop/hadoop-2.5.1/etc/hadoop.

 

[hadoop@master hadoop]$ pwd

/hadoop/hadoop-2.5.1/etc/hadoop

core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://master:8020</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/tmp</value>

</property>

<property>

<name>hadoop.proxyuser.root.groups</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.root.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.yarn.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.yarn.groups</name>

<value>*</value>

</property>

</configuration>

hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/hadoop/dfs/data</value>

</property>

</configuration>

mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>master:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>master:19888</value>

</property>

<property>

<name>mapred.child.java.opts</name>

<value>-Xmx4096m</value>

</property>

</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<description>The hostname of the RM.</description>

<name>yarn.resourcemanager.hostname</name>

<value>master</value>

</property>

<property>

<description>The address of the applications manager interface in the RM.</description>

<name>yarn.resourcemanager.address</name>

<value>${yarn.resourcemanager.hostname}:8032</value>

</property>

<property>

<description>The address of the scheduler interface.</description>

<name>yarn.resourcemanager.scheduler.address</name>

<value>${yarn.resourcemanager.hostname}:8030</value>

</property>

<property>

<description>The http address of the RM web application.</description>

<name>yarn.resourcemanager.webapp.address</name>

<value>${yarn.resourcemanager.hostname}:8088</value>

</property>

<property>

<description>The https address of the RM web application.</description>

<name>yarn.resourcemanager.webapp.https.address</name>

<value>${yarn.resourcemanager.hostname}:8090</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>${yarn.resourcemanager.hostname}:8031</value>

</property>

<property>

<description>The address of the RM admin interface.</description>

<name>yarn.resourcemanager.admin.address</name>

<value>${yarn.resourcemanager.hostname}:8033</value>

</property>

<property>

<description>The class to use as the resource scheduler.</description>

<name>yarn.resourcemanager.scheduler.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>

</property>

<property>

<description>fair-scheduler conf location</description>

<name>yarn.scheduler.fair.allocation.file</name>

<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>

</property>

<property>

<description>List of directories to store localized files in. An application's localized file directory will be found in:

${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.

</description>

<name>yarn.nodemanager.local-dirs</name>

<value>/home/hadoop/hadoop/local</value>

</property>

<property>

<description>Whether to enable log aggregation</description>

<name>yarn.log-aggregation-enable</name>

<value>true</value>

</property>

<property>

<description>Where to aggregate logs to.</description>

<name>yarn.nodemanager.remote-app-log-dir</name>

<value>/tmp/logs</value>

</property>

<property>

<description>Amount of physical memory, in MB, that can be allocated for containers.</description>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>30720</value>

</property>

<property>

<description>Number of CPU cores that can be allocated for containers.</description>

<name>yarn.nodemanager.resource.cpu-vcores</name>

<value>8</value>

</property>

<property>

<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

slaves

master

slave01

slave02

hadoop-env.sh

export JAVA_HOME=/usr/local/jdk1.7.0_80

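Note: add this as the last line of the file.

8.5 Set up the environment on the worker nodes:

8.5.1 On master, copy hadoop-2.5.1, jdk1.7.0_80, and the environment file .bashrc to the other nodes.

8.5.2 On slave01 and slave02, run the command that makes the environment variables take effect:

[hadoop@slave01 ~]$ source .bashrc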
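Step 8.5.1 does not show the copy commands. One possible way to do it, sketched below, is to loop over the slaves file and scp the Hadoop tree and .bashrc to every host except the current one. The function name sync_to_slaves and the RUN dry-run variable are assumptions for illustration. The JDK directory is left out on purpose, because its target location and the permissions needed depend on where it was installed.

```shell
# Copy the Hadoop tree and .bashrc to every host listed in a slaves file, except this host.
# Set RUN=echo to print the commands instead of running them. (Hypothetical helper.)
sync_to_slaves() {
    local slaves_file="$1" self="$2" run="${RUN:-}"
    local host
    # "|| [ -n "$host" ]" also handles a last line without a trailing newline.
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue           # skip blank lines
        [ "$host" = "$self" ] && continue    # do not copy onto ourselves
        # </dev/null keeps scp from consuming the loop's input.
        $run scp -r /hadoop/hadoop-2.5.1 "$host:/hadoop/" < /dev/null
        $run scp "$HOME/.bashrc" "$host:/home/hadoop/" < /dev/null
    done < "$slaves_file"
}
```

A dry run on master would be `RUN=echo sync_to_slaves /hadoop/hadoop-2.5.1/etc/hadoop/slaves master`.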

8.6 Format the NameNode:

[hadoop@master hadoop]$ hadoop namenode -format

8.7 Start Hadoop:

[hadoop@master hadoop]$ start-all.sh

[hadoop@master hadoop]$ mr-jobhistory-daemon.sh start historyserver

8.8 Check the running processes:

8.8.1 Processes on the master node:

[hadoop@master hadoop]$ jps

3456 Jps

2305 NameNode

3418 JobHistoryServer

2592 SecondaryNameNode

2844 NodeManager

2408 DataNode

2739 ResourceManager

8.8.2 Processes on slave01 and slave02:

[hadoop@slave01 ~]$ jps

2567 Jps

2249 DataNode

2317 NodeManager

 

[hadoop@slave02 ~]$ jps

2298 NodeManager

2560 Jps

2229 DataNode
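Reading jps output by eye gets tedious once more daemons are added in the later steps. A small filter like the one sketched below reports which expected daemons are absent from jps-style output. The function name missing_daemons is an assumption for illustration; the daemon names are the ones listed in the host plan.

```shell
# Read jps-style output on stdin and print every expected daemon that is absent.
# (Hypothetical helper; jps prints "<pid> <main class>" per line.)
missing_daemons() {
    local output name
    output=$(cat)
    for name in "$@"; do
        # Compare the second field exactly, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$output" \
            | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
            || echo "$name"
    done
}
```

On a worker node at this stage, `jps | missing_daemons DataNode NodeManager` should print nothing.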

8.9 Disable the firewall on every node:

[root@master ~]# iptables -F

[root@master ~]# service iptables save

 

[root@master ~]# service iptables stop

[root@master ~]# chkconfig iptables off

 

If ip6tables is present, do the same for it:

[root@master ~]# ip6tables -F

[root@master ~]# service ip6tables save

 

[root@master ~]# service ip6tables stop

[root@master ~]# chkconfig ip6tables off

8.10 Open the web UI:

http://master:8088/cluster/cluster

 

 

10. Install ZooKeeper:

10.1 Installation on master:

10.1.1 Extract the installation package:

[hadoop@master ~]$ cd /hadoop

[hadoop@master hadoop]$ tar -zxvf zookeeper-3.4.6.tar.gz

10.1.2 Configure the environment variables:

[hadoop@master ~]$ vi .bashrc

export ZOOKEEPER_HOME=/hadoop/zookeeper-3.4.6

export PATH=.:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$PATH

10.1.3 Make the environment variables take effect:

[hadoop@master ~]$ source .bashrc

10.1.4 Change into the ZooKeeper configuration directory:

[hadoop@master ~]$ cd /hadoop/zookeeper-3.4.6/conf/

10.1.5 Create the ZooKeeper configuration file:

[hadoop@master conf]$ cp zoo_sample.cfg zoo.cfg

10.1.6 Edit zoo.cfg:

Setting                          Remarks
dataDir=/hadoop/zookeeperdata    1. Modified entry  2. hadoop is the user name
clientPort=2181                  1. Modified entry
server.1=master:2888:3888        1. New entries
server.2=slave01:2888:3888
server.3=slave02:2888:3888

10.1.7 Run the following under /hadoop:

[hadoop@master ~]$ cd /hadoop

[hadoop@master hadoop]$ mkdir zookeeperdata

[hadoop@master hadoop]$ echo "1" > /hadoop/zookeeperdata/myid

10.2 Installation on slave01:

10.2.1 Copy zookeeper-3.4.6 from /hadoop on master to /hadoop on slave01:

[hadoop@master hadoop]$ scp -r zookeeper-3.4.6 slave01:/hadoop

10.2.2 Copy the .bashrc file from master to slave01:

[hadoop@master ~]$ cd

[hadoop@master ~]$ scp .bashrc slave01:/home/hadoop

10.2.3 Make the environment variables take effect on slave01:

[hadoop@slave01 ~]$ source .bashrc

10.2.4 Run the following on slave01:

[hadoop@slave01 ~]$ cd /hadoop

[hadoop@slave01 hadoop]$ mkdir zookeeperdata

[hadoop@slave01 hadoop]$ echo "2" > /hadoop/zookeeperdata/myid

10.3 Installation on slave02 (same as slave01; the copy steps are omitted):

 

[hadoop@slave02 ~]$ source .bashrc

[hadoop@slave02 ~]$ cd /hadoop

[hadoop@slave02 hadoop]$ mkdir zookeeperdata

[hadoop@slave02 hadoop]$ echo "3" > /hadoop/zookeeperdata/myid
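The number written to myid on each host has to match the N of that host's server.N line in zoo.cfg. Rather than typing the number by hand on every node, it could be derived from the configuration file, as in the sketch below. The function name myid_for is an assumption for illustration, and the approach assumes the host names in zoo.cfg are exactly the names the nodes use for themselves.

```shell
# Print the N of the "server.N=<host>:..." line in zoo.cfg that names the given host.
# (Hypothetical helper; prints nothing if the host is not listed.)
myid_for() {
    local cfg="$1" host="$2"
    # Tolerate spaces around "=", keep only "N host", and drop the port part.
    sed -n -E 's/^server\.([0-9]+)[[:space:]]*=[[:space:]]*([^:[:space:]]+):.*$/\1 \2/p' "$cfg" \
        | awk -v h="$host" '$2 == h { print $1 }'
}
```

On any node, `myid_for /hadoop/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > /hadoop/zookeeperdata/myid` would then write the matching ID, provided `hostname` returns master, slave01, or slave02.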

 

10.4 Start the ZooKeeper service on every node:

[hadoop@master hadoop]$ zkServer.sh start

JMX enabled by default

Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

 

[hadoop@slave01 ~]$ zkServer.sh start

JMX enabled by default

Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

 

[hadoop@slave02 hadoop]$ zkServer.sh start

JMX enabled by default

Using config: /hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

10.5 Check the processes with jps:

[hadoop@master hadoop]$ jps

2305 NameNode

3608 Jps

3418 JobHistoryServer

2592 SecondaryNameNode

2844 NodeManager

2408 DataNode

2739 ResourceManager

3577 QuorumPeerMain

 

Here "QuorumPeerMain" is the ZooKeeper process.

 

[hadoop@slave01 ~]$ jps

2249 DataNode

2662 Jps

2317 NodeManager

2616 QuorumPeerMain

 

[hadoop@slave02 hadoop]$ jps

2599 QuorumPeerMain

2298 NodeManager

2652 Jps

2229 DataNode

 

11. Install HBase:

11.1 Configure the NTP time synchronization service:

11.1.1 Server (master) configuration:

[hadoop@master hadoop]$ su - root

Password:

 

[root@master ~]# vi /etc/ntp.conf

Change the following settings:

#restrict default kod nomodify notrap nopeer noquery

restrict default kod nomodify

restrict -6 default kod nomodify notrap nopeer noquery

 

After making the changes, start ntpd.

[root@master ~]# service ntpd start

[root@master ~]# chkconfig ntpd on

11.1.2 Client configuration:

[hadoop@slave01 ~]$ su - root

Password:

[root@slave01 ~]# crontab -e

Enter the following entry:

0-59/10 * * * * /usr/sbin/ntpdate 192.168.8.127 && /sbin/hwclock -w

 

This synchronizes the clock with the master every 10 minutes.
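The minute field 0-59/10 in the crontab entry means "from 0 to 59 in steps of 10". To make that concrete, the sketch below expands such a range-with-step field into the values it matches. The function name cron_field_values is an assumption for illustration, and it relies on the seq utility being available.

```shell
# Expand a cron "LOW-HIGH/STEP" field into the values it matches.
# Arguments: LOW HIGH STEP. (Hypothetical helper.)
cron_field_values() {
    # seq FIRST INCREMENT LAST; the unquoted expansion joins the lines with spaces.
    echo $(seq "$1" "$3" "$2")
}

cron_field_values 0 59 10   # prints: 0 10 20 30 40 50
```

So the job runs at minutes 0, 10, 20, 30, 40, and 50 of every hour, which is the same schedule as writing */10 in the minute field.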

 

11.2 Install HBase

11.2.1 Extract the HBase installation package:

[hadoop@master ~]$ cd /hadoop

[hadoop@master hadoop]$ tar -zxvf hbase-1.1.2-bin.tar.gz

11.2.2 Configure the environment variables:

[hadoop@master hadoop]$ vi ~/.bashrc

Add the HBase directories:

export HBASE_HOME=/hadoop/hbase-1.1.2

export PATH=.:$HBASE_HOME/bin:$HBASE_HOME/conf:$PATH

11.2.3 Make the environment variables take effect:

[hadoop@master hadoop]$ source ~/.bashrc

11.2.4 Change into the HBase configuration directory:

[hadoop@master hadoop]$ cd /hadoop/hbase-1.1.2/conf

11.2.5 Edit hbase-env.sh:

[hadoop@master conf]$ vi hbase-env.sh

Setting                          Remarks
export HBASE_MANAGES_ZK=false    1. Modified entry; HBase then uses the external ZooKeeper ensemble instead of managing its own

11.2.6 Edit hbase-site.xml:

[hadoop@master conf]$ vi hbase-site.xml

<configuration>

<property>

<name>hbase.rootdir</name>

<value>hdfs://master:8020/hbase</value>

</property>

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

<property>

<name>hbase.master</name>

<value>master</value>

</property>

<property>

<name>hbase.zookeeper.property.clientPort</name>

<value>2181</value>

</property>

<property>

<name>hbase.zookeeper.quorum</name>

<value>master,slave01,slave02</value>

</property>

</configuration>
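The hbase.rootdir value has to point at the same NameNode address that fs.defaultFS declares in core-site.xml; a different host or port there is a common reason for HMaster failing to start. The sketch below compares the two files. The function names xml_prop and rootdir_matches_fs are assumptions for illustration, and the parsing assumes each <name> and <value> element sits on a single line, as in the listings in this guide.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML config file.
# (Hypothetical helper; a line-based sketch, not a real XML parser.)
xml_prop() {
    local file="$1" name="$2"
    awk -v n="$name" '
        index($0, "<name>" n "</name>") { hit = 1 }
        hit && match($0, /<value>[^<]*<\/value>/) {
            # Strip the 7-character <value> and the 8-character </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Succeed when hbase.rootdir (file $2) lies under fs.defaultFS (file $1).
rootdir_matches_fs() {
    local fs rootdir
    fs=$(xml_prop "$1" fs.defaultFS)
    rootdir=$(xml_prop "$2" hbase.rootdir)
    [ -n "$fs" ] || return 1
    case "$rootdir" in
        "$fs"/*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

With the paths used here, the check would be `rootdir_matches_fs /hadoop/hadoop-2.5.1/etc/hadoop/core-site.xml /hadoop/hbase-1.1.2/conf/hbase-site.xml && echo consistent`.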

11.2.7 Edit the regionservers file:

[hadoop@master conf]$ vi regionservers

master

slave01

slave02

11.2.8 Configuration on slave01 and slave02:

Same as on master.

11.2.9 Start HBase (make sure Hadoop and ZooKeeper are already running):

[hadoop@master conf]$ start-hbase.sh

11.2.10 Check the processes with jps:

[hadoop@master hadoop]$ jps

2305 NameNode

3418 JobHistoryServer

2592 SecondaryNameNode

2844 NodeManager

2408 DataNode

2739 ResourceManager

3577 QuorumPeerMain

3840 HMaster

4201 Jps

3976 HRegionServer

 

11.2.11 Open the HBase shell and run a few queries:

[hadoop@master hadoop]$ hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015

 

hbase(main):005:0> list

TABLE                                                                                  

0 row(s) in 0.0270 seconds

 

=> []

 

Let's create a table and see whether it works:

hbase(main):006:0> create 'test','info'

0 row(s) in 2.3150 seconds

 

=> Hbase::Table - test

hbase(main):007:0>

That appears to have worked. Now add a row and see whether it is stored.

hbase(main):008:0> put 'test','u00001','info:username','yuansen'

0 row(s) in 0.1400 seconds

hbase(main):009:0> scan 'test'

ROW             COLUMN+CELL                                                    

 u00001           column=info:username, timestamp=1453186521452, value=yuansen

1 row(s) in 0.0550 seconds

 

It did indeed work.

Original article: https://www.cnblogs.com/huanlegu0426/p/hbase03.html