Setting Up a Hadoop HA Architecture

The cluster has seven servers, with node roles assigned as follows:

192.168.133.21 (BFLN-01): namenode  zookeeper  journalnode  DFSZKFailoverController
192.168.133.23 (BFLN-02): namenode  resourcemanager  zookeeper  journalnode  DFSZKFailoverController
192.168.133.24 (BFLN-03): resourcemanager  zookeeper  journalnode
192.168.133.25 (BFLN-04): datanode, nodemanager
192.168.133.26 (BFLN-05): datanode, nodemanager
192.168.133.27 (BFLN-06): datanode, nodemanager
192.168.133.28 (BFLN-07): datanode, nodemanager

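Benefit of HA: running two NameNodes and two ResourceManagers prevents a single point of failure in a core Hadoop component from making the whole cluster unavailable.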

Configuration steps:

Environment setup

1. Time must be synchronized across the cluster:

 ntpdate

2. Configure hostname resolution for the seven servers in /etc/hosts (on every server):

192.168.133.21  BFLN-01
192.168.133.23  BFLN-02
192.168.133.24  BFLN-03
192.168.133.25  BFLN-04
192.168.133.26  BFLN-05
192.168.133.27  BFLN-06
192.168.133.28  BFLN-07
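
Because the same entries have to be added on every server, the step can be scripted. The following is a minimal sketch, not part of the original setup: the function name add_cluster_hosts is made up for illustration, and it takes the target file as an argument so that it can be tried on a scratch file before touching /etc/hosts.

```sh
#!/bin/sh
# Append the cluster's name-resolution entries to a hosts file,
# skipping any hostname that is already present (safe to re-run).
# add_cluster_hosts is a hypothetical helper name.
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    while read -r ip name; do
        # -w matches the hostname as a whole word; -q suppresses output
        if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
            printf '%s  %s\n' "$ip" "$name" >> "$hosts_file"
        fi
    done <<EOF
192.168.133.21 BFLN-01
192.168.133.23 BFLN-02
192.168.133.24 BFLN-03
192.168.133.25 BFLN-04
192.168.133.26 BFLN-05
192.168.133.27 BFLN-06
192.168.133.28 BFLN-07
EOF
}
```

Run as root, add_cluster_hosts with no argument writes to /etc/hosts; running it a second time adds nothing.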

3. Configure the SSH client in /etc/ssh/ssh_config (these are client options, not sshd options):

StrictHostKeyChecking no

UserKnownHostsFile /dev/null

Otherwise, starting the HDFS services may stall on host-key prompts such as:

Starting namenodes on [BFLN-01 BFLN-02]
The authenticity of host 'BFLN-02 (192.168.133.23)' can't be established.
ECDSA key fingerprint is 79:d1:ec:82:d3:1c:50:8a:17:c2:2d:f0:87:20:53:44.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'BFLN-01 (192.168.133.21)' can't be established.
ECDSA key fingerprint is 30:75:04:10:93:d2:57:d7:3d:b1:cc:31:92:30:1a:a1.
Are you sure you want to continue connecting (yes/no)? yes

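4. Set up passwordless SSH key authentication on every server, including from each machine to itself:

ssh-keygen : generates a key pair

ssh-copy-id : sends the public key to the target server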
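
The two commands above have to be repeated for every host, so a loop helps. This is a sketch under assumptions: the root user is inferred from the /root/.ssh/id_rsa path used later in hdfs-site.xml, and the run helper and DRY_RUN variable are illustrative names. By default it only prints the commands it would execute.

```sh
#!/bin/sh
HOSTS="BFLN-01 BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07"

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_passwordless_ssh() {
    # Create a key pair only if none exists; -N '' sets an empty passphrase.
    [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
    # The list includes the local host, so each machine also trusts itself.
    for host in $HOSTS; do
        run ssh-copy-id "root@$host"
    done
}
```

Calling setup_passwordless_ssh prints the plan; setting DRY_RUN=0 first makes it run for real, and it needs to be repeated on each of the seven servers.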

5. Install Java and set the Java and Hadoop environment variables:

export JAVA_HOME=/usr/java/jdk1.8.0_51/

export HADOOP_HOME=/opt/hadoop-spark/hadoop/hadoop-2.9.1

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Installing the ZooKeeper cluster:

7. Unpack the ZooKeeper archive.

8. Edit the ZooKeeper configuration file (conf/zoo.cfg):

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/data/zookeeper

# the port at which the clients will connect

clientPort=2181

server.1=192.168.133.21:2888:3888

server.2=192.168.133.23:2888:3888

server.3=192.168.133.24:2888:3888

# the maximum number of client connections.

# increase this if you need to handle more clients

#maxClientCnxns=60

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

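9. In the ZooKeeper data directory, add a myid file containing that node's ZooKeeper ID (in this configuration the data directory is /data/zookeeper, and the node IDs are 1, 2 and 3).

10. Start the ZooKeeper cluster (on each ZooKeeper node):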
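
The myid value has to agree with the server.N lines in the configuration above (server.1 is 192.168.133.21, i.e. BFLN-01, and so on). A small sketch of that mapping follows; the function names zk_id_for_host and write_myid are hypothetical, and it assumes each node's hostname is the BFLN-0x name from /etc/hosts.

```sh
#!/bin/sh
# Map a hostname to its ZooKeeper server ID, following the
# server.1 / server.2 / server.3 lines in the configuration file.
zk_id_for_host() {
    case "$1" in
        BFLN-01) echo 1 ;;
        BFLN-02) echo 2 ;;
        BFLN-03) echo 3 ;;
        *) echo "not a ZooKeeper node: $1" >&2; return 1 ;;
    esac
}

# Write the myid file. $1 = hostname, $2 = dataDir (defaults to /data/zookeeper).
write_myid() {
    data_dir="${2:-/data/zookeeper}"
    id=$(zk_id_for_host "$1") || return 1
    mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

On each of the three ZooKeeper nodes, write_myid "$(hostname)" would then create /data/zookeeper/myid with the matching ID.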

./zkServer.sh start
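
After all three nodes are started, zkServer.sh status reports each node's role on a line beginning with "Mode:". The helper below, whose name zk_mode is made up for illustration, is a sketch that extracts that value so the result can be checked from a script.

```sh
#!/bin/sh
# Read `zkServer.sh status` output on stdin and print only the role
# (for example leader or follower) taken from the "Mode:" line.
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Example, to be run on a ZooKeeper node:
#   ./zkServer.sh status 2>&1 | zk_mode
```

A healthy three-node ensemble is expected to show one leader and two followers; an empty result suggests the node has not joined the quorum yet.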

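Installing and configuring Hadoop HA:

11. Download the hadoop-spark archive and unpack it; keep the Hadoop installation path identical on all seven servers where possible.

Configure on 192.168.133.21: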

cd $HADOOP_HOME/etc/hadoop/

vi core-site.xml

<configuration>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://BFLN</value>   <!-- BFLN is the logical name of the NameNode cluster (the nameservice); it must match dfs.nameservices in hdfs-site.xml -->

    </property>

    <property>

        <name>hadoop.tmp.dir</name>

        <value>/data/hadoop-spark/hadoop/tmp</value>    <!-- base directory under which Hadoop stores its data (the HDFS directories default to paths below it) -->

    </property>

 <property>

      <name>ha.zookeeper.quorum</name>

      <value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value>  <!-- addresses of the ZooKeeper cluster -->

 </property>

</configuration>

 

vi hdfs-site.xml

<configuration>

    <!-- BFLN is the logical name of the NameNode cluster (the nameservice); it must match the name used in fs.defaultFS in core-site.xml -->

    <property>

        <name>dfs.nameservices</name>

        <value>BFLN</value>

    </property>

    

    <!-- The BFLN nameservice has two NameNodes, BFLN1 and BFLN2 -->

    <property>

       <name>dfs.ha.namenodes.BFLN</name>

       <value>BFLN1,BFLN2</value>

    </property>

    

    <!-- RPC address and port of the first NameNode -->

    <property>

       <name>dfs.namenode.rpc-address.BFLN.BFLN1</name>

       <value>BFLN-01:9000</value>

    </property>

    

    <!-- HTTP address and port of the first NameNode -->

    <property>

        <name>dfs.namenode.http-address.BFLN.BFLN1</name>

        <value>BFLN-01:50070</value>

    </property>

    

    <!-- RPC address and port of the second NameNode -->

    <property>

        <name>dfs.namenode.rpc-address.BFLN.BFLN2</name>

        <value>BFLN-02:9000</value>

    </property>

    

    <!-- HTTP address and port of the second NameNode -->

    <property>

        <name>dfs.namenode.http-address.BFLN.BFLN2</name>

        <value>BFLN-02:50070</value>

    </property>

    

    <!-- Addresses and ports of the JournalNodes; the official documentation recommends an odd number of JournalNodes -->

    <property>

        <name>dfs.namenode.shared.edits.dir</name>

        <value>qjournal://BFLN-01:8485;BFLN-02:8485;BFLN-03:8485/BFLN</value>

    </property>

    

    <!-- Local disk directory where the JournalNode stores its data -->

    <property>

          <name>dfs.journalnode.edits.dir</name>

          <value>/data/hadoop-spark/hadoop/tmp/jn</value>

    </property>

    

    <!-- Enable automatic failover when a NameNode fails -->

    <property>

          <name>dfs.ha.automatic-failover.enabled</name>

          <value>true</value>

    </property>

    

    <!-- How HDFS handles a NameNode split-brain: sshfence connects over ssh to the old active NameNode and kills it, after which the standby is switched to active -->

    <property>

             <name>dfs.ha.fencing.methods</name>

             <value>sshfence</value>

    </property>

 

    <!-- Path of the SSH private key that sshfence uses between the HA NameNodes -->

    <property>

            <name>dfs.ha.fencing.ssh.private-key-files</name>

            <value>/root/.ssh/id_rsa</value>

    </property>

 

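    <!-- Client failover proxy provider for the nameservice; without it, Hadoop later treats the nameservice name BFLN as a hostname and fails when it tries to connect -->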

    <property> 

        <name>dfs.client.failover.proxy.provider.BFLN</name>

        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

 

    <!-- Number of block replicas -->

    <property>

        <name>dfs.replication</name>

        <value>3</value>

    </property> 

 

    <!-- Whether to enforce permission checks -->

    <property>

        <name>dfs.permissions</name>

        <value>false</value>

    </property>

</configuration>
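
The comments in both files stress that the name in fs.defaultFS must match dfs.nameservices. That can be checked mechanically. The sketch below is line-oriented text matching, not a real XML parser, and the function names xml_prop and check_nameservice are hypothetical; it assumes each name and value element sits on a single line, as in the listings above.

```sh
#!/bin/sh
# Print the <value> of the named property from a Hadoop *-site.xml file.
# $1 = XML file, $2 = property name.
xml_prop() {
    awk -v key="$2" '
        # Remember that the wanted <name> element has been seen.
        index($0, "<name>" key "</name>") { found = 1 }
        # The first <value> after it belongs to that property.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Succeed only if fs.defaultFS (minus the hdfs:// scheme) equals dfs.nameservices.
# $1 = core-site.xml, $2 = hdfs-site.xml.
check_nameservice() {
    default_fs=$(xml_prop "$1" fs.defaultFS)
    nameservice=$(xml_prop "$2" dfs.nameservices)
    [ -n "$nameservice" ] && [ "${default_fs#hdfs://}" = "$nameservice" ]
}
```

From $HADOOP_HOME/etc/hadoop, check_nameservice core-site.xml hdfs-site.xml && echo consistent reports whether the two names agree. Once Hadoop is on the PATH, hdfs getconf -confKey dfs.nameservices shows the value Hadoop itself resolves.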


vi yarn-site.xml

<configuration>

  <!-- Enable ResourceManager HA; the default is false -->

  <property>

    <name>yarn.resourcemanager.ha.enabled</name>

    <value>true</value>

  </property>

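  <!-- Enable RM recovery: if the RM goes down while applications are running in YARN, setting this to true lets the restarted RM resume the applications that had not finished -->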

  <property>

    <name>yarn.resourcemanager.recovery.enabled</name>

    <value>true</value>

  </property>

 

  <!-- ID of the RM cluster -->

  <property>

    <name>yarn.resourcemanager.cluster-id</name>

    <value>BFLN-yarn</value>

  </property>

 

 

  <!-- Names of the two RM nodes in the RM cluster -->

  <property>

    <name>yarn.resourcemanager.ha.rm-ids</name>

    <value>BFLN-yarn1,BFLN-yarn2</value>

  </property>

 

 

  <!-- Host of the BFLN-yarn1 node -->

  <property>

    <name>yarn.resourcemanager.hostname.BFLN-yarn1</name>

    <value>BFLN-02</value>

  </property>

 

 

  <!-- Host of the BFLN-yarn2 node -->

  <property>

    <name>yarn.resourcemanager.hostname.BFLN-yarn2</name>

    <value>BFLN-03</value>

  </property>

 

 

  <!-- Addresses of the ZooKeeper cluster -->

  <property>

    <name>yarn.resourcemanager.zk-address</name>

    <value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value>

  </property>

 

 

  <!-- Class used for RM state storage; the default is the Hadoop file system based implementation (FileSystemRMStateStore) -->

  <property>

    <name>yarn.resourcemanager.store.class</name>

    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

  </property>

 

  <!-- Auxiliary service run on the NodeManager; it must be set to mapreduce_shuffle for MapReduce programs to run -->

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

    </property>

</configuration>


vi slaves

(list the DataNode hosts)

192.168.133.25

192.168.133.26

192.168.133.27

192.168.133.28

 

vi hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_51/ 

 

12. All configuration is now complete; copy these files to the other servers.
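
One way to distribute the files is an scp loop from BFLN-01. This is a sketch under assumptions: it relies on the passwordless root SSH set up earlier, on the same $HADOOP_HOME path on every server, and on the file list matching what was edited above (yarn-site.xml is assumed as the name of the YARN file). The run helper and DRY_RUN variable are illustrative names, and by default the commands are only printed.

```sh
#!/bin/sh
CONF_FILES="core-site.xml hdfs-site.xml yarn-site.xml slaves hadoop-env.sh"
OTHER_HOSTS="BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07"

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy every edited configuration file to the same directory on each host.
push_configs() {
    conf_dir="${HADOOP_HOME:-/opt/hadoop-spark/hadoop/hadoop-2.9.1}/etc/hadoop"
    for host in $OTHER_HOSTS; do
        for f in $CONF_FILES; do
            run scp "$conf_dir/$f" "root@$host:$conf_dir/"
        done
    done
}
```

push_configs lists the 30 copy commands (6 hosts times 5 files); with DRY_RUN=0 it performs them.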

Starting the HDFS services:

Note: the startup order matters; getting it wrong leads to frequent errors later on.

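1. Start the JournalNode on each JournalNode host (BFLN-01, BFLN-02, BFLN-03), command: ./sbin/hadoop-daemon.sh start journalnode   # start the JournalNode

2. Format the NameNode on BFLN-01, command: ./bin/hdfs namenode -format   # format the NameNode directories

3. Register with ZooKeeper from BFLN-01, command: ./bin/hdfs zkfc -formatZK   # register HDFS HA with the ZooKeeper cluster

4. Start HDFS from BFLN-01, command: ./sbin/start-dfs.sh   # start the HDFS services; note that at this point only the NameNode on BFLN-01 comes up

5. Synchronize the NameNode on BFLN-02, command: ./bin/hdfs namenode -bootstrapStandby   # the NameNode on BFLN-02 copies the metadata from the NameNode on BFLN-01

6. Start the NameNode on BFLN-02, command: ./sbin/hadoop-daemon.sh start namenode   # start the NameNode on BFLN-02

7. Start the ResourceManager on BFLN-02, command: ./sbin/start-yarn.sh   # start the RM and NM services

8. Start the ResourceManager on BFLN-03, command: ./sbin/yarn-daemon.sh start resourcemanager   # start the standby RM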

 

Test: kill an active namenode/resourcemanager node and check whether the standby node becomes active.

Commands to check the state of the NameNodes:

./bin/hdfs  haadmin -getServiceState BFLN1

./bin/hdfs  haadmin -getServiceState BFLN2

Commands to check the state of the ResourceManagers:

./bin/yarn  rmadmin -getServiceState BFLN-yarn1

./bin/yarn  rmadmin -getServiceState BFLN-yarn2
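
For a failover test it is convenient to reduce the state commands above to a single answer: which ID is active, and is there exactly one. The sketch below works on "ID STATE" pairs read from stdin; the function names active_ids and check_single_active are hypothetical.

```sh
#!/bin/sh
# Read "ID STATE" pairs on stdin and print the IDs whose state is "active".
active_ids() {
    awk '$2 == "active" { print $1 }'
}

# Succeed only if exactly one ID is active, the healthy case for an HA pair.
check_single_active() {
    count=$(active_ids | wc -l | tr -d ' ')
    [ "$count" = "1" ]
}

# Example for the NameNodes, run from $HADOOP_HOME:
#   for id in BFLN1 BFLN2; do
#       echo "$id $(./bin/hdfs haadmin -getServiceState "$id")"
#   done | active_ids
```

The same loop with ./bin/yarn rmadmin and the IDs BFLN-yarn1 and BFLN-yarn2 covers the ResourceManagers. Running it before and after killing the active process shows whether the active ID moved.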

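If the standby node does not switch to active after the active node is killed, the system may be missing a package that needs to be installed:

psmisc (it provides the fuser command, which sshfence uses to kill the old active process)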
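
Whether the package is needed can be checked by looking for fuser on the NameNode hosts. The following is a minimal sketch; the function name fuser_status is made up, and the yum command in the comment is an assumption about the distribution (use the system's own package manager otherwise).

```sh
#!/bin/sh
# Report whether the fuser command is available on this host.
fuser_status() {
    if command -v fuser >/dev/null 2>&1; then
        echo "fuser found: $(command -v fuser)"
    else
        echo "fuser missing: install psmisc on both NameNode hosts"
    fi
}

# Assumed install command on a yum-based system:
#   yum install -y psmisc
```

Run fuser_status on BFLN-01 and BFLN-02, since those are the hosts that sshfence logs in to.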

 

 

Original article: https://www.cnblogs.com/hel7512/p/12350634.html