Hadoop 2.7.2 Cluster Installation and Configuration

1. Virtual Machine Environment Setup

Prepare three virtual machines:

Linux OS     IP               hostname
CentOS 7     192.168.107.2    hadoop01
CentOS 7     192.168.107.3    hadoop02
CentOS 7     192.168.107.4    hadoop03

(1) Download and install VMware

Link: https://pan.baidu.com/s/1a_9pW6-nesgl_GUINFfqsA  Extraction code: 6b6b

(2) Install a virtual machine in VMware

(3) Clone the virtual machine (search online for a walkthrough)

(4) Configure the virtual machines

4.1 Set a static IP on each cloned VM

4.2 Set the hostname

4.3 Disable the firewall
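A minimal sketch of steps 4.1-4.3 on CentOS 7, shown for hadoop01. The NIC name ens33 and the gateway 192.168.107.1 are assumptions; check yours with `ip addr` and your VMware NAT settings:

[root@hadoop01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33   # NIC name is an assumption
# Set at least:
#   BOOTPROTO=static
#   ONBOOT=yes
#   IPADDR=192.168.107.2
#   NETMASK=255.255.255.0
#   GATEWAY=192.168.107.1   # assumption: adjust to your NAT gateway
#   DNS1=192.168.107.1
[root@hadoop01 ~]# systemctl restart network
[root@hadoop01 ~]# hostnamectl set-hostname hadoop01
[root@hadoop01 ~]# vi /etc/hosts   # all three nodes need these mappings
# 192.168.107.2 hadoop01
# 192.168.107.3 hadoop02
# 192.168.107.4 hadoop03
[root@hadoop01 ~]# systemctl stop firewalld
[root@hadoop01 ~]# systemctl disable firewalld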

4.4 Create the hadoop user
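A sketch for 4.4, run as root on each node:

[root@hadoop01 ~]# useradd hadoop
[root@hadoop01 ~]# passwd hadoop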

4.5 Give the hadoop user root privileges

  Edit the /etc/sudoers file (visudo is the safer way to do this, since it validates the syntax before saving). Find the lines below and add an entry for hadoop under the root line:

  ## Allow root to run any commands anywhere

  root    ALL=(ALL)    ALL

  hadoop  ALL=(ALL)    ALL

(5) Create the module and software directories under /opt

[hadoop@hadoop01 ~]$ sudo mkdir -p /opt/module

[hadoop@hadoop01 ~]$ sudo mkdir -p /opt/software

(6) Change the owner of module and software to hadoop

[hadoop@hadoop01 opt]$ sudo chown hadoop:hadoop /opt/module/ /opt/software/ 

(7) Configure passwordless SSH login

Generate a key pair:

[hadoop@hadoop01 ~]$ ssh-keygen -t rsa

Press Enter three times; two files are generated: id_rsa (the private key) and id_rsa.pub (the public key).

Copy the public key to every machine that should accept passwordless logins, including hadoop03, which start-dfs.sh will also reach over SSH:

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop01

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop02

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop03

Verify:

[hadoop@hadoop01 ~]$ ssh hadoop02

Last login: Tue Oct 27 19:10:02 2020 from gateway

[hadoop@hadoop02 ~]$
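Note: start-yarn.sh is run later on hadoop02 (the ResourceManager node), and it starts the NodeManagers over SSH, so hadoop02 needs the same passwordless access to all three nodes. A sketch, run on hadoop02:

[hadoop@hadoop02 ~]$ ssh-keygen -t rsa
[hadoop@hadoop02 ~]$ ssh-copy-id hadoop01
[hadoop@hadoop02 ~]$ ssh-copy-id hadoop02
[hadoop@hadoop02 ~]$ ssh-copy-id hadoop03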

 

(8) Write the cluster distribution script xsync

(a) Create a bin directory under /home/hadoop, and inside it create a file named xsync with the following content:

[hadoop@hadoop01 ~]$ mkdir bin

[hadoop@hadoop01 ~]$ cd bin/

[hadoop@hadoop01 bin]$ touch xsync

[hadoop@hadoop01 bin]$ vi xsync

Put the following code in the file:

#!/bin/bash

#1 Get the number of arguments; exit immediately if none were given
pcount=$#
if ((pcount==0)); then
  echo "no args"
  exit
fi

#2 Get the file name
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"

#3 Resolve the parent directory to an absolute path
pdir=$(cd -P "$(dirname "$p1")"; pwd)
echo "pdir=$pdir"

#4 Get the current user name
user=$(whoami)

#5 Loop over the other nodes and rsync the file to each
for ((host=2; host<=3; host++)); do
        echo "------------------- hadoop0$host --------------"
        rsync -rvl "$pdir/$fname" "$user@hadoop0$host:$pdir"
done

(b) Make the xsync script executable

[hadoop@hadoop01 bin]$ chmod 777 xsync

(c) Usage: xsync <file or directory name>

[hadoop@hadoop01 bin]$ xsync /home/hadoop/bin

Note: if xsync in /home/hadoop/bin still cannot be called from anywhere, you can move it to /usr/local/bin.
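Alternatively, a sketch that keeps the script in place and extends PATH instead (assuming a bash shell; on CentOS 7 the default ~/.bash_profile already adds $HOME/bin to PATH after a re-login):

[hadoop@hadoop01 ~]$ echo 'export PATH=$PATH:/home/hadoop/bin' >> ~/.bashrc
[hadoop@hadoop01 ~]$ source ~/.bashrc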

 

2. Install the JDK

(1) Download the JDK

Link: https://pan.baidu.com/s/1wuR2FQe_RYO5mBpAWytGLw  Extraction code: td3y

(2) Upload the downloaded JDK archive to /opt/software/ with an FTP/SFTP tool

(3) Extract the JDK into /opt/module

[hadoop@hadoop01 software]$ tar -zvxf jdk-8u144-linux-x64.tar.gz -C /opt/module/

(4) Configure the JDK

   4.1 Get the JDK path

[hadoop@hadoop01 software]$ cd /opt/module/jdk1.8.0_144/

[hadoop@hadoop01 jdk1.8.0_144]$ pwd

     /opt/module/jdk1.8.0_144

        4.2 Edit the /etc/profile file

          [hadoop@hadoop01 jdk1.8.0_144]$ sudo vi /etc/profile

            Append:

#JAVA_HOME

export JAVA_HOME=/opt/module/jdk1.8.0_144

export PATH=$PATH:$JAVA_HOME/bin

        4.3 Apply the changes to /etc/profile

[hadoop@hadoop01 jdk1.8.0_144]$ source /etc/profile

4.4 Verify that the JDK installed successfully

[hadoop@hadoop01 jdk1.8.0_144]$ java -version

java version "1.8.0_144"

        

3. Install Hadoop

(1) Download Hadoop

Link: https://pan.baidu.com/s/1Bb3h5OwAymfBtF0jKGwhXg  Extraction code: o061

(2) Upload the downloaded Hadoop archive to /opt/software/ with an FTP/SFTP tool

(3) Extract the Hadoop archive into /opt/module

[hadoop@hadoop01 software]$ tar -zvxf hadoop-2.7.2.tar.gz -C /opt/module/

(4) Add Hadoop to the environment variables

        4.1 Get the Hadoop home path

[hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/

[hadoop@hadoop01 hadoop-2.7.2]$ pwd

/opt/module/hadoop-2.7.2

        4.2 Edit the /etc/profile file

[hadoop@hadoop01 hadoop-2.7.2]$ sudo vi /etc/profile

Append:

# Hadoop

export HADOOP_HOME=/opt/module/hadoop-2.7.2

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

        4.3 Apply the changes to /etc/profile

  [hadoop@hadoop01 hadoop-2.7.2]$ source /etc/profile
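To check that the variables took effect, `hadoop version` should now work from any directory (only the first line of output is shown; the rest is build metadata):

[hadoop@hadoop01 hadoop-2.7.2]$ hadoop version

Hadoop 2.7.2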

(5) Hadoop directory structure

drwxr-xr-x. 2 hadoop hadoop   194 May 22  2017    bin

drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    etc

drwxr-xr-x. 2 hadoop hadoop   106 May 22  2017    include

drwxr-xr-x. 3 hadoop hadoop    20 May 22  2017    lib

drwxr-xr-x. 2 hadoop hadoop   239 May 22  2017    libexec

-rw-r--r--. 1 hadoop hadoop 15429 May 22  2017   LICENSE.txt

-rw-r--r--. 1 hadoop hadoop   101 May 22  2017    NOTICE.txt

-rw-r--r--. 1 hadoop hadoop  1366 May 22  2017   README.txt

drwxr-xr-x. 2 hadoop hadoop  4096 May 22  2017   sbin

drwxr-xr-x. 4 hadoop hadoop    31 May 22  2017    share

bin directory: scripts for operating the Hadoop services (HDFS, YARN)

etc directory: Hadoop's configuration files

lib directory: Hadoop's native libraries (for data compression and decompression)

sbin directory: scripts that start and stop the Hadoop services

share directory: Hadoop's dependency jars, documentation, and official examples

4. Cluster Configuration

Daemon layout across the three nodes:

hadoop01 (192.168.107.2): NameNode, DataNode, NodeManager

hadoop02 (192.168.107.3): ResourceManager, DataNode, NodeManager

hadoop03 (192.168.107.4): SecondaryNameNode, DataNode, NodeManager

All of the configuration files below live in Hadoop's etc/hadoop directory:



  [hadoop@hadoop01 ~]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/

(1) Configure the HDFS-related files

Configure core-site.xml

[hadoop@hadoop01 hadoop]$ vi core-site.xml

Put the following configuration in the file:

<configuration>

  <property>    <!-- Address of the NameNode in HDFS -->

     <name>fs.defaultFS</name>

     <value>hdfs://hadoop01:9000</value>

  </property>

  <property>    <!-- Base directory for files Hadoop generates at runtime -->

     <name>hadoop.tmp.dir</name>

     <value>/opt/module/hadoop-2.7.2/data/tmp</value>

  </property>

</configuration>

Configure hadoop-env.sh

[hadoop@hadoop01 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure hdfs-site.xml

[hadoop@hadoop01 hadoop]$ vi hdfs-site.xml

  Put the following configuration in the file:

 <configuration>

    <property>  <!-- Number of replicas -->

     <name>dfs.replication</name>

     <value>3</value>

    </property>

    <property>  <!-- Address of the SecondaryNameNode in HDFS -->

     <name>dfs.namenode.secondary.http-address</name>

     <value>hadoop03:9001</value>

    </property>

    <property>

     <name>dfs.permissions.enabled</name>

     <value>false</value>

    </property>

    <property> 

     <name>dfs.datanode.max.xcievers</name>

     <value>4096</value>

     <description>A DataNode has an upper bound on the number of files it can serve at once; it should be at least 4096.</description>

    </property>

    <property>  <!-- When true, HDFS content can be browsed over HTTP at ip:port -->

     <name>dfs.webhdfs.enabled</name>

     <value>true</value>

    </property>

</configuration>
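Note: dfs.datanode.max.xcievers is the legacy (historically misspelled) property name; in Hadoop 2.x it has been superseded by dfs.datanode.max.transfer.threads, although the old name is still recognized.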

(2) Configure MapReduce

Configure mapred-env.sh

[hadoop@hadoop01 hadoop]$ vi mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

  Configure mapred-site.xml

    [hadoop@hadoop01 hadoop]$ cp mapred-site.xml.template mapred-site.xml

    [hadoop@hadoop01 hadoop]$ vi mapred-site.xml

    Put the following configuration in the file:

<configuration>

    <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

    </property>

    <property>  <!-- Actual hostname and port of the JobHistory server -->

       <name>mapreduce.jobhistory.address</name>

       <value>hadoop02:10020</value>

    </property>

    <property>

       <name>mapreduce.jobhistory.webapp.address</name>

       <value>hadoop02:19888</value>

    </property>

</configuration>

(3) Configure YARN

Configure yarn-env.sh

[hadoop@hadoop01 hadoop]$ vi yarn-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure yarn-site.xml

[hadoop@hadoop01 hadoop]$ vi yarn-site.xml

Put the following configuration in the file:

<configuration>

    <property>  <!-- How reducers fetch data -->

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

    <property>

       <name>yarn.log-aggregation-enable</name>

       <value>true</value>

    </property>

    <property>  <!-- How long to keep aggregated logs, in seconds; 86400 = 1 day -->

       <name>yarn.log-aggregation.retain-seconds</name>

       <value>86400</value>

    </property>

  <property>    <!-- Hostname of the YARN ResourceManager -->

     <name>yarn.resourcemanager.hostname</name>

     <value>hadoop02</value>

  </property>

  <property> <!-- Web UI address the ResourceManager exposes; viewable in a browser -->

     <name>yarn.resourcemanager.webapp.address</name>

     <value>hadoop02:8088</value>

  </property>

  <property> <!-- Address the ResourceManager exposes to ApplicationMasters -->

     <name>yarn.resourcemanager.scheduler.address</name>

     <value>hadoop02:8030</value>

  </property>

  <property> <!-- Address the ResourceManager exposes to NodeManagers -->

     <name>yarn.resourcemanager.resource-tracker.address</name> 

     <value>hadoop02:8031</value>

  </property>

  <property> <!-- Address the ResourceManager exposes to clients -->

     <name>yarn.resourcemanager.address</name>

     <value>hadoop02:8032</value>

  </property>

  <property> <!-- Address the ResourceManager exposes to administrators -->

     <name>yarn.resourcemanager.admin.address</name>  

     <value>hadoop02:8033</value>

  </property>

</configuration>

(4) Configure slaves

[hadoop@hadoop01 hadoop]$ vi slaves

Replace the file's contents (it holds localhost by default) with the following; avoid trailing spaces and blank lines:

hadoop01

hadoop02

hadoop03

(5) Distribute the configured Hadoop to the other nodes

[hadoop@hadoop01 ~]$ xsync /opt/module/hadoop-2.7.2/

(6) Distribute the /etc/profile file to the other nodes

[hadoop@hadoop01 ~]$ su root

Password:

[root@hadoop01 hadoop]# /home/hadoop/bin/xsync /etc/profile
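After distributing, apply the new environment variables on each of the other nodes as well:

[hadoop@hadoop02 ~]$ source /etc/profile

[hadoop@hadoop03 ~]$ source /etc/profile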

5. Starting and Stopping the Cluster

(1) If this is the cluster's first start, format the NameNode. (Before any re-format, be sure to first stop all running namenode and datanode processes, and then delete the data and logs directories; see the sketch below.)
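A minimal cleanup sketch for the re-format case, using this guide's directory layout (run the rm on every node):

[hadoop@hadoop01 hadoop-2.7.2]$ sbin/stop-dfs.sh
[hadoop@hadoop02 hadoop-2.7.2]$ sbin/stop-yarn.sh
[hadoop@hadoop01 hadoop-2.7.2]$ rm -rf data/ logs/    # repeat on hadoop02 and hadoop03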

[hadoop@hadoop01 hadoop-2.7.2]$ bin/hdfs namenode -format

(2) Start HDFS (run on the NameNode node)

[hadoop@hadoop01 hadoop-2.7.2]$ sbin/start-dfs.sh

[hadoop@hadoop01 hadoop-2.7.2]$ jps

3716 DataNode

3941 Jps

3574 NameNode

(3) Start YARN (run on the ResourceManager node)

[hadoop@hadoop02 hadoop-2.7.2]$ sbin/start-yarn.sh

[hadoop@hadoop02 hadoop-2.7.2]$ jps

3876 Jps

3271 DataNode

3422 ResourceManager

3535 NodeManager
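Since mapred-site.xml points the JobHistory server at hadoop02, it can optionally be started there too:

[hadoop@hadoop02 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver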

Start order: HDFS, then YARN

Stop order: YARN, then HDFS

Start HDFS: start-dfs.sh

Start YARN: start-yarn.sh

Stop HDFS: stop-dfs.sh

Stop YARN: stop-yarn.sh
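To confirm that every daemon landed on the intended node, a quick check from hadoop01 (the full jps path is used because a non-login SSH shell may not have sourced /etc/profile):

[hadoop@hadoop01 ~]$ for host in hadoop01 hadoop02 hadoop03; do echo "=== $host ==="; ssh $host /opt/module/jdk1.8.0_144/bin/jps; done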

HDFS web UI: http://192.168.107.2:50070

SecondaryNameNode web UI: http://192.168.107.4:9001/status.html

YARN web UI: http://192.168.107.3:8088
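As a final smoke test, create a directory in HDFS and list it (the path is just an example):

[hadoop@hadoop01 ~]$ hdfs dfs -mkdir -p /user/hadoop

[hadoop@hadoop01 ~]$ hdfs dfs -ls /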

 

Original article: https://www.cnblogs.com/smandar/p/13901448.html