Hadoop Fully Distributed Deployment

I. Preparing the lab environment
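You need a set of Linux servers for the cluster (one NameNode server plus the DataNode servers listed below), ideally with identical specifications. My virtual machines were all cloned from the earlier pseudo-distributed deployment, so their environments are identical, and every VM already has a pseudo-distributed Hadoop setup by default.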
1>. NameNode server (172.20.20.228)

2>. DataNode servers (172.20.20.226-220)

II. Modify the Hadoop configuration files

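The configuration files I modify are in the full directory I copied earlier, whose absolute path is "/tosp/opt/hadoop". After editing the files in this directory, point the hadoop directory at it with a symbolic link. Whenever you need pseudo-distributed or local mode, just change the directory the symlink points to. This way the configuration files for all three modes can coexist without conflict.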
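The symlink trick described above can be sketched as a small shell function. This is a sketch under assumptions. It uses a hypothetical layout in which a base directory holds `local/`, `pseudo/` and `full/` copies of the configuration, and a symlink named `hadoop` points at the active copy. The function name `switch_hadoop_mode` is also made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint the "hadoop" config symlink at one of the
# mode-specific copies (local/, pseudo/ or full/) under a base directory.
switch_hadoop_mode() {
    local base="$1" mode="$2"
    case "$mode" in
        local|pseudo|full) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    if [ ! -d "$base/$mode" ]; then
        echo "missing config directory: $base/$mode" >&2
        return 1
    fi
    # Refuse to touch a real directory named "hadoop"; only replace a symlink.
    if [ -e "$base/hadoop" ] && [ ! -L "$base/hadoop" ]; then
        echo "$base/hadoop exists and is not a symlink" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace the old link,
    # -n: do not descend into the directory the old link points to.
    ln -sfn "$base/$mode" "$base/hadoop"
}
```

Calling `switch_hadoop_mode <base> pseudo` later flips back to pseudo-distributed mode without editing any XML. Running `readlink <base>/hadoop` shows which mode is active.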

1>. core-site.xml configuration file

[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://cdh14:9000</value>
                </property>
                <property>
                        <name>hadoop.tmp.dir</name>
                        <value>/tosp/opt/hadoop</value>
                </property>
</configuration>

<!--

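Purpose of core-site.xml:
        #Defines system-level parameters such as the HDFS URL, Hadoop's temporary directory and the configuration used by rack-aware clusters. Parameters defined here override the defaults in core-default.xml.

Purpose of the fs.defaultFS parameter:
        #Declares the NameNode address, which amounts to declaring the default file system (HDFS).

Purpose of the hadoop.tmp.dir parameter:
        #Declares Hadoop's working directory. By default the NameNode and DataNode data directories are created under it.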

-->
[root@cdh14 ~]$
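After saving the file, you can check the effective value on a node with Hadoop installed by running `hdfs getconf -confKey fs.defaultFS`. For a quick offline check of the XML itself, the sketch below extracts a property value with standard text tools. It is a hypothetical helper, not part of Hadoop. It assumes the simple `<name>`/`<value>` layout shown above and does not parse XML in general.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of a named property in a Hadoop
# *-site.xml file. Assumes the simple layout used in this article
# (<name> followed by <value> inside each <property>); not a full XML parser.
get_hadoop_prop() {
    local file="$1" key="$2" escaped
    # Escape dots so "fs.defaultFS" matches literally in the regex.
    escaped=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    # Join all lines so a <name> and the <value> on the next line become adjacent.
    tr -d '\n' < "$file" |
        sed -n "s|.*<name>[[:space:]]*${escaped}[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage: get_hadoop_prop /tosp/opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```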

2>. hdfs-site.xml configuration file

[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
</configuration>

<!--
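Purpose of hdfs-site.xml:
        #HDFS-related settings such as the number of block replicas, the block size and whether permissions are enforced. Parameters defined here override the defaults in hdfs-default.xml.

Purpose of the dfs.replication parameter:
        #For availability and redundancy, HDFS keeps several replicas of each data block on different nodes; the default is 3. A single-node pseudo-distributed setup only needs one replica, and this cluster uses 2. The value is set with the dfs.replication property. Replication is a software-level backup.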

-->
[root@cdh14 ~]$
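Each replica of a block is stored on a different DataNode. So `dfs.replication` should not exceed the number of DataNodes listed in the slaves file (step 5>); otherwise HDFS cannot place every replica and blocks stay under-replicated. Below is a minimal sketch of that sanity check. The function names are hypothetical. It assumes the slaves file holds one host per line, plus optional `#` comments and blank lines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count worker hosts in a slaves file and compare that
# count with the configured dfs.replication value.
count_datanodes() {
    # Strip "#" comments, drop blank lines, count what is left.
    sed 's/#.*$//; /^[[:space:]]*$/d' "$1" | wc -l | tr -d ' '
}

check_replication() {
    local replication="$1" slaves_file="$2" nodes
    nodes=$(count_datanodes "$slaves_file")
    if [ "$replication" -gt "$nodes" ]; then
        echo "dfs.replication=$replication exceeds $nodes DataNode(s): blocks would stay under-replicated" >&2
        return 1
    fi
    echo "ok: replication $replication with $nodes DataNode(s)"
}

# Usage: check_replication 2 /tosp/opt/hadoop/etc/hadoop/slaves
```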

3>. mapred-site.xml configuration file

[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

<!--
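Purpose of mapred-site.xml:
        #MapReduce-related settings, such as the default number of reduce tasks and the default upper and lower memory limits for tasks. Parameters defined here override the defaults in mapred-default.xml.

Purpose of the mapreduce.framework.name parameter:
        #Selects the framework that runs MapReduce jobs. There are three options: local (run locally), classic (the first-generation Hadoop execution framework) and yarn (the second-generation framework). Here we use yarn, the current framework.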

-->
[root@cdh14 ~]$
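Some Hadoop 2.x releases ship only `mapred-site.xml.template` in the configuration directory, so the file has to be created before it can be edited. Below is a small idempotent sketch. The helper name is hypothetical, and the configuration directory is passed in rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure mapred-site.xml exists in a config directory,
# copying it from mapred-site.xml.template when only the template is present.
ensure_mapred_site() {
    local conf_dir="$1"
    if [ -f "$conf_dir/mapred-site.xml" ]; then
        return 0    # already present: never overwrite an edited file
    fi
    if [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
        return 0
    fi
    echo "no mapred-site.xml or template in $conf_dir" >&2
    return 1
}

# Usage: ensure_mapred_site /tosp/opt/hadoop/etc/hadoop
```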

4>. yarn-site.xml configuration file

[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/yarn-site.xml 
<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>cdh14</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

<!--

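Purpose of yarn-site.xml:
        #Configures YARN: the ResourceManager, the NodeManagers and scheduler-level parameters.
Purpose of the yarn.resourcemanager.hostname parameter:
        #Specifies the host name of the ResourceManager.
Purpose of the yarn.nodemanager.aux-services parameter:
        #Enables the mapreduce_shuffle auxiliary service on every NodeManager, so that reduce tasks can fetch map output.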

-->
[root@cdh14 ~]$
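Every node needs the same yarn-site.xml, so it can help to generate it from one place. The sketch below writes the two properties shown above using a heredoc. The function name is an assumption for illustration, and so is passing the ResourceManager host as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal yarn-site.xml containing the two
# properties used in this article. $1 = config dir, $2 = ResourceManager host.
write_yarn_site() {
    local conf_dir="$1" rm_host="$2"
    cat > "$conf_dir/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>${rm_host}</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
EOF
}

# Usage: write_yarn_site /tosp/opt/hadoop/etc/hadoop cdh14
```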

5>. slaves configuration file

[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/slaves 
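#Purpose of this file: lists the worker hosts. The start/stop scripts (start-dfs.sh, start-yarn.sh) log in to each host listed here over SSH to start or stop its DataNode and NodeManager. cdh14 is listed too, so the NameNode host also runs a DataNode.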
cdh14
cdh12
cdh11
cdh10
cdh9
cdh8
cdh7
[root@cdh14 ~]$
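Every host in this list needs the same configuration files. Below is a minimal sketch that pushes the configuration directory to each worker with rsync over SSH. It only runs non-interactively once passwordless login is configured (see the next section). Several details are assumptions of this sketch: the function name, the `DRY_RUN` switch and the skip-the-local-host logic. It drops `#` comments and blank lines, roughly the way Hadoop's own scripts filter this file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Hadoop config directory to every host listed in
# a slaves file, skipping the local host. With DRY_RUN=1 it only prints the
# rsync commands instead of running them.
sync_conf() {
    local slaves_file="$1" conf_dir="$2" self_host="$3" host
    # Drop "#" comments and blank lines; the slaves file may contain both.
    for host in $(sed 's/#.*$//; /^[[:space:]]*$/d' "$slaves_file"); do
        if [ "$host" = "$self_host" ]; then
            continue    # the local copy is the source, nothing to send
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rsync -a $conf_dir/ root@$host:$conf_dir/"
        else
            rsync -a "$conf_dir/" "root@$host:$conf_dir/" || return 1
        fi
    done
}

# Usage (paths assumed):
#   sync_conf /tosp/opt/hadoop/etc/hadoop/slaves /tosp/opt/hadoop/etc/hadoop "$(hostname)"
```

The loop iterates over a word list rather than `while read`. This is deliberate: commands that talk to SSH inside a `while read` loop can consume the loop's stdin and silently skip hosts.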

III. Configure passwordless SSH login from the NameNode to each DataNode

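1>. Generate a public/private key pair locally (first delete the keys left over from the pseudo-distributed deployment; note that rm -rf ~/.ssh/* also removes authorized_keys and known_hosts)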

[root@cdh14 ~]$ rm -rf ~/.ssh/*
[root@cdh14 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/root/.ssh/id_rsa.
Your public key has been saved in /home/root/.ssh/id_rsa.pub.
The key fingerprint is:
a3:a4:ae:d8:f7:7f:a2:b6:d6:15:74:29:de:fb:14:08 root@cdh14
The key's randomart image is:
+--[ RSA 2048]----+
|             .   |
|          E o    |
|         o = .   |
|          o o .  |
|      . S  . . . |
|     o . .. . .  |
|    . .. .   o   |
| o .. o o .   .  |
|. oo.+++.o       |
+-----------------+
[root@cdh14 ~]$
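The same step can be scripted non-interactively. This sketch regenerates a passphrase-less RSA key pair at a given path and tightens the directory and key permissions that sshd's StrictModes check expects. The function name is hypothetical. Unlike the `rm -rf ~/.ssh/*` above, it removes only the old key pair and leaves authorized_keys and known_hosts alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)create a passphrase-less RSA key pair at $1
# (for example ~/.ssh/id_rsa), replacing only the previous pair.
regen_ssh_key() {
    local key="$1" dir
    dir=$(dirname "$key")
    mkdir -p "$dir" && chmod 700 "$dir"
    rm -f "$key" "$key.pub"    # remove just the old key pair
    # -N '': empty passphrase, -q: quiet, -f: output file
    ssh-keygen -t rsa -N '' -q -f "$key" || return 1
    chmod 600 "$key"
}

# Usage: regen_ssh_key ~/.ssh/id_rsa
```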

2>. Use ssh-copy-id to install the public key on cdh14 itself (172.20.20.228); it is also listed in slaves, so it runs a DataNode as well

[root@cdh14 ~]$ ssh-copy-id root@cdh14
The authenticity of host 'cdh14 (172.16.30.101)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@cdh14's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@cdh14'"
and check to make sure that only the key(s) you wanted were added.

[root@cdh14 ~]$ ssh cdh14
Last login: Fri May 25 18:35:40 2018 from 172.16.30.1
[root@cdh14 ~]$ who
root pts/0        2018-05-25 18:35 (172.16.30.1)
root pts/1        2018-05-25 19:17 (cdh14)
[root@cdh14 ~]$ exit 
logout
Connection to cdh14 closed.
[root@cdh14 ~]$ who
root pts/0        2018-05-25 18:35 (172.16.30.1)
[root@cdh14 ~]$

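3>. Use ssh-copy-id to install the public key on each DataNode server (172.20.20.226-220), running it once per host (cdh12 through cdh7). A transcript for one host:

[root@cdh14 ~]$ ssh-copy-id root@cdh12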
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@s102's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@s102'"
and check to make sure that only the key(s) you wanted were added.

[root@cdh14 ~]$ ssh s102
Last login: Fri May 25 18:35:42 2018 from 172.16.30.1
[root@s102 ~]$ who
root pts/0        2018-05-25 18:35 (172.16.30.1)
root pts/1        2018-05-25 19:19 (cdh14)
[root@s102 ~]$ exit 
logout
Connection to s102 closed.
[root@cdh14 ~]$ who
root pts/0        2018-05-25 18:35 (172.16.30.1)
[root@cdh14 ~]$
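Instead of typing `ssh-copy-id` once per host, you can loop the key distribution and a follow-up check over the slaves file. In this sketch the function names and the `DRY_RUN` switch are hypothetical. `ssh -o BatchMode=yes` makes ssh fail instead of prompting for a password, so any host that still needs a password gets reported.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: push the local public key to every host in a slaves
# file, then confirm key-based login works. DRY_RUN=1 only prints commands.
slaves_hosts() {
    sed 's/#.*$//; /^[[:space:]]*$/d' "$1"    # drop comments and blank lines
}

copy_key_to_all() {
    local host
    for host in $(slaves_hosts "$1"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh-copy-id root@$host"
        else
            ssh-copy-id "root@$host" || return 1
        fi
    done
}

check_passwordless() {
    local host failed=0
    for host in $(slaves_hosts "$1"); do
        # -n: stdin from /dev/null; BatchMode=yes: fail instead of prompting
        if ! ssh -n -o BatchMode=yes "root@$host" true; then
            echo "password still required (or host unreachable): $host" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   copy_key_to_all /tosp/opt/hadoop/etc/hadoop/slaves
#   check_passwordless /tosp/opt/hadoop/etc/hadoop/slaves
```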

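Note: passwordless login is configured the same way for an ordinary user and for root. It is best to set it up for root as well, because shell scripts are run later on.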

IV. Start the services and verify that they are running

1>. Format the file system (once only, on the NameNode)

[root@cdh14 ~]$ hdfs namenode -format
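Format only once. Reformatting gives the NameNode a new clusterID, and DataNodes that still hold data from the old format then fail with an "Incompatible clusterIDs" error. Below is a minimal guard. It assumes the NameNode metadata is in the default `${hadoop.tmp.dir}/dfs/name` location, which here would be /tosp/opt/hadoop/dfs/name because no dfs.namenode.name.dir is set above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: a formatted NameNode metadata directory contains
# current/VERSION, so its presence means "already formatted".
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage (path assumed from the default ${hadoop.tmp.dir}/dfs/name):
#   name_dir=/tosp/opt/hadoop/dfs/name
#   if namenode_is_formatted "$name_dir"; then
#       echo "already formatted; not formatting again"
#   else
#       hdfs namenode -format
#   fi
```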

2>. Start Hadoop (in Hadoop 2.x start-all.sh is deprecated and simply calls start-dfs.sh and start-yarn.sh)

[root@cdh14 ~]$ start-all.sh

3>. Verify that the NameNode and DataNodes have started (jps lists the running Java daemons; a custom script can run it on every node)

[root@cdh14 ~]$ jps
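jps prints one `<pid> <main class>` line per running JVM. With the configuration above, cdh14 would typically show NameNode, SecondaryNameNode and ResourceManager. Because cdh14 is also listed in slaves, it should show DataNode and NodeManager too. The other hosts should show DataNode and NodeManager. Below is a sketch for checking such a list. The function names are hypothetical. The remote variant assumes jps is on the PATH after sourcing /etc/profile, since non-interactive ssh sessions often skip the login environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given jps output, print every expected daemon that is
# missing and return non-zero if anything is missing.
missing_daemons() {
    local jps_output="$1" daemon
    local missing=()
    shift
    for daemon in "$@"; do
        # Compare the 2nd field of "<pid> <MainClass>" exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing+=("$daemon")
        fi
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        printf '%s\n' "${missing[@]}"
        return 1
    fi
}

# Run the check on one host over ssh (assumes PATH/JAVA_HOME come from /etc/profile).
check_host() {
    local host="$1"
    shift
    missing_daemons "$(ssh -n "root@$host" '. /etc/profile; jps')" "$@"
}

# Example: check_host cdh14 NameNode SecondaryNameNode ResourceManager DataNode NodeManager
```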