0003. Setting Up the Hadoop Environment

03-01-Hadoop Directory Structure and Local Mode
03-02-Configuring Hadoop's Pseudo-Distributed Mode

#### 03-01-Hadoop Directory Structure and Local Mode

Extract the installation packages:

~~~
tar -zxvf hadoop-2.7.3.tar.gz -C /root/training
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /root/training
tar -zxvf apache-hive-2.3.0-bin.tar.gz -C /root/training
tar -zxvf hbase-1.3.1-bin.tar.gz -C /root/training
~~~

Environment variables in /etc/profile:

~~~
export JAVA_HOME=/root/training/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_HOME=/root/training/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

HBASE_HOME=/root/training/hbase-1.3.1
export HBASE_HOME

PATH=$HBASE_HOME/bin:$PATH
export PATH

HIVE_HOME=/root/training/apache-hive-2.3.0-bin
export HIVE_HOME

PATH=$HIVE_HOME/bin:$PATH
export PATH
~~~

Apply the environment variables:

~~~
source /etc/profile
~~~
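
You can run a small bash check to confirm that `source /etc/profile` took effect. `check_env` is a hypothetical helper and is not part of Hadoop. It assumes the four variables set in /etc/profile above. For each one, it only checks that the variable points to an existing directory and that the directory's `bin/` is on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the *_HOME variables set in /etc/profile.
check_env() {
  local status=0 var dir
  for var in JAVA_HOME HADOOP_HOME HBASE_HOME HIVE_HOME; do
    dir="${!var:-}"                      # bash indirect expansion: the value of $JAVA_HOME, etc.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "MISSING: $var='$dir'"
      status=1
      continue
    fi
    # Wrap PATH in colons so that the first and last entries also match
    case ":$PATH:" in
      *":$dir/bin:"*) echo "OK: $var=$dir" ;;
      *) echo "NOT ON PATH: $dir/bin"; status=1 ;;
    esac
  done
  return "$status"
}
```

Run `check_env` right after sourcing the profile. `java -version` and `hadoop version` give a quick second check that the binaries themselves resolve.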

View the directory tree:

~~~
[root@bigdata111 training]# tree -d -L 2
~~~

-d: list directories only
-L 2: descend at most two levels

Figure: Hadoop directory structure (Hadoop的目录结构.png)

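Local mode:

Characteristics: there is no HDFS, and it can only test MapReduce programs. The programs do not run on YARN; each one runs as a standalone Java program.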

Setup: in /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, change

~~~
export JAVA_HOME=${JAVA_HOME}
~~~
to
~~~
export JAVA_HOME=/root/training/jdk1.8.0_144
~~~
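
You can also make the same edit without opening an editor. A commonly cited reason for hard-coding the path is that the start scripts launch daemons over ssh in the distributed modes, and there `${JAVA_HOME}` from /etc/profile may not be set. The sketch below assumes GNU sed, as on Ubuntu, where `-i` edits the file in place. `set_java_home` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the JAVA_HOME line in hadoop-env.sh with a fixed path.
# Assumes GNU sed (-i edits the file in place).
set_java_home() {
  local env_file="$1" jdk="$2"
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk}|" "$env_file"
  # Show the resulting line for confirmation
  grep -n '^export JAVA_HOME=' "$env_file"
}
```

Usage: `set_java_home /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh /root/training/jdk1.8.0_144`.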

Test a MapReduce program in local mode:

`rm -rf *` deletes all files in the current directory.

~~~
root@ubuntu:~/temp# pwd
/root/temp
root@ubuntu:~/temp# nano data.txt
root@ubuntu:~/temp# nano data.txt
root@ubuntu:~/temp# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# ls hadoop-mapreduce-examples-2.7.3.jar
hadoop-mapreduce-examples-2.7.3.jar
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
~~~

View the result:

~~~
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# cd /root/temp/output/wc
root@ubuntu:~/temp/output/wc# ls -al
total 20
drwxr-xr-x 2 root root 4096 Oct 16 11:17 .
drwxr-xr-x 3 root root 4096 Oct 16 11:17 ..
-rw-r--r-- 1 root root   55 Oct 16 11:17 part-r-00000
-rw-r--r-- 1 root root   12 Oct 16 11:17 .part-r-00000.crc
-rw-r--r-- 1 root root    0 Oct 16 11:17 _SUCCESS
-rw-r--r-- 1 root root    8 Oct 16 11:17 ._SUCCESS.crc
root@ubuntu:~/temp/output/wc# ls
part-r-00000  _SUCCESS
root@ubuntu:~/temp/output/wc# nano part-r-00000 
root@ubuntu:~/temp/output/wc# echo part-r-00000 
part-r-00000
root@ubuntu:~/temp/output/wc# cat part-r-00000 
Beijing 2
China   2
I       2
capital 1
is      1
love    2
of      1
the     1
~~~

Figure: viewing the result (查看结果.png)

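~~~
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
~~~

Here /root/temp/input/data.txt may also be a directory, and all paths are local Linux paths. The output directory /root/temp/output/wc must not exist before the job runs.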
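
To see what the wordcount example computes, the sketch below uses a plain Unix pipeline that mimics its map, sort and reduce phases on a small local file. It is not Hadoop and shares no code with it; `wordcount_local` is a hypothetical name. The pipeline uses `LC_ALL=C sort` because Hadoop sorts text keys by bytes, which puts uppercase words before lowercase ones, as in part-r-00000 above.

```shell
#!/usr/bin/env bash
# Not Hadoop: a Unix pipeline that imitates what the wordcount example computes.
wordcount_local() {
  # map:     split on spaces/tabs into one word per line, drop empty lines
  # shuffle: byte-order sort groups identical words (uppercase before lowercase)
  # reduce:  uniq -c counts each group; awk prints "word<TAB>count"
  tr -s ' \t' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, `wordcount_local /root/temp/input/data.txt` should print the same word/count pairs as part-r-00000, separated by a tab.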

#### 03-02-Configuring Hadoop's Pseudo-Distributed Mode

Characteristics: it simulates a distributed environment on a single machine and provides all of Hadoop's features:

* HDFS: NameNode + DataNode + SecondaryNameNode
* YARN: ResourceManager + NodeManager

Extract the installation packages: same as above.

Environment variables in /etc/profile: same as above.

Figure: configuration files (配置文件.png)

(1) In /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, change

~~~
export JAVA_HOME=${JAVA_HOME}
~~~
to
~~~
export JAVA_HOME=/root/training/jdk1.8.0_144
~~~

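(2) hdfs-site.xml

~~~
<!--Replication factor of data blocks; the default is 3-->
<!--As a rule, keep it equal to the number of DataNodes, and do not exceed 3-->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

<!--Whether HDFS permission checking is enabled; the default is true-->
<!--Keep the default for now; it will be changed to false later-->
<!--
<property>
	<name>dfs.permissions</name>
	<value>false</value>
</property>
-->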
~~~
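
The `<property>` fragments above must sit inside the `<configuration>` root element of hdfs-site.xml. The sketch below writes the whole file at once with a hypothetical helper, `write_hdfs_site`. It overwrites the existing file and leaves out the commented-out `dfs.permissions` entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a complete hdfs-site.xml for the pseudo-distributed setup.
# Overwrites any existing file in the given configuration directory.
write_hdfs_site() {
  local conf_dir="$1"
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Only one DataNode in pseudo-distributed mode, so keep a single replica -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
```

Usage: `write_hdfs_site /root/training/hadoop-2.7.3/etc/hadoop`.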
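(3) core-site.xml
~~~
<!--Address of the HDFS master node, i.e. the NameNode-->
<!--9000 is the RPC port-->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://192.168.16.143:9000</value>
</property>

<!--Directory in the local OS where HDFS data blocks and metadata are stored-->
<!--The default lies under Linux's /tmp directory, which may be cleared on reboot, so be sure to change it-->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/root/training/hadoop-2.7.3/tmp</value>
</property>
~~~
Create the /root/training/hadoop-2.7.3/tmp directory yourself.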
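
You can derive the directory from core-site.xml itself so the path is written in only one place. `make_tmp_dir` is a hypothetical helper. Its awk parsing assumes the simple layout above, where the `<value>` line follows the `<name>` line. It is not a general XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read hadoop.tmp.dir from core-site.xml and create that directory.
# Assumes the <value> line comes after the <name>hadoop.tmp.dir</name> line.
make_tmp_dir() {
  local core_site="$1" dir
  dir=$(awk '
    /<name>hadoop.tmp.dir<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }' "$core_site")
  if [ -z "$dir" ]; then
    echo "hadoop.tmp.dir not found in $core_site" >&2
    return 1
  fi
  mkdir -p "$dir" && echo "$dir"
}
```

Usage: `make_tmp_dir /root/training/hadoop-2.7.3/etc/hadoop/core-site.xml` should create and print /root/training/hadoop-2.7.3/tmp.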
(4) mapred-site.xml (this file does not exist by default)
~~~
<!--Framework that MapReduce programs run on-->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>
~~~
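
In Hadoop 2.x, the etc/hadoop directory ships a mapred-site.xml.template. The usual way to get mapred-site.xml is to copy the template and then add the property above. The helper below, `init_mapred_site` (a hypothetical name), copies only when mapred-site.xml does not exist yet, so an already edited file is never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create mapred-site.xml from the shipped template if it is missing.
init_mapred_site() {
  local conf_dir="$1"
  if [ -f "$conf_dir/mapred-site.xml" ]; then
    echo "exists: $conf_dir/mapred-site.xml"          # never overwrite an edited file
  elif [ -f "$conf_dir/mapred-site.xml.template" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    echo "created: $conf_dir/mapred-site.xml"
  else
    echo "no template in $conf_dir" >&2
    return 1
  fi
}
```

Usage: `init_mapred_site /root/training/hadoop-2.7.3/etc/hadoop`, then add the `mapreduce.framework.name` property inside `<configuration>`.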
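(5) yarn-site.xml
~~~
<!--Host of the YARN ResourceManager-->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>192.168.16.143</value>
</property>

<!--Auxiliary service on each NodeManager that serves the MapReduce shuffle-->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>
~~~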
(6) Format the HDFS NameNode
~~~
Command: hdfs namenode -format
Log: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
~~~
(7) Start the daemons:
~~~
HDFS: start-dfs.sh
Yarn: start-yarn.sh
All at once: start-all.sh
~~~

~~~
root@bigdata00:~/training/hadoop-2.7.3# jps
2690 NameNode
3219 ResourceManager
3544 NodeManager
3582 Jps
2863 DataNode
3071 SecondaryNameNode
root@bigdata00:~/training/hadoop-2.7.3# 
~~~
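
With five daemons it is easy to miss one. `check_daemons` is a hypothetical helper that reads `jps` output and reports which of the five expected pseudo-distributed daemons are missing. It assumes `jps` prints one `PID ClassName` pair per line, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and report missing daemons.
check_daemons() {
  local listing missing=0 d
  listing=$(awk '{ print $2 }')          # keep only the class names
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x: whole-line match, so NameNode does not match SecondaryNameNode
    if printf '%s\n' "$listing" | grep -qx "$d"; then
      echo "running: $d"
    else
      echo "MISSING: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_daemons`. A non-zero return status means at least one daemon is not running. In that case, check the files under /root/training/hadoop-2.7.3/logs.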

(8) Web console access:
* HDFS: 192.168.16.143:50070
* YARN: 192.168.16.143:8088

##### HDFS: port 50070
![](0003.搭建Hadoop的环境.assets/50070.png)
##### YARN: port 8088
![](0003.搭建Hadoop的环境.assets/8088.png)

-----------------------------------------------------------------
#### 03-03-Passwordless Login: Principle and Configuration

~~~
ssh-keygen -t rsa

ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
~~~

~~~
root@bigdata00:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9b:77:c7:5c:ef:b3:85:ac:61:24:4d:30:dc:1c:f3:ed root@bigdata00
The key's randomart image is:
+--[ RSA 2048]----+
|         .ooo.   |
|          .ooo . |
|            . . .|
|           o   . |
|        S . o   E|
|         o o + o.|
|        o . + * o|
|         . o + o.|
|            .  .+|
+-----------------+
root@bigdata00:~# ls
tools  training
root@bigdata00:~# ls -al
total 40
drwx------  7 root root 4096 Oct 16 12:19 .
drwxr-xr-x 23 root root 4096 Oct 15 20:44 ..
-rw-------  1 root root   55 Oct 16 08:11 .Xauthority
-rw-r--r--  1 root root 3106 Apr 19  2012 .bashrc
drwx------  2 root root 4096 Oct 16 07:46 .cache
drwxr-xr-x  2 root root 4096 Oct 16 12:15 .oracle_jre_usage
-rw-r--r--  1 root root  140 Apr 19  2012 .profile
drwx------  2 root root 4096 Oct 16 12:42 .ssh
drwxr-xr-x  2 root root 4096 Oct 16 07:57 tools
drwxr-xr-x  6 root root 4096 Oct 16 11:39 training
root@bigdata00:~# cd .ssh
root@bigdata00:~/.ssh# ls -al
total 20
drwx------ 2 root root 4096 Oct 16 12:42 .
drwx------ 7 root root 4096 Oct 16 12:19 ..
-rw------- 1 root root 1675 Oct 16 12:42 id_rsa
-rw-r--r-- 1 root root  396 Oct 16 12:42 id_rsa.pub
-rw-r--r-- 1 root root  666 Oct 16 12:20 known_hosts
root@bigdata00:~/.ssh# cd ..
root@bigdata00:~# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
root@192.168.16.143's password: 
Now try logging into the machine, with "ssh 'root@192.168.16.143'", and check in:

  ~/.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

root@bigdata00:~# ls .ssh/
authorized_keys  id_rsa  id_rsa.pub  known_hosts
root@bigdata00:~# more .ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/XCppmAEL6AnXoYXlmTr639AupthLny6JQ4zF9Jpg
S4mhycZCrHpVCxhERV9p+HzNFPRZBaWluseCOkzbAXbmMsXSucXcrbV+wyg0el+CHuDopJZ4JiAPjK8t
AnSPK1bdggCAVGaI138pU81YMgOntX3gV49CcIEGx9KFF4wLaPMq/PJrr9+omYhkTF50i+oHwl+bG2DL
GZFmJuk3nxF+rsGEHwdDCfBtcoa1f7Si4BA7gf0dEXBlydPMeYM48rgK0XAgNReBZJWBTooGkSXuxHy1
jccIiwH9G+mlZI38WI7YRIx6HZIwzfpG8yVTXahdPamC2MJ+w54dj0jKyVUL root@bigdata00
root@bigdata00:~# ssh 192.168.16.143
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Fri Oct 16 12:47:00 CST 2020

  System load:  0.0               Processes:           395
  Usage of /:   41.5% of 6.50GB   Users logged in:     2
  Memory usage: 74%               IP address for eth0: 192.168.16.143
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

0 packages can be updated.
0 updates are security updates.

Last login: Fri Oct 16 08:11:46 2020 from 192.168.16.1
root@bigdata00:~# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [192.168.16.143]
192.168.16.143: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
root@bigdata00:~# start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out

~~~

##### Principle of passwordless login
![](0003.搭建Hadoop的环境.assets/免密码登录的原理.png)
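
The core of the principle is this: ssh-copy-id appends the client's public key to ~/.ssh/authorized_keys on the target machine. At login, the server lets in only a client that proves it holds the matching private key. The local sketch below reproduces only the file side of that step; it is not ssh-copy-id's actual code. It sets 700/600 permissions, because sshd's default StrictModes setting rejects key files that others can write to. `authorize_key` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Local sketch of the file-side effect of ssh-copy-id on the target machine.
authorize_key() {
  local pubkey_file="$1" home_dir="$2"
  local ssh_dir="$home_dir/.ssh" auth="$home_dir/.ssh/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth" && chmod 600 "$auth"
  # Append the key only if this exact line is not already present
  if ! grep -qxF "$(cat "$pubkey_file")" "$auth"; then
    cat "$pubkey_file" >> "$auth"
  fi
}
```

Running it twice leaves a single copy of the key, which matches the single `ssh-rsa ...` line shown in authorized_keys above.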

##### wordcount in pseudo-distributed mode
Main commands:
~~~
hdfs dfs -put data.txt /input

cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce

hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
~~~
Here /input/data.txt and /output/wc2 are HDFS paths, and /output/wc2 must not exist beforehand.
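
The job fails if the output directory already exists, so rerunning the example is easier with a wrapper that removes the old result first. `run_wordcount` is a hypothetical helper. It assumes the examples jar lives under `$HADOOP_HOME` as in this setup, and it uses only `hdfs dfs -test`, `-rm -r` and `-cat`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: delete an existing HDFS output dir, run wordcount, print the result.
run_wordcount() {
  local input="$1" output="$2"
  local jar="${HADOOP_HOME:-/root/training/hadoop-2.7.3}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar"
  # -test -d succeeds only if the path exists and is a directory
  if hdfs dfs -test -d "$output"; then
    hdfs dfs -rm -r "$output"            # destructive: removes the previous result
  fi
  hadoop jar "$jar" wordcount "$input" "$output" &&
    hdfs dfs -cat "$output/part-r-00000"
}
```

Usage: `run_wordcount /input/data.txt /output/wc2`. Deleting the output directory cannot be undone, so point the wrapper only at scratch output such as /output/wc2.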

~~~
root@bigdata00:~# jps
1710 Jps
root@bigdata00:~# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
root@bigdata00:~# jps
2291 SecondaryNameNode
1894 NameNode
2887 Jps
2602 NodeManager
2447 ResourceManager
2047 DataNode
root@bigdata00:~# hdfs dfs -ls /
root@bigdata00:~# hdfs dfs -mkdir /input
root@bigdata00:~# hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-10-16 13:35 /input
root@bigdata00:~# cd /root/temp/input
root@bigdata00:~/temp/input# ls
data.txt
root@bigdata00:~/temp/input# hdfs dfs -put data.txt /input
root@bigdata00:~/temp/input# hdfs dfs -ls /input
Found 1 items
-rw-r--r--   1 root supergroup         60 2020-10-16 13:36 /input/data.txt
root@bigdata00:~/temp/input# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
20/10/16 13:37:52 INFO client.RMProxy: Connecting to ResourceManager at /192.168.16.143:8032
20/10/16 13:37:54 INFO input.FileInputFormat: Total input paths to process : 1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: number of splits:1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602826392719_0001
20/10/16 13:37:55 INFO impl.YarnClientImpl: Submitted application application_1602826392719_0001
20/10/16 13:37:55 INFO mapreduce.Job: The url to track the job: http://192.168.16.143:8088/proxy/application_1602826392719_0001/
20/10/16 13:37:55 INFO mapreduce.Job: Running job: job_1602826392719_0001
20/10/16 13:38:13 INFO mapreduce.Job: Job job_1602826392719_0001 running in uber mode : false
20/10/16 13:38:13 INFO mapreduce.Job:  map 0% reduce 0%
20/10/16 13:38:25 INFO mapreduce.Job:  map 100% reduce 0%
20/10/16 13:38:34 INFO mapreduce.Job:  map 100% reduce 100%
20/10/16 13:38:36 INFO mapreduce.Job: Job job_1602826392719_0001 completed successfully
20/10/16 13:38:36 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=93
                FILE: Number of bytes written=237535
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=166
                HDFS: Number of bytes written=55
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9394
                Total time spent by all reduces in occupied slots (ms)=6930
                Total time spent by all map tasks (ms)=9394
                Total time spent by all reduce tasks (ms)=6930
                Total vcore-milliseconds taken by all map tasks=9394
                Total vcore-milliseconds taken by all reduce tasks=6930
                Total megabyte-milliseconds taken by all map tasks=9619456
                Total megabyte-milliseconds taken by all reduce tasks=7096320
        Map-Reduce Framework
                Map input records=3
                Map output records=12
                Map output bytes=108
                Map output materialized bytes=93
                Input split bytes=106
                Combine input records=12
                Combine output records=8
                Reduce input groups=8
                Reduce shuffle bytes=93
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=270
                CPU time spent (ms)=3420
                Physical memory (bytes) snapshot=286212096
                Virtual memory (bytes) snapshot=4438401024
                Total committed heap usage (bytes)=138043392
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=60
        File Output Format Counters 
                Bytes Written=55
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-10-16 13:38 /output/wc2
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output/wc2
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-10-16 13:38 /output/wc2/_SUCCESS
-rw-r--r--   1 root supergroup         55 2020-10-16 13:38 /output/wc2/part-r-00000

root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -cat  /output/wc2/part-r-00000
Beijing 2
China   2
I       2
capital 1
is      1
love    2
of      1
the     1

~~~

-----------------------------------------------------------------
#### 03-04-Setting Up Hadoop's Fully Distributed Mode

##### SecureCRT: sending input to multiple sessions at once
![](0003.搭建Hadoop的环境.assets/SecureCRT同时给多个Session.png)
##### Set hostnames and IPs (nano /etc/hosts)

##### Cluster plan (at least 3 machines are needed)
* 192.168.16.141 bigdata01: NameNode + SecondaryNameNode + ResourceManager
* 192.168.16.138 bigdata02: DataNode + NodeManager
* 192.168.16.139 bigdata03: DataNode + NodeManager
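
Every machine needs the same hostname-to-IP mapping. `add_cluster_hosts` is a hypothetical helper that appends the three entries from the plan above to a hosts file (/etc/hosts by default). It skips any hostname that is already mapped, so running it twice does no harm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the cluster's IP/hostname pairs to a hosts file, idempotently.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}" ip name
  while read -r ip name; do
    # awk exits 0 if any hostname field on any line equals $name
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) f = 1 } END { exit !f }' "$hosts_file"; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done <<'EOF'
192.168.16.141 bigdata01
192.168.16.138 bigdata02
192.168.16.139 bigdata03
EOF
}
```

Run `add_cluster_hosts` as root on each of the three machines.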

##### Configure passwordless login between every pair of machines
Run the following on each of the three machines:
~~~
ssh-keygen -t rsa

ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.141
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.138
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.139
~~~
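
The three ssh-copy-id commands can be run in a loop, together with a check that key-based login actually works. This is a sketch; `copy_key_to_all` is a hypothetical name. `-o BatchMode=yes` makes ssh fail instead of prompting, so a remaining password prompt shows up as an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push this machine's key to every node, then test key-only login.
copy_key_to_all() {
  local host status=0
  for host in 192.168.16.141 192.168.16.138 192.168.16.139; do
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$host" || status=1
    # BatchMode=yes: fail instead of asking for a password
    ssh -o BatchMode=yes "root@$host" hostname || status=1
  done
  return "$status"
}
```

Run `copy_key_to_all` once on each machine after `ssh-keygen -t rsa`.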

##### Configuration

###### Master-node configuration for fully distributed mode
![](0003.搭建Hadoop的环境.assets/全分布模式的配置.png)
0. Extract Java and Hadoop.
1. Set the Java and Hadoop environment variables on all 3 machines and apply them.
2. hadoop-env.sh
3. hdfs-site.xml
4. core-site.xml
5. mapred-site.xml
6. yarn-site.xml
7. slaves: list the addresses of the worker nodes
	* 192.168.16.138
	* 192.168.16.139
8. Format the NameNode:
	* hdfs namenode -format
9. Copy the installed directories on 192.168.16.141 to the worker nodes:
	* scp -r /root/training/jdk1.8.0_144 root@192.168.16.138:/root/training
	* scp -r /root/training/jdk1.8.0_144 root@192.168.16.139:/root/training
	* scp -r /root/training/hadoop-2.7.3/ root@192.168.16.138:/root/training
	* scp -r /root/training/hadoop-2.7.3/ root@192.168.16.139:/root/training
10. start-all.sh
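
Step 9 can be driven by the slaves file from step 7, so the worker list lives in one place. Below is a sketch with a hypothetical helper, `distribute`. It assumes one host per line in the slaves file and the directory names used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the JDK and Hadoop directories to every host in the slaves file.
distribute() {
  local slaves_file="$1" host dir status=0
  # Read the host list on fd 3 so that scp cannot consume it from stdin
  while read -r host <&3; do
    [ -z "$host" ] && continue           # skip blank lines
    for dir in /root/training/jdk1.8.0_144 /root/training/hadoop-2.7.3; do
      scp -r "$dir" "root@$host:/root/training" || status=1
    done
  done 3< "$slaves_file"
  return "$status"
}
```

Usage, on 192.168.16.141: `distribute /root/training/hadoop-2.7.3/etc/hadoop/slaves`.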

~~~
[root@bigdata112 training]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [bigdata112]
bigdata112: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata112.out
bigdata113: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata113.out
bigdata114: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata114.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata112.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata112.out
bigdata114: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata114.out
bigdata113: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata113.out
~~~
~~~
[root@bigdata112 training]# jps
13254 NameNode
13433 SecondaryNameNode
13578 ResourceManager
13835 Jps
~~~
~~~
[root@bigdata113 training]# jps
11847 DataNode
11943 NodeManager
12043 Jps
~~~
~~~
[root@bigdata114 training]# jps
11744 Jps
11548 DataNode
11644 NodeManager
~~~
-----------------------------------------------------------------
#### 03-05-Single Point of Failure in the Master/Slave Architecture