8. Setting Up a Hadoop Cluster on Linux

References:
https://www.cnblogs.com/yanshw/p/11535633.html
https://www.cnblogs.com/frankdeng/p/9047698.html

Cluster time synchronization reference:
https://www.cnblogs.com/frankdeng/p/9005691.html

Unpack the Hadoop tarball, configure the environment variables, and make them take effect:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
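
A minimal sketch of this step, assuming the tarball is /home/hadoop/hadoop-2.7.7.tar.gz (filename assumed) and that the hadoop user's shell reads ~/.bashrc:
`
# Unpack the tarball into the hadoop user's home directory
tar -zxvf /home/hadoop/hadoop-2.7.7.tar.gz -C /home/hadoop

# Append the environment variables to the shell profile and reload it
cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
EOF
source ~/.bashrc

# Verify the installation is on PATH
hadoop version
`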

Edit the configuration files
Directory: /home/hadoop/hadoop-2.7.7/etc/hadoop

1. vi hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_131
2. vi mapred-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_131
3. vi yarn-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_131

4. vi slaves
storm-01
storm-02
storm-03
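
The hostnames in slaves must resolve on every node. A sketch of the /etc/hosts entries, with placeholder IP addresses (adjust to your actual network):
`
# /etc/hosts on every node; the IP addresses below are placeholders
192.168.1.101 storm-01
192.168.1.102 storm-02
192.168.1.103 storm-03
`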

Four XML files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
5. vi core-site.xml
`
<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://storm-01:9000</value>
    </property>
    <!-- Storage directory for files generated by Hadoop at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/dataDir/hadoop</value>
    </property>
</configuration>
`
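
Since hadoop.tmp.dir points to a custom path, create that directory ahead of time on every node (a sketch, assuming the hadoop user owns /home/hadoop):
`
# Run on every node in the cluster
mkdir -p /home/hadoop/dataDir/hadoop
`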

6. vi hdfs-site.xml
`
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- The secondary namenode http server address and port. -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>storm-03:50090</value>
    </property>
</configuration>
`

7. mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
`
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
`

8. vi yarn-site.xml
`
<configuration>
    <!-- How reducers fetch data: the MapReduce shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Hostname of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>storm-02</value>
    </property>
</configuration>
`

Distribute the configured Hadoop installation to the other machines.
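
A sketch of this step using rsync over SSH, assuming the cluster runs as the hadoop user and that passwordless SSH is set up from storm-01 to every node (start-dfs.sh/start-yarn.sh rely on it as well):
`
# One-time: passwordless SSH from this node to every node (including itself)
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in storm-01 storm-02 storm-03; do
  ssh-copy-id hadoop@${host}
done

# Push the configured Hadoop directory and the environment settings to the other nodes
for host in storm-02 storm-03; do
  rsync -av /home/hadoop/hadoop-2.7.7/ hadoop@${host}:/home/hadoop/hadoop-2.7.7/
  rsync -av ~/.bashrc hadoop@${host}:~/
done
`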

If this is the first time the cluster is started, format the NameNode (run on storm-01):
hdfs namenode -format
Start HDFS:
start-dfs.sh
Start YARN. Note: if the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode; run start-yarn.sh on the machine where the ResourceManager lives (storm-02 here).
start-yarn.sh

1) Start each service component individually
Start/stop HDFS daemons: hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
Start/stop YARN daemons: yarn-daemon.sh start|stop resourcemanager|nodemanager
2) Start each module as a whole (passwordless SSH is a prerequisite), the usual way
start|stop-dfs.sh start|stop-yarn.sh
3) Start everything at once (not recommended)
start|stop-all.sh
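
After startup, a quick jps on each node shows which daemons are running; the expected layout below follows from the configuration above (NameNode on storm-01, ResourceManager on storm-02, SecondaryNameNode on storm-03, DataNode and NodeManager on all three slaves):
`
# Run on each node and compare against the expected daemons
jps
# storm-01: NameNode, DataNode, NodeManager
# storm-02: ResourceManager, DataNode, NodeManager
# storm-03: SecondaryNameNode, DataNode, NodeManager
`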

Accessing the Hadoop cluster remotely
NameNode host: storm-01
Port 50070 (HDFS web UI): http://storm-01:50070
Port 8088 (YARN ResourceManager web UI, where MapReduce jobs appear): http://storm-02:8088 (the ResourceManager runs on storm-02 per yarn-site.xml above)
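
A quick reachability check for both web UIs, assuming curl is available on the machine you test from:
`
# A 200 or a redirect response indicates the UI is up
curl -sI http://storm-01:50070 | head -n 1
curl -sI http://storm-02:8088 | head -n 1
`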

Test 1:
hadoop fs -mkdir -p /test/hankang/20200609
hadoop fs -put test.txt /test/hankang/20200609
hadoop jar /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /test/hankang/20200609 /test/hankang/output/20200609
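
To inspect the result, list the output directory and print the reducer output; wordcount writes its counts to part-r-* files under the output path used above:
`
hadoop fs -ls /test/hankang/output/20200609
hadoop fs -cat /test/hankang/output/20200609/part-r-00000
`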

Test 2:
hadoop fs -mkdir -p /test/hankang/20200609
hadoop fs -put /home/hadoop/hadoop-2.7.7/README.txt /test/hankang/20200609
hadoop jar /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /test/hankang/20200609/README.txt /test/hankang/output/20200609
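
Note that MapReduce refuses to start a job whose output directory already exists, so if Test 1 was run first, remove the shared output path (or pick a different one) before running Test 2:
`
# Delete the existing output directory before reusing the same output path
hadoop fs -rm -r /test/hankang/output/20200609
`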

============================================ Divider ===========================================================================
Reformatting HDFS in Hadoop
References:
https://www.jianshu.com/p/a4f4f57ad3d8
https://www.cnblogs.com/neo98/articles/6305999.html

Every time the master NameNode is formatted, the VERSION file under dfs/name/current gets a new clusterID and namespaceID. But if dfs/name/current still exists on the worker nodes, formatting does not recreate that directory, so the workers' clusterID and namespaceID no longer match those of the master (the NameNode), and Hadoop fails to start.
The same applies to the data directory (dfs/data).
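
The mismatch can be confirmed by comparing the clusterID recorded in the VERSION files. With hadoop.tmp.dir set to /home/hadoop/dataDir/hadoop as above and the default name/data subdirectories listed below, the paths are:
`
# On the NameNode (storm-01)
grep clusterID /home/hadoop/dataDir/hadoop/dfs/name/current/VERSION
# On each DataNode
grep clusterID /home/hadoop/dataDir/hadoop/dfs/data/current/VERSION
`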

The relevant defaults (property and default value):
hadoop.tmp.dir            /tmp/hadoop-${user.name}
dfs.namenode.name.dir     file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir     file://${hadoop.tmp.dir}/dfs/data


rm -rf /home/hadoop/dataDir/hadoop/*   (run on every node)
Adjust the configuration if needed
On the NameNode, run:
hdfs namenode -format

============================================ Divider ===========================================================================
Viewing Hadoop's default configuration
References:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml

hadoop.tmp.dir            /tmp/hadoop-${user.name}
dfs.namenode.name.dir     file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir     file://${hadoop.tmp.dir}/dfs/data
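
The effective value of any of these keys on a running cluster can be checked with hdfs getconf, for example:
`
hdfs getconf -confKey hadoop.tmp.dir
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir
`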

Original article: https://www.cnblogs.com/tianxiu/p/13139560.html