Hadoop

HDFS distributed file system

Environment preparation:

1. Install the Java environment

 yum -y install java-1.8.0-openjdk-devel

2. Configure /etc/hosts
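For reference, a minimal sketch of the entries this step might add. The host names nn01 and node1 to node3 are the ones used in the configuration files later on; the IP-to-host mapping is an assumption pieced together from the addresses that appear elsewhere in these notes, so substitute your own.

```shell
# Print candidate /etc/hosts entries for the cluster.
# ASSUMPTION: the IP-to-hostname mapping below is illustrative only.
print_hosts_entries() {
    cat <<'EOF'
192.168.1.21 nn01
192.168.1.22 node1
192.168.1.23 node2
192.168.1.24 node3
EOF
}

# Review the output first, then append it on every machine, e.g.:
#   print_hosts_entries >> /etc/hosts
print_hosts_entries
```

Every machine needs the same entries, because the Hadoop configuration refers to the nodes by name rather than by IP.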

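3. Configure SSH trust (on the NameNode)

rm -rf /root/.ssh/known_hosts
# In /etc/ssh/ssh_config, disable the interactive yes/no host-key prompt
Host *
    StrictHostKeyChecking no

# Generate a key pair
ssh-keygen -b 2048 -t rsa -N '' -f /root/.ssh/id_rsa

# Deploy the public key (repeat for every node)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.24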
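The public key has to reach every machine that start-dfs.sh will log in to, including the NameNode itself. A minimal sketch of that loop follows; the host list and the RUN dry-run switch are assumptions added for illustration, not part of the original steps.

```shell
# Hosts that should accept the NameNode's key (ASSUMPTION: adjust to your cluster).
NODES="nn01 node1 node2 node3"

# Dry-run switch (hypothetical helper): by default each command is only printed.
# Set RUN to the empty string (RUN= ) before running to execute the commands for real.
RUN=${RUN-echo}

deploy_key() {
    for host in $NODES; do
        $RUN ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$host"
    done
}

deploy_key
```

Printing the commands first makes it easy to check the host list before any password prompts appear.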

Configuration files   (reference: https://hadoop.apache.org/docs/r2.7.6/)

Environment configuration file  /usr/local/hadoop/etc/hadoop/hadoop-env.sh
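No settings are listed for this file here. In Hadoop 2.x, hadoop-env.sh is where JAVA_HOME (and optionally HADOOP_CONF_DIR) is exported, so the following is only a hedged sketch of what might go there, not the author's actual settings. The helper name java_home_from is hypothetical; it derives a JAVA_HOME candidate from the java binary installed in step 1.

```shell
# Derive a JAVA_HOME candidate from the path of a java executable:
# resolve symlinks, then strip the trailing /bin/java.
java_home_from() {
    dirname "$(dirname "$(readlink -f "$1")")"
}

# Typical use (ASSUMPTION: java is on PATH and Hadoop lives in /usr/local/hadoop):
#   echo "export JAVA_HOME=$(java_home_from "$(command -v java)")" \
#       >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
#   echo 'export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop' \
#       >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```

Check the resulting path by hand before relying on it, since the layout under /usr/lib/jvm differs between distributions.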

 

Core configuration file  /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <!-- <value>file:///</value>   use the local file system -->
                <value>hdfs://nn01:9000</value> <!-- use the HDFS file system -->
        </property>
        <property>
                <!-- data storage directory -->
                <name>hadoop.tmp.dir</name>
                <value>/var/hadoop</value>
        </property>
</configuration>

HDFS configuration file  /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
        <property>
                <!-- NameNode HTTP address and port -->
                <name>dfs.namenode.http-address</name>
                <value>nn01:50070</value>
        </property>
        <property>
                <!-- SecondaryNameNode HTTP address and port -->
                <name>dfs.namenode.secondary.http-address</name>
                <value>nn01:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
</configuration>
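A mistyped property name is silently ignored, so it can help to ask Hadoop which values it actually resolves. The sketch below wraps `hdfs getconf -confKey`; the function name check_conf and the overridable HDFS variable are assumptions added for illustration.

```shell
# Path to the hdfs launcher (overridable, which also makes the function easy to stub).
HDFS=${HDFS-/usr/local/hadoop/bin/hdfs}

# Compare the value Hadoop resolves for a key with the value you expect.
check_conf() {
    key=$1
    expected=$2
    actual=$("$HDFS" getconf -confKey "$key")
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key=$actual"
    else
        echo "FAIL $key=$actual (expected $expected)"
        return 1
    fi
}

# Typical use on nn01:
#   check_conf fs.defaultFS hdfs://nn01:9000
#   check_conf dfs.replication 2
#   check_conf dfs.namenode.secondary.http-address nn01:50090
```

The check reads the configuration directory only, so it can be run before the cluster is started.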

Worker node list file  /usr/local/hadoop/etc/hadoop/slaves

node1
node2
node3

Start the HDFS cluster

On all nodes: create the data storage directory    mkdir /var/hadoop

Copy nn01:/usr/local/hadoop to all node hosts

rsync -aSH --delete /usr/local/hadoop node1:/usr/local/
rsync -aSH --delete /usr/local/hadoop node2:/usr/local/
...
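Rather than typing one rsync line per host, the targets can be read from the slaves file. This is a sketch under assumptions: the function name sync_hadoop and the RUN dry-run switch are hypothetical helpers, not part of Hadoop.

```shell
# Dry-run switch (hypothetical helper): commands are printed unless RUN is set to empty.
RUN=${RUN-echo}

# Copy /usr/local/hadoop to every host listed in a slaves-style file (one host per line).
sync_hadoop() {
    slaves_file=$1
    while IFS= read -r host; do
        # Skip blank lines and comments.
        case $host in ''|'#'*) continue ;; esac
        # </dev/null keeps rsync/ssh from swallowing the rest of the host list.
        $RUN rsync -aSH --delete /usr/local/hadoop "$host:/usr/local/" </dev/null
    done < "$slaves_file"
}

# Typical use on nn01:
#   sync_hadoop /usr/local/hadoop/etc/hadoop/slaves
```

Reading the hosts from the slaves file keeps the copy step and the cluster definition in one place.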

Format the file system on the NameNode

/usr/local/hadoop/bin/hdfs namenode -format

Start the cluster

/usr/local/hadoop/sbin/start-dfs.sh

Verify the roles on every node with jps

jps
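With the configuration above, nn01 would be expected to show NameNode and SecondaryNameNode, and node1 to node3 would show DataNode. A small sketch for checking that automatically follows; the function name has_roles is hypothetical.

```shell
# Succeed only if every expected daemon name appears in jps output read from stdin.
has_roles() {
    jps_output=$(cat)
    for role in "$@"; do
        # jps prints "<pid> <MainClass>"; require the class name to follow a space
        # and end the line, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$jps_output" | grep -q " $role\$" || {
            echo "missing: $role"
            return 1
        }
    done
    echo "all roles present"
}

# Typical use from nn01 (relies on the SSH trust set up earlier):
#   jps | has_roles NameNode SecondaryNameNode
#   ssh node1 jps | has_roles DataNode
```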

Verify the nodes from the NameNode

/usr/local/hadoop/bin/hdfs dfsadmin -report

Configure /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Configure /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>nn01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Sync the configuration files to the node hosts

rsync -aSH --delete /usr/local/hadoop/etc/hadoop/ node1:/usr/local/hadoop/etc/hadoop/

...

 

Start the Hadoop cluster (YARN)

/usr/local/hadoop/sbin/start-yarn.sh

Verify

jps
/usr/local/hadoop/bin/yarn node -list           

Web pages

http://192.168.1.21:50070/                   # NameNode (use the IP you configured)
http://192.168.1.21:50090/                   # SecondaryNameNode (use the IP you configured)
http://192.168.1.21:8088/                    # ResourceManager (use the IP you configured)
http://192.168.1.22:50075/                   # DataNode (use the IP you configured)
http://192.168.1.22:8042/                    # NodeManager (use the IP you configured)
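The same pages can be probed from a terminal. The sketch below prints the HTTP status code for each of the five ports; the function name probe_uis and the overridable CURL variable are assumptions added for illustration.

```shell
# curl binary to use (overridable, which also makes the function easy to stub).
CURL=${CURL-curl}

# Print the HTTP status code returned by each web UI.
# $1 = IP of the NameNode/ResourceManager host, $2 = IP of one DataNode/NodeManager host.
probe_uis() {
    nn_ip=$1
    dn_ip=$2
    for url in \
        "http://$nn_ip:50070/" "http://$nn_ip:50090/" "http://$nn_ip:8088/" \
        "http://$dn_ip:50075/" "http://$dn_ip:8042/"
    do
        code=$("$CURL" -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
        echo "$code $url"
    done
}

# Typical use:
#   probe_uis 192.168.1.21 192.168.1.22
```

A code of 000 means curl could not connect at all, which usually points to a daemon that is not running or a firewall in the way.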
Original article: https://www.cnblogs.com/ray-mmss/p/10451932.html