Setting Up HDFS

Steps required on all nodes (NameNode and DataNodes):

1 Hadoop depends on Java and SSH

  1. Java 1.5.x or later must be installed. In this setup the JDK is installed under /usr/java/jdk1.7.0.

1 Download a suitable JDK

//This is the RPM package for 64-bit Linux systems

 http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm 

 

2 Install the JDK

rpm -ivh jdk-7-linux-x64.rpm 

 

3 Verify the Java installation

[root@hadoop1 ~]# java -version 

java version "1.7.0" 

Java(TM) SE Runtime Environment (build 1.7.0-b147) 

Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) 

[root@hadoop1 ~]# ls /usr/java/ 

default  jdk1.7.0  latest 

 

4 Configure the Java environment variables

# vim /etc/profile    //add the following lines to the profile file:

 

#add for hadoop 

export JAVA_HOME=/usr/java/jdk1.7.0 

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/ 

export PATH=$PATH:$JAVA_HOME/bin 

 

//apply the new environment variables

source /etc/profile 

 

5 Copy /etc/profile to each DataNode (a sketch follows)
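
A minimal sketch of this copy step, assuming root access and the datanode1/datanode2/datanode3 host names defined in the /etc/hosts step below:

# run on the NameNode as root; push the profile to every DataNode
scp /etc/profile root@datanode1:/etc/profile
scp /etc/profile root@datanode2:/etc/profile
scp /etc/profile root@datanode3:/etc/profile

# then, on each DataNode, reload the environment
source /etc/profile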

  1. ssh must be installed and sshd must be kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.
  2. To check whether SSH is installed, run:

           which ssh

           which sshd

           which ssh-keygen

          If none of the three commands returns empty output, SSH is already installed; otherwise install it first (see the sketch after this list).
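
If SSH is missing, it can be installed from the distribution packages; a sketch assuming an RPM-based system like the one the JDK RPM above targets:

# install the OpenSSH client and server (RPM-based distributions)
yum install -y openssh-clients openssh-server

# make sure sshd is running now and after every reboot
service sshd start
chkconfig sshd on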

2 Create a common Hadoop account

  1. All nodes should use the same user name, which can be added with the following commands:
  2. useradd hadoop
  3. passwd hadoop

 3 Configure host names in /etc/hosts

  tail -n 4 /etc/hosts

    192.168.57.75  namenode

    192.168.57.76  datanode1

    192.168.57.78  datanode2

    192.168.57.79  datanode3
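
Since this step belongs to the "all nodes" part of the setup, the same entries should be present on every node; a sketch assuming root access on the DataNodes:

# run as root on the NameNode; copy the shared host table to every DataNode
scp /etc/hosts root@datanode1:/etc/hosts
scp /etc/hosts root@datanode2:/etc/hosts
scp /etc/hosts root@datanode3:/etc/hosts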

Steps required on the NameNode only:

1. Generate an SSH key pair.

On the NameNode, run the following command to generate an RSA key pair:

  ssh-keygen -t rsa

The following notes are excerpted from another source:

You can set up key-based (passwordless) authentication as follows.

First generate a key pair with the command ssh-keygen -t rsa

     After running it you can simply press Enter through all the prompts to accept the defaults. This produces the key files id_rsa and id_rsa.pub, placed under /root/.ssh/ by default; .ssh is a hidden directory, so hidden files must be shown to see it.

     Create a .ssh directory under /home/admin, copy the id_rsa.pub file into /home/admin/.ssh, and rename it to authorized_keys.

     

     Copy the id_rsa file to some location such as /home/id_rsa.

     Test whether it is set up correctly with:

     ssh  -i  /home/id_rsa admin@localhost

     You should be logged in directly, without being asked for a password.

2. View the generated public key:

more /home/root/.ssh/id_rsa.pub

3. Copy the public key to every slave node.

       1. On the master node, run: scp /home/root/.ssh/id_rsa.pub hadoop_dataNode@<DataNode IP>:~/master_key   This copies the generated public key from the NameNode into the "~/master_key" file on the DataNode.

        2. On the DataNode, install the file as an authorized key:

    mkdir ~/.ssh

    chmod 700 ~/.ssh

    mv ~/master_key ~/.ssh/authorized_keys

    chmod 600 ~/.ssh/authorized_keys

4. Log in to the slave nodes from the master node: ssh <DataNode IP> (a sketch follows)
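
A quick loop to confirm that the password-less login works for every DataNode; this assumes the key was installed for the hadoop account and uses the host names from /etc/hosts:

# each command should print the remote host name without asking for a password
for node in datanode1 datanode2 datanode3; do
    ssh hadoop@$node hostname
done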

Hadoop configuration (required on all nodes, except for the few specially marked commands)
//note: perform the following operations as the hadoop user
1 Directory layout
[hadoop@hadoop1 ~]$ pwd
/home/hadoop
[hadoop@hadoop1 ~]$ ll
total 59220
lrwxrwxrwx  1 hadoop hadoop       17 Feb  1 16:59 hadoop -> hadoop-0.20.203.0
drwxr-xr-x 12 hadoop hadoop     4096 Feb  1 17:31 hadoop-0.20.203.0
-rw-r--r--  1 hadoop hadoop 60569605 Feb  1 14:24 hadoop-0.20.203.0rc1.tar.gz
 
 
2 Configure hadoop-env.sh to point at the Java installation
vim hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0
 
3 Configure core-site.xml   //tells the file system where the NameNode is
 
[hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:9000</value>
</property>
 
</configuration>

hadoop.tmp.dir is the base setting that the Hadoop file system relies on; many other paths are derived from it. By default it points under /tmp/{$user}, but storage under /tmp is not safe, because those files may be deleted whenever Linux reboots.
Edit conf/core-site.xml and add the following property:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/had/hadoop/data</value>
   <description>A base for other temporary directories.</description>
</property>
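
It is worth creating that directory up front and handing it to the hadoop user so the daemons can write to it on every node; a sketch assuming the /home/had/hadoop/data value from the property above:

# run as root on every node; the path matches the hadoop.tmp.dir value above
mkdir -p /home/had/hadoop/data
chown -R hadoop:hadoop /home/had/hadoop/data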


4 Configure mapred-site.xml   //tells the cluster which master node runs the JobTracker ("mapred" is short for MapReduce)
 
[hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
<property>
<name>mapred.job.tracker</name>
<value>namenode:9001</value>
</property>
 
</configuration>
 
5 Configure hdfs-site.xml   //sets the HDFS replication factor
 
[hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
 
</configuration>
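
After the cluster is up (step 9), the effective replication of the stored blocks can be checked with fsck; a usage sketch run from the hadoop directory on the NameNode:

# reports every file with its blocks, block locations and replication factor
bin/hadoop fsck / -files -blocks -locations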
 
6 Configure the masters and slaves files
[hadoop@hadoop1 ~]$ cat hadoop/conf/masters
namenode
[hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
datanode1
datanode2
datanode3
 
7 Copy the hadoop directory to all the DataNodes
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop
 
8 Format HDFS (this only needs to be run on the NameNode, and should be done once before the first start)
[hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop1.test.com/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N)  Y  //enter Y here
12/02/02 11:31:17 INFO util.GSet: VM type       = 64-bit
12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
12/02/02 11:31:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
************************************************************/
[hadoop@hadoop1 hadoop]$
 
9 Start the Hadoop daemons
[hadoop@hadoop1 hadoop]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
 
10 Verify
//namenode
[hadoop@hadoop1 logs]$ jps
2883 JobTracker
3002 Jps
2769 NameNode
 
//datanode
[hadoop@hadoop2 ~]$ jps
2743 TaskTracker
2670 DataNode
2857 Jps
 
[hadoop@hadoop3 ~]$ jps
2742 TaskTracker
2856 Jps
2669 DataNode
 
[hadoop@hadoop4 ~]$ jps
2742 TaskTracker
2852 Jps
2659 DataNode
 
Hadoop monitoring web page:
http://<NameNode IP>:50070/dfshealth.jsp
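
Besides jps and the web page, a short smoke test from the NameNode confirms that HDFS actually accepts data; a sketch assuming the hadoop user and the ~/hadoop directory from step 1:

# summary of live DataNodes, capacity and replication
bin/hadoop dfsadmin -report

# write a small file into HDFS and list it back
bin/hadoop fs -mkdir /test
bin/hadoop fs -put /etc/hosts /test/hosts
bin/hadoop fs -ls /test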

Original article: https://www.cnblogs.com/lxzh/p/3008319.html