Hadoop Platform Setup

All machines must use the same username.
Install Ubuntu with a network connection so that the necessary packages and plugins can be installed during setup.

I. JDK Installation and Configuration:

After extracting the JDK, run sudo gedit /etc/profile and append the following at the end:

#SET JAVA ENVIRONMENT
export JAVA_HOME=/home/ubuntu/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
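To apply the changes in the current shell and confirm the JDK is picked up (a quick check, assuming the JDK was extracted to /home/ubuntu/jdk as above):

    source /etc/profile
    java -version      # should report the installed JDK version
    echo $JAVA_HOME    # should print /home/ubuntu/jdk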

1. Install rpm: sudo apt-get install rpm

II. Hadoop Installation and Configuration:
Run sudo gedit /etc/profile and append the following at the end:

#hadoop variable settings
export HADOOP_PREFIX=/home/ubuntu/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native -Djava.net.preferIPv4Stack=true"
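Reload the profile again and verify that the hadoop command resolves (assuming Hadoop was extracted to /home/ubuntu/hadoop):

    source /etc/profile
    hadoop version     # prints the Hadoop release and build information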

After extracting Hadoop, configure the following files under hadoop/etc/hadoop (each set of XML properties below goes inside that file's <configuration> element):

1. hadoop-env.sh

      export JAVA_HOME=/home/ubuntu/jdk

 

2.yarn-env.sh

export JAVA_HOME=/home/ubuntu/jdk

 

3.core-site.xml

<property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
</property>
<property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
</property>

<property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/bigdata/tmp</value>
</property>

<property>
        <name>hadoop.native.lib</name>
        <value>true</value>
        <description>Should native hadoop libraries, if present, be used.</description>
</property>

<property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
        <description>The number of seconds between two periodic checkpoints.
        </description>
</property>
<property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
        <description>
                The size of the current edit log (in bytes) that triggers
                a periodic checkpoint even if the fs.checkpoint.period
                hasn't expired.
        </description>
</property>

<property>
        <name>fs.checkpoint.dir</name>
        <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
        <description>
                Determines where on the local filesystem the DFS secondary
                 name node should store the temporary images to merge.
                 If this is a comma-delimited list of directories then the image is
                 replicated in all of the directories for redundancy.
         </description>
</property>

<!--    <property>
        <name>topology.script.file.name</name>
        <value>/hadoop-script/rack.py</value>
</property> -->

Create the tmp folder locally (the hadoop.tmp.dir path above).
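For example, matching the hadoop.tmp.dir value configured above:

    mkdir -p /home/bigdata/tmp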

 

4. hdfs-site.xml

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/bigdata/hdfs/name</value>
                <final>true</final>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/bigdata/hdfs/data</value>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>

        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>

        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>

        <property>
                <name>dfs.http.address</name>
                <value>master:50070</value>
                <description>
                The address and the base port where the dfs namenode web ui will listen on.
                If the port is 0 then the server will start on a free port.
                </description>
        </property>

        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:50090</value>
        </property>

Create the hdfs folder locally, but do not create the name and data folders yourself; they are generated automatically, and creating them by hand causes errors.
If you change the configuration and need to reformat the namenode, first delete the generated files under hdfs:
on the master delete name, on the slaves delete data.
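A minimal sketch of the corresponding commands, assuming the paths configured above:

    mkdir -p /home/bigdata/hdfs          # parent only; name/ and data/ are auto-generated
    # before reformatting the namenode after a config change:
    #   on the master:  rm -rf /home/bigdata/hdfs/name
    #   on the slaves:  rm -rf /home/bigdata/hdfs/data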

 

5. mapred-site.xml (copy mapred-site.xml.template and rename it)

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
</property>
<property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
</property>
<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
</property>

Create the var folder locally.
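The original does not give a path for var; a sketch assuming it sits alongside the other data directories:

    mkdir -p /home/bigdata/var    # assumed location; adjust to your layout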

6.yarn-site.xml

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

 

7. slaves

Add one slave hostname per line, e.g.:

slave1
slave2

 

III. Edit the hosts File

sudo gedit /etc/hosts

Add entries mapping each node's IP to its hostname (every node in the cluster, including slave2 if present, needs a line here):

10.9.201.129  master
10.9.51.29    slave1

 

IV. Edit the hostname File

sudo gedit /etc/hostname

Set it to the machine's role, e.g. master on the master node and slave1 on the first slave.

 

 

V. Install SSH and Set Up Passwordless Login

1. Install ssh

sudo apt-get install ssh

2. Generate the public/private key pair

      ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

 

3. Append the public key to authorized_keys

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

 

4. On every machine:

  chmod 755 ~/.ssh/

chmod 644 ~/.ssh/authorized_keys

 

 

VI. Distribute authorized_keys to the Slave Hosts (into their .ssh folders)

scp ~/.ssh/authorized_keys slave1:~/.ssh/    (repeat for each slave)
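To verify, log in from the master; it should open a shell on the slave without asking for a password:

    ssh slave1
    exit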

 

VII. Format the NameNode

bin/hdfs namenode -format

If permission errors occur, relax the permissions on the relevant folders:

chmod -R 777 hadoop hdfs var tmp

 

 

VIII. Start Hadoop

sbin/start-dfs.sh

On the master node you should see processes like the following:

haduser@master:~/hadoop/hadoop-2.2.0$ jps
6638 Jps
6015 NameNode
6525 SecondaryNameNode

On the slave nodes you should see:

haduser@slave1:~/hadoop/hadoop-2.2.0/etc/hadoop$ jps
4264 Jps
4208 DataNode

Start the YARN cluster:

   sbin/start-yarn.sh
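After start-yarn.sh finishes, running jps again should additionally list the YARN daemons (a quick sanity check, assuming the same cluster layout as above):

    jps    # on the master: ResourceManager joins NameNode/SecondaryNameNode
    jps    # on the slaves: NodeManager joins DataNode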

 

 


IX. Eclipse Configuration:

1. Copy the Hadoop Eclipse plugin jar into the plugins folder of the Eclipse installation.

2. Under Window -> Preferences -> Hadoop Map/Reduce, set the Hadoop installation path.

3. In the console area, open a new Map/Reduce Locations tab and configure the connection settings for the cluster.

 

Original article: https://www.cnblogs.com/xmeo/p/6498055.html