Setting up a single-node Hadoop test environment on Ubuntu Kylin 14.04

Hadoop version used: 2.2.0

I'll skip downloading and installing and start straight from configuration:

Hadoop needs the following files configured, all of them under $HADOOP_HOME/etc/hadoop/:

hadoop-env.sh: it contains a JAVA_HOME setting; point it at the location of your JDK
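A minimal sketch of making that JAVA_HOME change from the command line. The helper name `set_java_home` and the JDK path in the example are assumptions for illustration, not something Hadoop provides; `sed -i` here is the GNU form found on Ubuntu.

```shell
# Hypothetical helper: point JAVA_HOME in a hadoop-env.sh file at a JDK directory.
# Usage: set_java_home <path-to-hadoop-env.sh> <jdk-dir>
set_java_home() {
  env_file=$1
  jdk_dir=$2
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Replace the existing export line in place (GNU sed syntax).
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file"
  else
    # No export line yet: append one.
    printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
  fi
}

# Example (the JDK path is an assumption; use your own):
# set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /usr/lib/jvm/java-7-openjdk-amd64
```

Editing the file by hand works just as well; the function form is only convenient if you rebuild the test environment often.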

core-site.xml: insert the following inside the <configuration> element

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/username/kit/hadoop/data/temp/</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <final>true</final>
</property>

hdfs-site.xml: contents as follows:

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/username/kit/hadoop/namenode/</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/username/kit/hadoop/datanode/</value>
<final>true</final>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>

mapred-site.xml: make a copy of mapred-site.xml.template, rename the copy to mapred-site.xml, then add the following:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
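The copy-and-rename step for mapred-site.xml can be sketched as follows. `make_mapred_site` is a hypothetical helper name; it refuses to overwrite an existing mapred-site.xml so that earlier edits are not lost.

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template.
# Usage: make_mapred_site <conf-dir>   (e.g. "$HADOOP_HOME/etc/hadoop")
make_mapred_site() {
  conf_dir=$1
  # Copy only if the target does not exist, so existing edits are not overwritten.
  if [ ! -e "$conf_dir/mapred-site.xml" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
  fi
}

# Example:
# make_mapred_site "$HADOOP_HOME/etc/hadoop"
```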

yarn-site.xml: this one is a bit longer, and some entries may be unnecessary. I copied them from elsewhere and simply added them all

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
    <description>hostname of RM</description>
  </property>


    <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:5274</value>
    <description>host is the hostname of the resource manager and 
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:5273</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:5271</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager. </description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value></value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>localhost:5272</value>
    <description>the nodemanagers bind to this port</description>
  </property>  

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
 
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
    <description>directory on hdfs where the application logs are moved to </description>
  </property>

   <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value></value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>

Once these files are configured, you are basically done.
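As a quick sanity check that nothing was skipped, a sketch like the following lists which of the five files discussed above are absent from a configuration directory. `missing_conf_files` is a hypothetical helper; it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper: print the name of each expected config file that is missing.
# Usage: missing_conf_files <conf-dir>   (e.g. "$HADOOP_HOME/etc/hadoop")
missing_conf_files() {
  conf_dir=$1
  for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    # Print the file name only when it is not a regular file in conf_dir.
    [ -f "$conf_dir/$f" ] || echo "$f"
  done
}

# Example: no output means all five files are present.
# missing_conf_files "$HADOOP_HOME/etc/hadoop"
```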

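If your system is 64-bit, the files under $HADOOP_HOME/lib/native/ need to be replaced with 64-bit builds. You can download the source and compile them yourself (search Baidu for the details), or use prebuilt files that others have shared online.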

Next comes installing ssh. The system already ships with openssh-client, so installing openssh-server is all that is needed.

ssh can be set up for passwordless login, which saves a great deal of trouble. The setup below only applies to a single machine:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note that in the first command, the characters after -P are two single quotes (an empty passphrase)!
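The two commands above can be wrapped so that they are safe to run more than once. This is a sketch under a few assumptions: `setup_local_key` is a hypothetical helper, it defaults to an RSA key rather than the DSA key used above (DSA keys are disabled by default in OpenSSH 7.0 and later; pass `dsa` as the second argument to match the original commands on Ubuntu 14.04), and it uses `-N ''` to set the empty passphrase.

```shell
# Hypothetical helper: create a passphrase-less key and authorize it for local login.
# Usage: setup_local_key <ssh-dir> [key-type]   (key-type defaults to rsa)
setup_local_key() {
  ssh_dir=$1
  key_type=${2:-rsa}
  key_file="$ssh_dir/id_$key_type"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  # Generate the key pair only if it does not exist yet; -N '' sets an empty passphrase.
  [ -f "$key_file" ] || ssh-keygen -q -t "$key_type" -N '' -f "$key_file"
  touch "$ssh_dir/authorized_keys"
  # Append the public key only if that exact line is not already present.
  grep -qxF -e "$(cat "$key_file.pub")" "$ssh_dir/authorized_keys" ||
    cat "$key_file.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Example:
# setup_local_key ~/.ssh
```

Afterwards, `ssh localhost` should log in without asking for a password (the first connection will still ask you to confirm the host key).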

Then add the following lines to /etc/profile:

export HADOOP_HOME=/home/shizhida/kit/hadoop-2.2.0
export PATH=$HADOOP_HOME/bin:$PATH

Putting the Hadoop path into the environment variables saves a lot of trouble, doesn't it?
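A sketch of adding those exports without duplicating them on repeated runs. `add_hadoop_env` is a hypothetical helper; unlike the lines above it also puts `$HADOOP_HOME/sbin` on the PATH, which is an addition of mine so that scripts such as start-dfs.sh can be called by name. Writing to /etc/profile itself requires root.

```shell
# Hypothetical helper: append HADOOP_HOME and PATH exports to a profile file once.
# Usage: add_hadoop_env <profile-file> <hadoop-home>
add_hadoop_env() {
  profile=$1
  hadoop_home=$2
  # Skip if an export for HADOOP_HOME is already present.
  if ! grep -q '^export HADOOP_HOME=' "$profile" 2>/dev/null; then
    {
      printf 'export HADOOP_HOME=%s\n' "$hadoop_home"
      # sbin is an assumption beyond the original lines; drop it if you do not want it.
      printf 'export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH\n'
    } >> "$profile"
  fi
}

# Example (run as root for /etc/profile, or target ~/.profile instead):
# add_hadoop_env /etc/profile /home/shizhida/kit/hadoop-2.2.0
```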

At this point the installation is basically complete. Reboot, then run:

$ hadoop namenode -format

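to perform the initial format (in Hadoop 2.x, hdfs namenode -format is the non-deprecated spelling of the same command). After that, carry on with whatever you need~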
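The post stops at formatting. One plausible next step, which is my addition rather than part of the original walkthrough, is to start the daemons with start-dfs.sh and start-yarn.sh from $HADOOP_HOME/sbin and then look at the output of the JDK's `jps` tool. The sketch below reads `jps` output on standard input and prints which of the usual single-node daemons are not listed; `missing_daemons` is a hypothetical helper.

```shell
# Hypothetical helper: report expected Hadoop daemons absent from jps output.
# Usage: jps | missing_daemons   (prints one missing daemon name per line)
missing_daemons() {
  jps_output=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_output" |
      awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' ||
      echo "$d"
  done
}

# Example (assumes the daemons were started and jps is on the PATH):
# "$HADOOP_HOME/sbin/start-dfs.sh"
# "$HADOOP_HOME/sbin/start-yarn.sh"
# jps | missing_daemons
```

No output means all five daemons showed up; anything printed is a hint about which log file under $HADOOP_HOME/logs to read first.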

Original article: https://www.cnblogs.com/Ayanami-Blob/p/3675561.html