Installing Hadoop on Linux

Installing Hadoop:

1. Download hadoop-0.20.203.0.tar.gz

  Unpack it with: tar -zxf hadoop-0.20.203.0.tar.gz

  This produces the directory hadoop-0.20.203.0.

2. Edit /etc/profile with vi (or edit conf/hadoop-env.sh)

  Add the following lines:

  #This is Hadoop

  #HADOOP_INSTALL is an environment variable that points to the Hadoop installation directory

   export HADOOP_HOME=/usr/hadoop-0.20.203.0

   export PATH=$PATH:$HADOOP_HOME/bin

  (JAVA_HOME also needs to be configured, and note that JAVA_HOME must be set in conf/hadoop-env.sh.)

3. Log out and back in, or run source /etc/profile to apply the changes
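As a concrete sketch, the line required in conf/hadoop-env.sh might look like the following; the JDK path is an assumption, so substitute your actual Java installation directory:

```shell
# In conf/hadoop-env.sh — JAVA_HOME must be set here for the Hadoop
# start-up scripts to find Java (the path below is a hypothetical JDK location)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```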

Configuring Hadoop:

(1) Standalone mode:

  Test 1:

  #mkdir input

  #cd input

  #echo "hello world" > test1.txt

  #echo "hello hadoop" > test2.txt

  #cd ..

  #hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount input output

  View the result:

  #cat output/*
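In standalone mode wordcount reads and writes the local filesystem, so its result can be cross-checked with plain coreutils. A minimal sketch, run in a throwaway directory so it does not touch the input/ directory created above:

```shell
# Recreate the two test files and count words with sort | uniq -c,
# which computes the same tallies as the wordcount example
d=$(mktemp -d)
echo "hello world"  > "$d/test1.txt"
echo "hello hadoop" > "$d/test2.txt"
cat "$d"/*.txt | tr -s ' ' '\n' | sort | uniq -c
# expected counts: hadoop 1, hello 2, world 1
```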

  Test 2:

  #mkdir input2

  #cp /usr/hadoop-0.20.203.0/conf/*.xml input2

  #hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar grep input2 output2 'dfs[a-z.]+'

  View the result:

  #cat output2/*
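The Hadoop grep example extracts every string matching the regular expression dfs[a-z.]+ from the input files and counts the matches. The effect can be previewed with ordinary grep on a small hypothetical sample file:

```shell
# Preview what the Hadoop grep job would match, using grep -oE with
# the same regular expression (sample.xml is a made-up stand-in for conf/*.xml)
d=$(mktemp -d)
printf '<name>dfs.replication</name>\n<name>dfs.data.dir</name>\n' > "$d/sample.xml"
grep -oE 'dfs[a-z.]+' "$d/sample.xml" | sort | uniq -c
# expected: one match each for dfs.replication and dfs.data.dir
```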

(2) Pseudo-distributed mode:

  Passwordless SSH setup:

  Background: (1) system-wide SSH configuration lives in /etc/ssh; (2) the first time you run ssh localhost (connecting to the local machine), the file ~/.ssh/known_hosts is created in your home directory.

  1. Generate a key pair:

  #ssh-keygen -t rsa

  // This creates two files in ~/.ssh: id_rsa and id_rsa.pub

  2. Append the public key to the authorized list: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  After this, #ssh localhost connects over SSH without prompting for a password.
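The two steps above can be sketched end to end. This demo runs in a throwaway directory rather than ~/.ssh so it can be tried safely; the real setup targets ~/.ssh, and authorized_keys should be mode 600 or sshd may refuse it:

```shell
# Key-pair generation plus authorized_keys append, demonstrated in a
# temporary directory (the real setup writes to ~/.ssh instead)
d=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$d/id_rsa" -q     # empty passphrase, no prompts
cat "$d/id_rsa.pub" >> "$d/authorized_keys"   # append, don't overwrite
chmod 600 "$d/authorized_keys"                # tighten permissions for sshd
```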

  Configuration files in conf/:

    core-site.xml and hdfs-site.xml configure Hadoop from the HDFS point of view; core-site.xml and mapred-site.xml configure it from the MapReduce point of view.

   Contents of core-site.xml:

    <configuration>

           <property>

                  <name>fs.default.name</name>

                  <value>hdfs://localhost:9000</value>

           </property>

    </configuration>

   Contents of hdfs-site.xml:

    <configuration>

           <property>

                  <name>dfs.replication</name>

                  <value>1</value>

           </property>

    </configuration>

   Contents of mapred-site.xml:

    <configuration>

           <property>

                  <name>mapred.job.tracker</name>

                  <value>localhost:9001</value>

           </property>

    </configuration>

Running Hadoop:

Format the distributed filesystem: hadoop namenode -format

Start the Hadoop daemons: start-all.sh

Use jps to check whether the daemons started. jps (the JVM Process Status Tool) lists the running Java processes, so the Hadoop daemons should appear in its output.

Run the wordcount example:

  #hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount input output

  #cat output/*

Or: #hadoop fs -copyFromLocal input in

  #hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount in out

  #hadoop fs -cat out/*

Stop the daemons: stop-all.sh

Original article: https://www.cnblogs.com/liangzh/p/2434543.html