Hadoop: Deploying a Hadoop Single Node

I. Environment Preparation

1. System Environment

CentOS 7

2. Software Environment

  • OpenJDK
# Search for installable OpenJDK packages
[root@server1]# yum search java | grep jdk
...
# Install version 1.8.0, including both the runtime (openjdk) and the development kit (openjdk-devel)
[root@server1]# yum install -y java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64
  • SSH
# CentOS 7 ships OpenSSH; there is no package named simply "ssh"
[root@server1]# yum install -y openssh-clients openssh-server
  • Hadoop

Download a suitable Hadoop release from mirror.bit.edu.cn/apache/hadoop/common/; here we use hadoop-2.7.3.tar.gz, as shown in the sketch below.
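A minimal download sketch, assuming this mirror follows the standard Apache release layout (a hadoop-2.7.3/ directory under common/):

# Fetch the 2.7.3 release tarball (path assumed from the usual Apache mirror layout)
[root@server1]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz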

II. Configuring Hadoop

1. Extract hadoop-2.7.3.tar.gz
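A sketch of the extraction, assuming the tarball sits in the current directory and that Hadoop should end up in /usr/local/hadoop to match the HADOOP_PREFIX set below:

# Unpack into /usr/local and rename the versioned directory to a stable path
[root@server1]# tar -zxf hadoop-2.7.3.tar.gz -C /usr/local
[root@server1]# mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[root@server1]# cd /usr/local/hadoop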

2. Configure JAVA_HOME

[root@server1 hadoop]# vim etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
  export JAVA_HOME=/usr # Be careful here: this is the java path with /bin/java removed
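If you are unsure which directory that is, one way (a sketch; the sed expression is illustrative) is to resolve the java symlink and strip the trailing /bin/java. With the yum-installed OpenJDK, /usr/bin/java exists, which is why JAVA_HOME=/usr works here:

# Resolve the real java binary, then cut off the trailing /bin/java to get a JAVA_HOME candidate
[root@server1 hadoop]# readlink -f $(which java) | sed 's|/bin/java$||'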

3. Configure system environment variables

[root@server1 hadoop]# vim /etc/profile
...
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
...
[root@server1 hadoop]# source /etc/profile
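As a quick sanity check (assuming the configuration above), the hadoop command should now resolve from any directory:

# PATH now includes $HADOOP_PREFIX/bin, so hadoop is found without ./bin/
[root@server1 hadoop]# which hadoop
/usr/local/hadoop/bin/hadoop
[root@server1 hadoop]# hadoop version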

III. Testing Hadoop

[root@server1 hadoop]# ./bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

IV. Running Hadoop

Since there is only one server here, we run Hadoop in standalone (local) mode and execute a sample job:

[root@server1 hadoop]# mkdir input
[root@server1 hadoop]# cp etc/hadoop/*.xml input
[root@server1 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
...
16/09/01 16:05:25 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1248142
                FILE: Number of bytes written=2318080
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=17
                Map output materialized bytes=25
                Input split bytes=121
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=25
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=24
                Total committed heap usage (bytes)=262553600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=123
        File Output Format Counters
                Bytes Written=23
...
[root@server1 hadoop]# cat output/*
1       dfsadmin
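Note that MapReduce refuses to start a job whose output directory already exists, so clear output/ before re-running the example:

# Remove the previous result first, or the job aborts with an "output directory already exists" error
[root@server1 hadoop]# rm -r output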

V. Problems Encountered

1. java command not found

Set export JAVA_HOME=/usr in hadoop-env.sh. This Hadoop environment variable must point to the parent directory of bin/java, never to the java binary itself.

2. metrics.MetricsUtil: Unable to obtain hostName

This error appears when the machine's hostname cannot be resolved. Map it to a local address in /etc/hosts:

[root@server1 hadoop]# vim /etc/hosts
127.0.0.1    server1
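To verify the fix, the name reported by hostname should now resolve locally (a quick check, assuming the hostname is server1):

# hostname and the /etc/hosts entry must match for the resolution to succeed
[root@server1 hadoop]# hostname
server1
[root@server1 hadoop]# ping -c 1 server1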