Setting up Hadoop 2.5.1 (Part 2)

Part 1 covered the overall steps. Quite a few problems came up along the way, and this second part lists all of them:

1.1 Warning: native library cannot be loaded

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The Hadoop 2.5.1 download on the official site is already built for 64-bit operating systems, yet the warning still appears.

1.1.1 Testing the native library

[root@cluster3 ~]# export HADOOP_ROOT_LOGGER=DEBUG,console
[root@cluster3 script]# hadoop fs -text /usr/local/script/hdfile1.txt
14/11/01 10:58:15 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: 
    java.lang.UnsatisfiedLinkError: /usr/local/hadoop/hadoop-2.5.1/lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6:
    version `GLIBC_2.12' not found (required by /usr/local/hadoop/hadoop-2.5.1/lib/native/libhadoop.so.1.0.0)
14/11/01 10:58:15 DEBUG util.NativeCodeLoader: java.library.path=/usr/local/hadoop/hadoop-2.5.1/lib/native
14/11/01 10:58:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/01 10:58:15 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based

[root@cluster1 lib64]# ll /lib64/libc.so.6 
lrwxrwxrwx 1 root root 11 Oct 31 17:27 /lib64/libc.so.6 -> libc-2.5.so

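As the output shows, libhadoop.so requires GLIBC_2.12, while this system ships glibc 2.5 (libc-2.5.so). Upgrading glibc would be one fix, but it is not necessary: recompiling Hadoop on this machine is enough.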
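The version mismatch can be checked with a few lines of shell. This is only a sketch: version_ge is a helper name made up here, and it relies on GNU sort -V, which very old coreutils releases may not provide.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes GNU sort with -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.12   # from the UnsatisfiedLinkError above (GLIBC_2.12)
installed=2.5   # from the libc.so.6 -> libc-2.5.so symlink above

if version_ge "$installed" "$required"; then
    echo "glibc $installed satisfies GLIBC_$required"
else
    echo "glibc $installed is older than GLIBC_$required: the prebuilt libhadoop.so will not load"
fi
```

On another machine, the installed value could be read from the first line of ldd --version instead of being hard-coded.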

Compile the Hadoop source code.
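Before starting the build it may help to confirm that the build tools are on the PATH. The sketch below is an assumption-laden example: missing_tools is a made-up helper, and the tool list is a guess based on the BUILDING.txt file shipped with the source tree, which remains the authoritative list.

```shell
# Hypothetical helper: prints every argument that is not an executable on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools the native build is assumed to need (check BUILDING.txt).
missing=$(missing_tools java mvn protoc cmake gcc make)

if [ -n "$missing" ]; then
    echo "install these first:" $missing
else
    # To be run from the top of the unpacked Hadoop source directory.
    echo "ready: mvn package -Pdist,native -DskipTests -Dtar"
fi
```

The script only prints the Maven command rather than running it, so it is safe to try on any machine.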

2. Configuring a local yum repository

Edit the yum configuration so that a local ISO serves as the yum repository.

Create the mount point and mount the ISO
mkdir /mnt/cdrom
mount /dev/cdrom /mnt/cdrom

Copy the contents to local disk
cp -avf /mnt/cdrom /yum

Create the file:
vi /etc/yum.repos.d/CentOS-Local.repo

[Local]  
name=Local Yum  
baseurl=file:///yum/  
gpgcheck=1  
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7  
enabled=1

# cd /etc/yum.repos.d/

# mv CentOS-Base.repo CentOS-Base.repo.bak     Disable the default network yum repository

# cp CentOS-Media.repo CentOS-Media.repo.bak     Back up CentOS-Media.repo, the configuration file for the local yum repository

Edit the configuration file
# vi  CentOS-Media.repo
baseurl=file:///media/CentOS_6.3_Final/
enabled=1                   # enable this repository

[root@cluster3 yum.repos.d]# yum -y install gcc
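The repository file can also be written without an interactive editor. In this sketch write_local_repo is a hypothetical helper, and the demo writes to a scratch file rather than to /etc/yum.repos.d so it can be tried safely.

```shell
# Hypothetical helper: write_local_repo FILE DIR GPGKEY
# Writes a local repository definition to FILE, pointing at DIR.
write_local_repo() {
    cat > "$1" <<EOF
[Local]
name=Local Yum
baseurl=file://$2
gpgcheck=1
gpgkey=file://$3
enabled=1
EOF
}

repo_file=$(mktemp)
write_local_repo "$repo_file" /yum/ /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
cat "$repo_file"
# On the real machine the target would be /etc/yum.repos.d/CentOS-Local.repo,
# followed by: yum clean all && yum repolist
```

Running yum repolist afterwards is a quick way to confirm that the Local repository is visible before installing gcc.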

3. Changing the hostname after cloning a virtual machine

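Change the hostname
In /etc/sysconfig/network, set HOSTNAME to [new hostname]
In /etc/hosts, replace [old hostname] with [new hostname]
Run reboot to restart the system.
Run hostname to check that the change took effect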
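The two file edits can be scripted with sed. The sketch below assumes GNU sed (for -i and \b) and hostnames without regex metacharacters; rename_host is a made-up helper, and the demo works on scratch copies rather than the real /etc files.

```shell
# Hypothetical helper: rename_host OLD NEW NETWORK_FILE HOSTS_FILE
# Assumes GNU sed (-i, \b) and hostnames without regex metacharacters.
rename_host() {
    sed -i "s/^HOSTNAME=.*/HOSTNAME=$2/" "$3"
    sed -i "s/\b$1\b/$2/g" "$4"
}

# Dry run on scratch copies; cluster1 and cluster3 are example names.
net=$(mktemp); hosts=$(mktemp)
printf 'NETWORKING=yes\nHOSTNAME=cluster1\n' > "$net"
printf '127.0.0.1 localhost\n192.168.220.63 cluster1\n' > "$hosts"
rename_host cluster1 cluster3 "$net" "$hosts"
cat "$net" "$hosts"
```

On the cloned machine the last two arguments would be /etc/sysconfig/network and /etc/hosts, followed by the reboot described above.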

4. Test program

[root@cluster3 input]# hadoop dfs -mkdir /hadoop
[root@cluster3 input]# hadoop dfs -mkdir /hadoop/input

[root@cluster3 hadoop-2.5.1]# hadoop dfs -put /usr/local/hadoop/hadoop-2.5.1/test/text1.txt /hadoop/input
[root@cluster3 hadoop-2.5.1]# hadoop dfs -put /usr/local/hadoop/hadoop-2.5.1/test/text2.txt /hadoop/input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

[root@cluster3 hadoop-2.5.1]# hadoop jar /usr/local/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /hadoop/input/* /hadoop/output
14/11/06 15:44:51 INFO client.RMProxy: Connecting to ResourceManager at cluster3/192.168.220.63:8032
14/11/06 15:44:52 INFO input.FileInputFormat: Total input paths to process : 2
14/11/06 15:44:52 INFO mapreduce.JobSubmitter: number of splits:2
14/11/06 15:44:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415259711375_0001
14/11/06 15:44:53 INFO impl.YarnClientImpl: Submitted application application_1415259711375_0001
14/11/06 15:44:53 INFO mapreduce.Job: The url to track the job: http://cluster3:8088/proxy/application_1415259711375_0001/
14/11/06 15:44:53 INFO mapreduce.Job: Running job: job_1415259711375_0001
14/11/06 15:45:04 INFO mapreduce.Job: Job job_1415259711375_0001 running in uber mode : false
14/11/06 15:45:04 INFO mapreduce.Job:  map 0% reduce 0%
14/11/06 15:45:57 INFO mapreduce.Job:  map 100% reduce 0%
14/11/06 15:46:17 INFO mapreduce.Job:  map 100% reduce 100%
14/11/06 15:46:18 INFO mapreduce.Job: Job job_1415259711375_0001 completed successfully
14/11/06 15:46:18 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=55
                FILE: Number of bytes written=291499
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=241
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=106968
                Total time spent by all reduces in occupied slots (ms)=9679
                Total time spent by all map tasks (ms)=106968
                Total time spent by all reduce tasks (ms)=9679
                Total vcore-seconds taken by all map tasks=106968
                Total vcore-seconds taken by all reduce tasks=9679
                Total megabyte-seconds taken by all map tasks=109535232
                Total megabyte-seconds taken by all reduce tasks=9911296
        Map-Reduce Framework
                Map input records=2
                Map output records=4
                Map output bytes=41
                Map output materialized bytes=61
                Input split bytes=216
                Combine input records=4
                Combine output records=4
                Reduce input groups=3
                Reduce shuffle bytes=61
                Reduce input records=4
                Reduce output records=3
                Spilled Records=8
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=1085
                CPU time spent (ms)=3400
                Physical memory (bytes) snapshot=502984704
                Virtual memory (bytes) snapshot=2204106752
                Total committed heap usage (bytes)=257171456
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=25
        File Output Format Counters 
                Bytes Written=25
                
[root@cluster3 hadoop-2.5.1]# hadoop dfs -ls /hadoop/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 2 items
drwxr-xr-x   - root supergroup          0 2014-11-06 15:44 /hadoop/input
drwxr-xr-x   - root supergroup          0 2014-11-06 15:46 /hadoop/output

[root@cluster3 hadoop-2.5.1]# hadoop dfs -cat /hadoop/output/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

hadoop  1
hello   2
world   1
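The job output can be cross-checked locally with standard text tools. This is a rough equivalent of wordcount for small files, not Hadoop's implementation; local_wordcount is a made-up helper, and the file contents are assumed, since the original text1.txt and text2.txt are not shown.

```shell
# Hypothetical helper: word counts for the given files as "word<TAB>count",
# sorted by word like the part-r-00000 listing above.
local_wordcount() {
    cat "$@" | tr -s ' \t' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Assumed file contents; the originals are not shown in this post.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/text1.txt"
printf 'hello hadoop\n' > "$dir/text2.txt"
local_wordcount "$dir/text1.txt" "$dir/text2.txt"
```

The DEPRECATED notice in the transcript refers to hadoop dfs; as the message itself suggests, the same subcommands can be run through the hdfs command, for example hdfs dfs -cat /hadoop/output/part-r-00000.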

5. Connection failed

[root@cluster3 hadoop-2.5.1]# hadoop jar /usr/local/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /hadoop/input/* /hadoop/output
14/11/06 11:28:15 INFO client.RMProxy: Connecting to ResourceManager at cluster3/192.168.220.63:8032
java.net.ConnectException: Call From cluster3/192.168.220.63 to cluster3:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
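Solution:
The NameNode was not running, so nothing was listening on cluster3:9000. Start the NameNode and rerun the job.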
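Whether the NameNode is up can be read from the output of jps, the process lister that ships with the JDK. The sketch below uses a made-up helper, has_daemon, which matches the process name exactly so that SecondaryNameNode is not mistaken for NameNode.

```shell
# Hypothetical helper: reads jps-style lines ("PID Name") on stdin and
# succeeds only if a process named exactly $1 is listed.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

if command -v jps >/dev/null 2>&1; then
    if jps | has_daemon NameNode; then
        echo "NameNode is running"
    else
        # Install path taken from the transcripts above.
        echo "NameNode is not running; try /usr/local/hadoop/hadoop-2.5.1/sbin/start-dfs.sh"
    fi
fi
```

The same helper works for DataNode, ResourceManager, or NodeManager by changing the argument.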

6. No DataNode

14/11/06 09:39:10 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hadoop/input/text1.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
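Solution:
hdfs namenode -format had been run several times. Each format generates a new clusterID, so DataNodes still holding the old one cannot register. The name and data directories must be cleared manually.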
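Before deleting anything, the mismatch can be confirmed by comparing the clusterID entries in the VERSION files under the name and data directories. In this sketch cluster_id is a made-up helper and both paths are placeholders; the real locations are whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to in hdfs-site.xml.

```shell
# Hypothetical helper: prints the clusterID recorded in a VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Placeholder directories; substitute the values of dfs.namenode.name.dir
# and dfs.datanode.data.dir from hdfs-site.xml.
name_version=/usr/local/hadoop/dfs/name/current/VERSION
data_version=/usr/local/hadoop/dfs/data/current/VERSION

if [ -r "$name_version" ] && [ -r "$data_version" ]; then
    if [ "$(cluster_id "$name_version")" = "$(cluster_id "$data_version")" ]; then
        echo "clusterIDs match"
    else
        echo "clusterIDs differ: stop HDFS, clear the data directory, then restart"
    fi
fi
```

If the files are not readable at the placeholder paths, the script prints nothing, which is a hint to fix the paths first.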

7. Risk of data loss

2014-11-06 10:20:14,903 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2014-11-06 10:20:14,903 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
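Configure several directories, on different physical disks or on NFS mounts, in dfs.namenode.name.dir and dfs.datanode.data.dir. The two warnings themselves concern dfs.namenode.name.dir and dfs.namenode.edits.dir; the latter defaults to the value of the former.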
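Multiple directories are given as a comma-separated list in the property value. A small sketch of checking how many are configured follows; count_dirs is a made-up helper and both example paths are invented for illustration.

```shell
# Hypothetical helper: counts the comma-separated entries in a directory list.
count_dirs() {
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^[[:space:]]*$/d' | wc -l | tr -d ' '
}

# Example value with two locations (both paths are made up).
example='file:///data1/dfs/name,file:///mnt/nfs/dfs/name'
echo "configured name directories: $(count_dirs "$example")"

# On a live cluster the real value can be read with hdfs getconf.
if command -v hdfs >/dev/null 2>&1; then
    current=$(hdfs getconf -confKey dfs.namenode.name.dir)
    [ "$(count_dirs "$current")" -gt 1 ] || echo "only one name directory: $current"
fi
```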

8. http://192.168.220.63:50070 cannot be reached, and NodeManager exits shortly after starting.

Stop the firewall service
[root@cluster3 hadoop]# service iptables stop
Disable it at boot
[root@cluster3 hadoop]# chkconfig iptables off
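Where switching the firewall off entirely is not acceptable, opening individual ports is a possible alternative. The sketch below only prints the rules instead of applying them; print_accept_rules is a made-up helper, and the port list covers only the ports that appear in this post, so it is not complete (DataNodes and NodeManagers listen on further ports).

```shell
# Hypothetical helper: prints (does not run) an ACCEPT rule per TCP port.
print_accept_rules() {
    for port in "$@"; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}

# Ports seen in the transcripts above: NameNode web UI, NameNode RPC,
# ResourceManager RPC, ResourceManager web UI.
print_accept_rules 50070 9000 8032 8088
echo "service iptables save"
```

The printed lines can be reviewed and then run as root on each node.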