Running the WordCount example bundled with Hadoop

The WordCount example that ships with Hadoop counts how many times each word occurs across a set of text files.
The steps below show how to run it.
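Conceptually, the job maps each line to (word, 1) pairs and then reduces by summing per word. A minimal Python sketch of that logic (for illustration only; the actual example is Java MapReduce code):

```python
from collections import Counter

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every whitespace-separated token
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Reducer: sum the counts for each word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The same two lines as the test files created in step 2
lines = ["hello world", "hello hadoop"]
pairs = [p for line in lines for p in map_phase(line)]
print(reduce_phase(pairs))  # {'hello': 2, 'world': 1, 'hadoop': 1}
```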

1. Start Hadoop

[root@hadoop ~]# start-all.sh #start Hadoop

2. Create a local directory and two test files

[root@hadoop ~]# mkdir input
[root@hadoop ~]# cd input/
[root@hadoop input]# echo "hello world">test1.txt #create two test files
[root@hadoop input]# echo "hello hadoop">test2.txt

3. Copy the local input directory to the HDFS root, renaming it to in

[root@hadoop ~]# hdfs dfs -put input/ /in
[root@hadoop ~]# hdfs dfs -ls / #list the HDFS root directory
Found 1 items
drwxr-xr-x - root supergroup 0 2018-07-20 03:06 /in
[root@hadoop ~]# hdfs dfs -ls /in #list the /in directory
Found 2 items
-rw-r--r-- 1 root supergroup 12 2018-07-20 03:06 /in/test1.txt
-rw-r--r-- 1 root supergroup 13 2018-07-20 03:06 /in/test2.txt

4. Run the WordCount job

[root@hadoop ~]# cd /usr/local/hadoop/share/hadoop/mapreduce/ #the example jar is stored in this directory
[root@hadoop mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount /in /out #/out is the output directory; it must not already exist before the job runs, or the job fails
18/07/30 14:02:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.42.133:8032
18/07/30 14:02:13 INFO input.FileInputFormat: Total input paths to process : 2
18/07/30 14:02:13 INFO mapreduce.JobSubmitter: number of splits:2
18/07/30 14:02:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532913019648_0002
18/07/30 14:02:14 INFO impl.YarnClientImpl: Submitted application application_1532913019648_0002
18/07/30 14:02:14 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1532913019648_0002/
18/07/30 14:02:14 INFO mapreduce.Job: Running job: job_1532913019648_0002
18/07/30 14:02:36 INFO mapreduce.Job: Job job_1532913019648_0002 running in uber mode : false
18/07/30 14:02:36 INFO mapreduce.Job:  map 0% reduce 0%
18/07/30 14:04:37 INFO mapreduce.Job:  map 67% reduce 0%
18/07/30 14:04:42 INFO mapreduce.Job:  map 100% reduce 0%
18/07/30 14:05:21 INFO mapreduce.Job:  map 100% reduce 100%
18/07/30 14:05:23 INFO mapreduce.Job: Job job_1532913019648_0002 completed successfully
18/07/30 14:05:26 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=55
        FILE: Number of bytes written=368074
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=217
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=259093
        Total time spent by all reduces in occupied slots (ms)=21736
        Total time spent by all map tasks (ms)=259093
        Total time spent by all reduce tasks (ms)=21736
        Total vcore-milliseconds taken by all map tasks=259093
        Total vcore-milliseconds taken by all reduce tasks=21736
        Total megabyte-milliseconds taken by all map tasks=265311232
        Total megabyte-milliseconds taken by all reduce tasks=22257664
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=41
        Map output materialized bytes=61
        Input split bytes=192
        Combine input records=4
        Combine output records=4
        Reduce input groups=3
        Reduce shuffle bytes=61
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=847
        CPU time spent (ms)=4390
        Physical memory (bytes) snapshot=461631488
        Virtual memory (bytes) snapshot=6226669568
        Total committed heap usage (bytes)=277356544
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=25
    File Output Format Counters 
        Bytes Written=25
The MapReduce job's progress and counters are printed while the command runs.
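Several of the counters above can be checked against the input by hand: two files give Map input records=2, four words give Map output records=4, and three distinct words give Reduce output records=3. A quick sanity check in plain Python (independent of Hadoop):

```python
# Contents of test1.txt and test2.txt from step 2
lines = ["hello world", "hello hadoop"]
words = [w for line in lines for w in line.split()]

assert len(lines) == 2       # matches "Map input records=2"
assert len(words) == 4       # matches "Map output records=4"
assert len(set(words)) == 3  # matches "Reduce output records=3" (distinct words)
print("counters consistent")
```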

5. View the output

1) View the output file directly on HDFS

[root@hadoop mapreduce]# hdfs dfs -ls /out
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-07-30 14:05 /out/_SUCCESS
-rw-r--r--   1 root supergroup         25 2018-07-30 14:05 /out/part-r-00000
[root@hadoop mapreduce]# hdfs dfs -cat /out/part-r-00000
hadoop    1
hello    2
world    1

2) Or view it with the following command

[root@hadoop mapreduce]# hdfs dfs -cat /out/*
hadoop    1
hello    2
world    1

3) Or copy the output to the local file system and view it there

[root@hadoop mapreduce]# hdfs dfs -get /out /root/output
[root@hadoop mapreduce]# cd  /root/output/
[root@hadoop output]# ll
total 4
-rw-r--r-- 1 root root 25 Jul 30 17:18 part-r-00000
-rw-r--r-- 1 root root  0 Jul 30 17:18 _SUCCESS
[root@hadoop output]# cat part-r-00000 
hadoop    1
hello    2
world    1
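Each line of part-r-00000 holds a word and its count separated by a tab, sorted by key. If the result needs further processing, it can be parsed back into a mapping; a small sketch using the contents shown above:

```python
# Simulated contents of part-r-00000 (word<TAB>count per line, sorted by key)
output = "hadoop\t1\nhello\t2\nworld\t1\n"

counts = {}
for line in output.splitlines():
    word, n = line.split("\t")
    counts[word] = int(n)
print(counts)  # {'hadoop': 1, 'hello': 2, 'world': 1}
```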
Source: https://www.cnblogs.com/zhengna/p/9391775.html