Basic HDFS Commands and Running a Hadoop MapReduce Program

  I. Basic HDFS Commands

  1. Creating a directory: -mkdir

[jun@master ~]$ hadoop fs -mkdir /test
[jun@master ~]$ hadoop fs -mkdir /test/input
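
  As an aside, if /test does not exist yet, the two commands above can be combined into one by adding the -p flag, which creates any missing parent directories (the rest of this walkthrough assumes the two separate commands were run):

[jun@master ~]$ hadoop fs -mkdir -p /test/input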

  2. Listing files: -ls

[jun@master ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - jun supergroup          0 2018-07-22 10:31 /test
[jun@master ~]$ hadoop fs -ls /test
Found 1 items
drwxr-xr-x   - jun supergroup          0 2018-07-22 10:31 /test/input
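
  To list an entire directory tree in one command, -ls also accepts the -R (recursive) option:

[jun@master ~]$ hadoop fs -ls -R /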

  3. Uploading files to HDFS

  First, create two files, jun.dat and jun.txt, under /home/jun.
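
  One simple way to create them is shown below; the exact contents do not matter, and the single-line contents used here are only an assumption that happens to match the 22-byte sizes shown later:

[jun@master ~]$ echo "This is the dat file." > /home/jun/jun.dat   # assumed contents
[jun@master ~]$ echo "This is the txt file." > /home/jun/jun.txt   # matches the -cat output shown later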

  (1) Use -put to copy a file from the local filesystem to HDFS

[jun@master ~]$ hadoop fs -put /home/jun/jun.dat /test/input/jun.dat

  (2) Use -copyFromLocal to copy a file from the local filesystem to HDFS (it works like -put, except the source must be on the local filesystem; the -f flag overwrites the destination if it already exists)

[jun@master ~]$ hadoop fs -copyFromLocal -f /home/jun/jun.txt  /test/input/jun.txt

  (3) Check that the copy succeeded

[jun@master ~]$ hadoop fs -ls /test/input
Found 2 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:39 /test/input/jun.txt
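
  As an optional extra check (not part of the original steps), -du with the -h flag prints a human-readable size for each entry in the directory:

[jun@master ~]$ hadoop fs -du -h /test/input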

  4. Downloading files from HDFS to the local machine

  (1) Use -get to copy a file from HDFS to the local filesystem

[jun@master ~]$ hadoop fs -get /test/input/jun.dat /home/jun/jun1.dat

  (2) Use -copyToLocal to copy a file from HDFS to the local filesystem (it works like -get, except the destination must be on the local filesystem)

[jun@master ~]$ hadoop fs -copyToLocal /test/input/jun.txt /home/jun/jun1.txt

  (3) Check that the copy succeeded

[jun@master ~]$ ls -l /home/jun/
total 16
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Desktop
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Documents
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Downloads
drwxr-xr-x. 10 jun jun 161 Jul 21 19:25 hadoop
drwxrwxr-x.  3 jun jun  17 Jul 20 20:07 hadoopdata
-rw-r--r--.  1 jun jun  22 Jul 22 10:43 jun1.dat
-rw-r--r--.  1 jun jun  22 Jul 22 10:44 jun1.txt
-rw-rw-r--.  1 jun jun  22 Jul 22 10:35 jun.dat
-rw-rw-r--.  1 jun jun  22 Jul 22 10:35 jun.txt
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Music
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Pictures
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Public
drwxr-xr-x.  2 jun jun   6 Jul 20 16:43 Resources
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Templates
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Videos
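
  A related convenience when downloading is -getmerge, which concatenates every file under an HDFS directory into a single local file. This is only an illustrative aside, and the local file name merged.txt is an arbitrary example:

[jun@master ~]$ hadoop fs -getmerge /test/input /home/jun/merged.txt   # merged.txt is an example name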

  5. Viewing files stored in HDFS

[jun@master ~]$ hadoop fs -cat /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -text /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -tail /test/input/jun.txt
This is the txt file.
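
  All three commands print the same thing here because the file is a single short line of text: -cat dumps the file as-is, -text additionally decodes compressed or SequenceFile data into readable text, and -tail prints only the last kilobyte of the file. -tail also takes an -f flag to keep printing data as it is appended to the file, for example:

[jun@master ~]$ hadoop fs -tail -f /test/input/jun.txt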

  6. Deleting HDFS files

[jun@master ~]$ hadoop fs -rm /test/input/jun.txt
Deleted /test/input/jun.txt
[jun@master ~]$ hadoop fs -ls /test/input
Found 1 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat
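
  -rm by itself only removes files; to delete a directory and everything under it, add the -r flag (and -skipTrash to bypass the trash directory if the trash feature is enabled). For example, to remove the whole /test tree once you are done experimenting:

[jun@master ~]$ hadoop fs -rm -r /test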

  7. The same commands can also be run on a slave node

[jun@slave0 ~]$ hadoop fs -ls /test/input
Found 1 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat

  II. Running a Program on the Hadoop Cluster

  The Hadoop installation ships with a MapReduce examples JAR (hadoop-mapreduce-examples-2.8.4.jar) that bundles several sample programs, including one that estimates the value of pi.

  Argument breakdown: pi is the name of the example program to run, the first 10 is the number of map tasks, and the second 10 is the number of sample points generated by each map task.

[jun@master ~]$ hadoop jar /home/jun/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
18/07/22 10:55:07 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.100:18040
18/07/22 10:55:08 INFO input.FileInputFormat: Total input files to process : 10
18/07/22 10:55:08 INFO mapreduce.JobSubmitter: number of splits:10
18/07/22 10:55:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532226440522_0001
18/07/22 10:55:10 INFO impl.YarnClientImpl: Submitted application application_1532226440522_0001
18/07/22 10:55:10 INFO mapreduce.Job: The url to track the job: http://master:18088/proxy/application_1532226440522_0001/
18/07/22 10:55:10 INFO mapreduce.Job: Running job: job_1532226440522_0001
18/07/22 10:55:20 INFO mapreduce.Job: Job job_1532226440522_0001 running in uber mode : false
18/07/22 10:55:20 INFO mapreduce.Job:  map 0% reduce 0%
18/07/22 10:56:21 INFO mapreduce.Job:  map 10% reduce 0%
18/07/22 10:56:22 INFO mapreduce.Job:  map 40% reduce 0%
18/07/22 10:56:23 INFO mapreduce.Job:  map 50% reduce 0%
18/07/22 10:56:33 INFO mapreduce.Job:  map 100% reduce 0%
18/07/22 10:56:34 INFO mapreduce.Job:  map 100% reduce 100%
18/07/22 10:56:36 INFO mapreduce.Job: Job job_1532226440522_0001 completed successfully
18/07/22 10:56:36 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=1738836
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2590
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters 
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=635509
        Total time spent by all reduces in occupied slots (ms)=10427
        Total time spent by all map tasks (ms)=635509
        Total time spent by all reduce tasks (ms)=10427
        Total vcore-milliseconds taken by all map tasks=635509
        Total vcore-milliseconds taken by all reduce tasks=10427
        Total megabyte-milliseconds taken by all map tasks=650761216
        Total megabyte-milliseconds taken by all reduce tasks=10677248
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1410
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=59206
        CPU time spent (ms)=54080
        Physical memory (bytes) snapshot=2953310208
        Virtual memory (bytes) snapshot=23216238592
        Total committed heap usage (bytes)=2048393216
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1180
    File Output Format Counters 
        Bytes Written=97
Job Finished in 88.689 seconds
Estimated value of Pi is 3.20000000000000000000

  The job reports an estimated value of pi of approximately 3.2.
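
  With only 10 maps × 10 samples = 100 sample points, the estimate is necessarily coarse. Rerunning the same example with larger arguments, for instance 16 map tasks and 100000 samples per map, should produce a value much closer to 3.14159 (the argument values are illustrative; output not shown):

[jun@master ~]$ hadoop jar /home/jun/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar pi 16 100000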

Original article: https://www.cnblogs.com/BigJunOba/p/9349402.html