[b0013] The Hadoop "hello world": running MapReduce WordCount (Part 3)

Goal:

Write, debug and run the code directly on Linux, without any IDE.

Environment:

Linux Ubuntu

Hadoop 2.6.4

Best approach:

Simple method:

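Create the class file in the current directory and add the code given later in this post, but leave out the first line (the package declaration).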

Compile (this assumes the Hadoop classpath has been added to CLASSPATH as described in 2.1):

javac WordCount.java

Package:

jar -cvf WordCount.jar ./WordCount*.class

Run:

hadoop jar WordCount.jar WordCount /input /output

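This method does not work with a package declaration. If the package line is kept, the test fails even when the last step is run as hadoop jar WordCount.jar <package path>.WordCount /input /output. The likely reason is that javac without -d writes the class files into the current directory instead of a directory tree matching the package, so the jar layout does not match the class name.
With a package, use the method described in the rest of this post.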
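To illustrate the mismatch: a class loader finds a class by turning its fully qualified name into a path, so hadoop.mapr.WordCount has to be stored as hadoop/mapr/WordCount.class, whereas the simple method puts WordCount.class at the jar root. The helper below is a hypothetical sketch of that name-to-path mapping (the class ClassEntryPath is made up for this example, it is not part of Hadoop):

```java
public class ClassEntryPath {
    // Hypothetical helper: converts a fully qualified class name into the
    // path a class loader expects inside a jar or a classes directory.
    static String entryFor(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Without a package the class sits at the root of the jar.
        System.out.println(entryFor("WordCount"));
        // With "package hadoop.mapr;" the class must sit under hadoop/mapr/.
        System.out.println(entryFor("hadoop.mapr.WordCount"));
        // Nested classes such as WordCount.Map are compiled to WordCount$Map.class.
        System.out.println(entryFor("hadoop.mapr.WordCount$Map"));
    }
}
```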

1 Prepare the program

Create the project directories on Linux:

word, word/src, word/classes

Under src, create the class file WordCount.java with the following code. Note the package name on the first line; it is used later.

package hadoop.mapr;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Description: WordCount, explained by xxm
 * @author xxm
 */
public class WordCount {

    /**
     * Map class: defines our own map method.
     */
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        /**
         * LongWritable, IntWritable and Text are Hadoop classes that wrap Java data types.
         * They are serializable, which makes them easy to exchange in a distributed
         * environment; think of them as replacements for long, int and String.
         */
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /**
         * Mapper.map: protected void map(KEYIN key, VALUEIN value, Context context)
         * Maps a single input k/v pair to intermediate k/v pairs.
         * Context: collects the <k,v> pairs output by the Mapper.
         */
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one); // emit <word, 1>
            }
        }
    }

    /**
     * Reduce class: defines our own reduce method.
     */
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        /**
         * Reducer.reduce: protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
         * Called once per key with all values emitted for that key.
         * Context: collects the <k,v> pairs output by the Reducer.
         */
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum)); // emit <word, total count>
        }
    }

    /**
     * main entry point
     */
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(2);
        }

        Configuration conf = new Configuration(); // configuration object holding all settings
        // conf.set("fs.defaultFS", "hdfs://ssmaster:9000/");

        Job job = Job.getInstance(conf, "wordcount"); // create a job and give it a name

        job.setOutputKeyClass(Text.class);          // key class of the job output
        job.setOutputValueClass(IntWritable.class); // value class of the job output

        job.setMapperClass(Map.class);      // Mapper class for the job
        job.setReducerClass(Reduce.class);  // Reducer class for the job
        job.setJarByClass(WordCount.class); // locate the jar that contains this class

        job.setInputFormatClass(TextInputFormat.class);   // InputFormat implementation
        job.setOutputFormatClass(TextOutputFormat.class); // OutputFormat implementation

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path (must not exist yet)

        // Run the job, wait for it to finish, and report success through the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
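Before submitting to the cluster, it can help to see what the job computes. Below is a minimal plain-Java sketch, with no Hadoop dependency, of the same map, shuffle and reduce flow on two in-memory lines. The class name LocalWordCount and the sample input are hypothetical, and the shuffle step here (a sorted TreeMap) is only a rough stand-in for Hadoop's real sort-and-partition machinery:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Map phase: emit (word, 1) for every whitespace-separated token,
    // as WordCount.Map does with StringTokenizer.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase (simplified): group all values by key, keys kept sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for each word, as WordCount.Reduce does.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello world", "hello hadoop");
        // Print as "key<TAB>value", the default layout of TextOutputFormat.
        for (Map.Entry<String, Integer> e : wordCount(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For this input the sketch prints hadoop 1, hello 2 and world 1, one per line, which is the kind of result the real job writes to its part files under /output.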


2 Compile and package the Hadoop MapReduce program

2.1 Add Hadoop's classpath to the CLASSPATH variable: append the following line to /etc/profile, then run source /etc/profile to apply it.

export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
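One quick way to confirm that the Hadoop jars are really visible to Java after source /etc/profile is to try loading the Hadoop classes WordCount imports by name. The checker below is a hypothetical sketch (the class HadoopOnClasspath is made up). Note that once CLASSPATH is set, Java no longer adds the current directory by default, so run it as java -cp ".:$CLASSPATH" HadoopOnClasspath:

```java
public class HadoopOnClasspath {
    // Returns true if this JVM's class loader can load the named class.
    // initialize=false avoids running the class's static initializers;
    // LinkageError covers a class whose own dependencies are missing.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopOnClasspath.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A few of the Hadoop classes that WordCount.java imports.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.io.Text"
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```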

2.2 Change into the word directory and compile:

javac -d classes src/*.java

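-classpath: the paths of the libraries the source code uses, separated by ":" (not needed in the command above once CLASSPATH is set as in 2.1).
-d: the directory where the compiled class files are written.
src/*.java: the source files to compile.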

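Note: if the Hadoop classpath has not been configured, pass it explicitly: javac -classpath <Hadoop dependency jars> -d classes src/*.java, for example with -classpath "$($HADOOP_HOME/bin/hadoop classpath)".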
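The value given to -classpath (or stored in CLASSPATH) is just a list of entries joined by ":" on Linux. The output of hadoop classpath typically mixes configuration directories, single jars, and wildcard entries ending in /*, which Java expands to every jar in that directory. A small sketch that splits such a value and labels each entry; the class name ClasspathEntries and the sample string are made up, not real hadoop classpath output:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClasspathEntries {
    // Labels one classpath entry. Anything that is neither a wildcard nor a
    // .jar is treated here as a directory (a simplification: .zip is also allowed).
    static String kind(String entry) {
        if (entry.endsWith("/*")) {
            return "wildcard (all jars in the directory)";
        }
        if (entry.endsWith(".jar")) {
            return "jar";
        }
        return "directory";
    }

    // Splits a classpath on the platform separator (':' on Linux), skipping empty entries.
    static List<String> split(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample; pass real `hadoop classpath` output as the first argument instead.
        String sample = "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*"
                + ":/opt/hadoop/share/hadoop/common/hadoop-common-2.6.4.jar";
        String classpath = args.length > 0 ? args[0] : sample;
        for (String entry : split(classpath)) {
            System.out.println(kind(entry) + "\t" + entry);
        }
    }
}
```

Usage on the cluster node might look like java ClasspathEntries "$(hadoop classpath)", run with a classpath that includes the directory holding ClasspathEntries.class.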

Result: javac creates hadoop/mapr (the package name) under classes and generates the following classes:

hadoop@ssmaster:~/java_program/word$ ls classes/hadoop/mapr/
WordCount.class  WordCount$Map.class  WordCount$Reduce.class
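The hadoop/mapr directories appear because javac lays out its -d output by package name, not by where the source file sits (here WordCount.java lives directly in src/). The sketch below reproduces that with the standard javax.tools compiler API on a throwaway class. It assumes a JDK (not just a JRE) is installed, and the package demo.pkg, the class Hello and the class PackageLayoutDemo are all made up for the demonstration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class PackageLayoutDemo {
    // Writes a one-class source file directly into workDir/src, compiles it
    // with "-d workDir/classes", and returns where the .class file should be.
    static Path compileWithD(Path workDir) throws IOException {
        Path src = workDir.resolve("src");
        Path out = workDir.resolve("classes");
        Files.createDirectories(src);
        Files.createDirectories(out);

        // Like WordCount.java in the post, the source is NOT in a package directory.
        Path source = src.resolve("Hello.java");
        String code = "package demo.pkg;\npublic class Hello {}\n";
        Files.write(source, code.getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler: run this on a JDK");
        }
        // Equivalent to: javac -d classes src/Hello.java
        int status = javac.run(null, null, null, "-d", out.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("javac failed with status " + status);
        }
        // javac created classes/demo/pkg/ from the package declaration.
        return out.resolve("demo").resolve("pkg").resolve("Hello.class");
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("javac-d-demo");
        Path classFile = compileWithD(workDir);
        System.out.println(classFile + " exists: " + Files.exists(classFile));
    }
}
```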

2.3 Package the classes directory into a jar in the word directory:

jar -cvf WordCount.jar classes

hadoop@ssmaster:~/java_program/word$ ls
classes  src  WordCount.jar
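To check what actually went into the jar (the same information jar -tf WordCount.jar prints), the entries can be listed with java.util.jar.JarFile. Because the command above jars the classes directory itself, the entries start with classes/, for example classes/hadoop/mapr/WordCount.class. As far as I can tell, hadoop jar copes with this because RunJar also puts the unpacked jar's classes/ directory on the class path; jar -cvf WordCount.jar -C classes . would instead place the entries at the jar root. A minimal sketch; the class ListJarEntries is hypothetical and the default file name is taken from the post:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ListJarEntries {
    // Returns the names of all .class entries in the jar, in stored order.
    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0] : "WordCount.jar";
        for (String name : classEntries(jarPath)) {
            System.out.println(name);
        }
    }
}
```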

3 Run

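Start Hadoop, prepare /input, and make sure /output does not exist (the job refuses to write into an existing output directory and fails).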

Run the following command. Because the class is in a package, the fully qualified class name must be given:

hadoop jar WordCount.jar hadoop.mapr.WordCount /input /output

The job starts successfully. In my environment, however, some exception made the Hadoop cluster exit. [Open issue: major operations problem]

Summary:

I now have a basic understanding of the Hadoop MapReduce and HDFS development environment.

Next steps:

Focus on programming HDFS and MapReduce jobs.

