Creating a Hadoop-2.x project in Eclipse

Hortonworks: MapReduce Ports

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.0/bk_reference/content/reference_chap2_2.html

Hadoop-1.x cluster: default and commonly used configuration

http://www.cnblogs.com/ggjucheng/archive/2012/04/17/2454590.html

Setting up a Hadoop-2.x development environment in Eclipse {good}

http://blog.csdn.net/twlkyao/article/details/17578541

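Location name and Host: enter localhost.

DFS Master: the port must match the HDFS port configured in core-site.xml; here it is 9000.

Map/Reduce Master: the port must match the port configured in mapred-site.xml (mapred.job.tracker in Hadoop 1.x; Hadoop 2.x does not configure it, so follow the Hadoop 1.x convention); here it is 9001.

It looks as if this port is simply the next one after 9000.

User name: the user that owns the Hadoop installation, i.e. the Linux user Hadoop was installed under; here it is Hduser.

Note: the ports in Hadoop 2.x are complicated, and I could not find official documentation for them.

The settings here are carried over from Hadoop 1.x; fortunately they still work.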

=======================

December 7, 2013

Running Hadoop-1.2.1 MapReduce App from Eclipse Kepler

http://letsdobigdata.wordpress.com/2013/12/07/running-hadoop-mapreduce-application-from-eclipse-kepler/

===========================

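There are broadly two ways to run a Hadoop application from Eclipse:
local mode, and the real production cluster mode.

In local mode the program is not submitted to the Hadoop cluster; it runs on the single local machine.
The results are still stored on HDFS, but none of the cluster's compute resources are used
and no jar is submitted to the cluster. Local mode is therefore used to check that an MR program runs correctly.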

1. Environment
   OS: CentOS 6.4 x64
   Hadoop version: 2.2.0
   Eclipse version: Kepler

2. Download the Eclipse plugin for Hadoop 2.2, hadoop2x-eclipse-plugin-master.zip, unpack it into Eclipse's plugins directory, and restart Eclipse.

3. Configure the Hadoop installation directory.
If the plugin installed correctly, Window-->Preferences now contains a Hadoop Map/Reduce entry.
Set the Hadoop installation directory there, then close the dialog.

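4. Configure Map/Reduce Locations. In Window-->Open Perspective-->Other...,
choose Map/Reduce Locations under MapReduce Tools.

   In the Map/Reduce Locations view (at the bottom of the Eclipse window), create a new Hadoop Location: right-click inside the view-->New Hadoop Location.
   In the dialog that opens, set Location name to anything you like, e.g. Hadoop.
   DFS Master: the port must match the HDFS port configured in core-site.xml; here it is 9000.
   Map/Reduce Master: the port must match the port configured in mapred-site.xml (mapred.job.tracker in Hadoop 1.x; Hadoop 2.x does not configure it, so follow the Hadoop 1.x convention); here it is 9001.
   It looks as if this port is simply the next one after 9000.
   User name: the user that owns the Hadoop installation, i.e. the Linux user Hadoop was installed under; here it is Hduser.

Note: the ports in Hadoop 2.x are complicated, and I could not find official documentation for them.

    The settings here are carried over from Hadoop 1.x; fortunately they still work.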
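
The rule that DFS Master must use the port from core-site.xml can be checked mechanically. The sketch below is only an illustration: it parses an inline sample of core-site.xml with the JDK's XML parser and reads the port out of fs.defaultFS (the Hadoop 2.x property name; fs.default.name is the older name). The class name DfsMasterPort, the sample XML and the host n0 are assumptions made for this example; substitute the contents of your own core-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DfsMasterPort {
    // Assumed sample: host n0 and port 9000 are the values used in this post.
    static final String SAMPLE_CORE_SITE =
        "<configuration>"
      + "<property><name>fs.defaultFS</name><value>hdfs://n0:9000</value></property>"
      + "</configuration>";

    /** Returns the value of the named <property>, or null if it is absent. */
    static String property(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    /** Port of the default file system URI; this is the value DFS Master should use. */
    static int dfsPort(String coreSiteXml) {
        String fs = property(coreSiteXml, "fs.defaultFS");
        if (fs == null) {
            fs = property(coreSiteXml, "fs.default.name"); // older property name
        }
        if (fs == null) {
            throw new IllegalArgumentException("no default file system configured");
        }
        return URI.create(fs).getPort();
    }

    public static void main(String[] args) {
        System.out.println("DFS Master port: " + dfsPort(SAMPLE_CORE_SITE));
    }
}
```

If the port printed here differs from the one entered in the New Hadoop Location dialog, the plugin will probably not be able to browse HDFS.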

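5. Create a Hadoop project to test with.
   New project: File-->New-->Other-->Map/Reduce Project. Any project name will do, e.g. Test001.
   Create a test class, WordCountTest; its source is listed at the end of this post.
   Prepare the input: create an input directory on HDFS and put the text files to be counted into it.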

6. Run

Choose Run --> Java Application.

Check the output:

If the run succeeds, refresh the HDFS directory tree: an /output directory appears, and the result is in the file part-r-00000.
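
For reference, with the default text output format each line of part-r-00000 is a key and a value separated by a tab. The following sketch shows one way to turn such lines into a map once they have been fetched (for example with hadoop fs -cat /output/part-r-00000). The class name PartFileReader and the sample lines are assumptions for illustration; the sample reuses the words from the comments in WordCountTest.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartFileReader {
    /** Parses "word<TAB>count" lines, keeping the order in which they appear. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample content of part-r-00000.
        List<String> sample = Arrays.asList(
            "c++\t1", "hello\t2", "java\t2", "me\t1", "too\t1", "world\t1", "you\t1");
        for (Map.Entry<String, Integer> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```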

7. Up to this point the Hadoop MR application has only been run and tested in local mode; it has not been deployed to a real cluster.

8. Distributed run:

The Eclipse Hadoop plugin can export an executable jar.

For example, export this project from Eclipse as ~/Test.jar, then submit it from the shell:

$ bin/hadoop jar ~/Test.jar TestPkg.WordCountTest

Another example: the wordcount that ships with Hadoop 2.x, submitted from the shell:

$ bin/hadoop jar hadoop-mapreduce-examples-2.3.0.jar wordcount /input /output

Q：java.lang.IllegalArgumentException: Wrong FS: hdfs:/ expected file:///

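A: Copy the cluster's core-site.xml and hdfs-site.xml into the bin folder of the current Eclipse project, so that they are on the classpath when the job starts.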
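
Hadoop looks up core-site.xml and hdfs-site.xml as classpath resources, which is why the bin folder works. A quick way to confirm the files are visible is a check like the one below. This is a minimal sketch using only the JDK; the class name ClasspathCheck is an assumption for this example, and it should be run with the same classpath as the job (for instance as a class inside the same Eclipse project).

```java
class ClasspathCheck {
    /** True if the named resource can be found by the current class loader. */
    static boolean onClasspath(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resource) != null;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
            String status = onClasspath(name) ? "found" : "NOT found";
            System.out.println(name + ": " + status + " on the classpath");
        }
    }
}
```

If either file is reported as not found, the configuration falls back to the local file system, which matches the "expected file:///" part of the error message.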

WordCountTest.java, the test class from step 5:

package TestPkg;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCountTest {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        /**
         * LongWritable, IntWritable and Text are Hadoop's WritableComparable
         * counterparts of Java's long, int and String.
         */
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text(); // Text is a BinaryComparable, usable as a key

        /**
         * map() turns one input k/v pair into zero or more intermediate k/v
         * pairs. In the old mapred API the signature was
         * void map(K1 key, V1 value, OutputCollector<K2,V2> output, Reporter reporter)
         * and pairs were emitted with OutputCollector.collect(k, v);
         * here context.write(k, v) plays that role.
         */
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            /*
             * Example input file:
             *   c++ java hello
             *   world java hello
             *   you me too
             * map() is called once per line; the key is the line's byte offset:
             *   0  c++ java hello
             *   16 world java hello
             *   34 you me too
             *
             * Pairs emitted by map() and passed on towards reduce:
             *   c++ 1, java 1, hello 1, world 1, java 1, hello 1, you 1, me 1, too 1
             */
            StringTokenizer itr = new StringTokenizer(value.toString()); // split on whitespace
            System.out.println("value  " + value.toString());
            System.out.println("key  " + key.toString());

            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        /**
         * Input to reduce, grouped by key:
         *   (c++ [1]) (java [1,1]) (hello [1,1]) (world [1]) (you [1]) (me [1]) (too [1])
         * reduce() is called once per key.
         */
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            /*
             * Output of reduce for the example input:
             *   c++ 1, hello 2, java 2, me 1, too 1, world 1, you 1
             */
            for (IntWritable val : values) {
                sum += val.get();
            }

            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {

        // Hard-coded so the class can be run directly from Eclipse;
        // note that this overrides any command-line arguments.
        args = new String[2];
        args[0] = "hdfs://n0:9000/input";
        args[1] = "hdfs://n0:9000/output";

        System.out.println("========input,output=============");
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();

        for (String s : otherArgs) {
            System.out.println(s);
        }

        // Expect exactly two HDFS paths: input and output.
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        // Delete the output directory if it already exists.
        // FileSystem.get(conf) uses the default file system from the
        // configuration, so core-site.xml must be on the classpath (see the Q&A above).
        FileSystem fs = FileSystem.get(conf);
        Path pout = new Path(otherArgs[1]);
        if (fs.exists(pout)) {
            fs.delete(pout, true);
            System.out.printf("output path [%s] exists, deleting...%n", otherArgs[1]);
        }

        // Job.getInstance replaces the deprecated new Job(conf, name).
        Job job = Job.getInstance(conf, "Word Count Test");

        job.setJarByClass(WordCountTest.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);          // output key type
        job.setOutputValueClass(IntWritable.class); // output value type

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
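
The map, group and reduce stages described in the comments of WordCountTest can be traced without a cluster. The sketch below is not Hadoop code: it is a plain-Java imitation of the data flow, written only to show what each stage produces for the three example lines. The class name WordCountFlow and its method names are assumptions made for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountFlow {
    /** Map stage: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return out;
    }

    /** Grouping stage: collect all values per key, with keys in sorted order. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce stage: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three stages in sequence over the given input lines. */
    static SortedMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "c++ java hello", "world java hello", "you me too");
        // prints {c++=1, hello=2, java=2, me=1, too=1, world=1, you=1}
        System.out.println(wordCount(input));
    }
}
```

On a real cluster the grouping step is the shuffle and sort that Hadoop performs between map and reduce, and the combiner set in WordCountTest applies the same summing logic on the map side first.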