Hadoop: The Definitive Guide, Chapter 2 Exercise

Thanks to the guidance from this blog post.

Data: the sample data provided in that article, saved at /home/amelie/Hadoop_file/ncdc.txt

Environment: Ubuntu 16.04 LTS, Hadoop in pseudo-distributed mode

  1. In Eclipse, create a new Map/Reduce project named Chap2 (I suspect a plain Java project would work too, as long as the Hadoop JARs are on the build path)
  2. Under the src directory, create a new package named chap2_Package
  3. In that package, create three classes: MaxTemperatureMapper, MaxTemperatureReducer, and MaxTemperature.
  • MaxTemperatureMapper.java
package chap2_Package;

import java.io.IOException; 

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Sentinel value the data uses for a missing temperature reading
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // These offsets assume the simplified record layout from the source article:
        // year in columns 5-9, temperature in columns 15-19
        String year = line.substring(5, 9);
        int airTemperature = Integer.parseInt(line.substring(15, 19));
        if (airTemperature != MISSING) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
  • MaxTemperatureReducer.java
package chap2_Package;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Keep the largest temperature seen for this year
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
  • MaxTemperature.java
package chap2_Package;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        // new Job() is deprecated in Hadoop 2.x; Job.getInstance() is the preferred factory
        Job job = Job.getInstance();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Exit with 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
    
}
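The book's Chapter 2 also points out that, because taking a maximum is commutative and associative, the reducer class can double as a combiner, cutting down the data shuffled from the mappers to the reducer. If you want to try it, one extra line in the driver (before waitForCompletion) is enough:

        job.setCombinerClass(MaxTemperatureReducer.class);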

    There may still be odd problems here and there; a bit of tweaking will sort them out.
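Before running on the cluster, it's also worth sanity-checking the mapper's substring offsets on a sample line in plain Java, outside Hadoop. A minimal sketch (the record below is hypothetical and only illustrates the assumed layout; it is not a line from the real data file):

public class OffsetCheck {
    public static void main(String[] args) {
        // Hypothetical record: year "1950" in columns 5-9, temperature "0022" in columns 15-19
        String line = "00690195002251+0022X";
        System.out.println("year = " + line.substring(5, 9));                      // prints 1950
        System.out.println("temp = " + Integer.parseInt(line.substring(15, 19)));  // prints 22
    }
}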

  4. Right-click the package containing these three files and click Export

  • choose JAR file
  • Next: file location: /home/amelie/Hadoop_file/; the file name is up to you, here: HadoopTest.jar
  • Next: keep the default settings
  • Next: choose the main class by clicking Browse... on that page; here: chap2_Package.MaxTemperature
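As an aside, the Eclipse wizard isn't required; the same JAR can be built from a terminal. A sketch, assuming the compiled .class files sit in the project's bin/ directory (Eclipse's default output folder); the e flag records the main class in the manifest:

#jar cvfe /home/amelie/Hadoop_file/HadoopTest.jar chap2_Package.MaxTemperature -C bin .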

That produces the JAR; the next steps happen on the command line.

#cd /usr/local/hadoop/hadoop-2.5.2  // enter the Hadoop directory

Once Hadoop is up, do the following:

#hadoop fs -put /home/amelie/Hadoop_file/ncdc.txt /data/input  // copy the data file into /data/input on HDFS

If /data/input does not exist yet, create it first with hadoop fs -mkdir -p /data/input.

A "hadoop: command not found" error often appears here; updating the profile fixes it: source /etc/profile

#export HADOOP_CLASSPATH=/home/amelie/Hadoop_file   // the directory holding the JAR

I wasn't sure what this line does, but I ran it anyway; it turns out HADOOP_CLASSPATH adds entries to the classpath of the hadoop command, so it only matters when launching the driver class directly, not when using hadoop jar.
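For reference, that is how the book runs the example: point HADOOP_CLASSPATH at the JAR itself and invoke the driver class by name. Something like this should be equivalent (a sketch, untested here):

#export HADOOP_CLASSPATH=/home/amelie/Hadoop_file/HadoopTest.jar
#hadoop chap2_Package.MaxTemperature /data/input/ncdc.txt /data/output3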

Run the jar command:

#hadoop jar /home/amelie/Hadoop_file/HadoopTest.jar /data/input/ncdc.txt /data/output3

Because the main class was already specified when the JAR was exported, no class name is written here.

If it was not specified, the fully qualified main class name (including the package path) must follow ***.jar in the hadoop jar command,

e.g. hadoop jar /home/amelie/Hadoop_file/HadoopTest.jar chap2_Package.MaxTemperature /data/input/ncdc.txt /data/output3

After the job finishes, check the output:

#hadoop fs -ls /data/output3                 // list the output directory
#hadoop fs -cat /data/output3/part-r-00000   // print the results
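You can also read the result from Java through the HDFS API instead of the shell. A minimal sketch, assuming the Hadoop configuration on the classpath points at this pseudo-distributed cluster (ReadOutput is a hypothetical helper, not part of the book's example):

package chap2_Package;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadOutput {
    public static void main(String[] args) throws Exception {
        // Connect to the filesystem named in the core-site.xml found on the classpath
        FileSystem fs = FileSystem.get(new Configuration());
        Path result = new Path("/data/output3/part-r-00000");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // one "year<TAB>maxTemperature" pair per line
            }
        }
    }
}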

Done!!

Original post (in Chinese): https://www.cnblogs.com/amelie-tingting/p/6489941.html