Hadoop in Action: Chaining Multiple MapReduce Jobs

Chaining MapReduce jobs sequentially

Pattern: mapreduce-1 | mapreduce-2 | mapreduce-3

  • In the run function, configure each job in turn and launch it with JobClient.runJob(), which blocks until that job completes
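@Override
public int run(String[] args) throws Exception {
	// First job: set its mapper, reducer, and input/output paths before running
	JobConf job1 = new JobConf(getConf(), getClass());
	JobClient.runJob(job1);	// returns only after job1 has completed

	// Second job: its input path should be the output path of job1
	JobConf job2 = new JobConf(getConf(), getClass());
	JobClient.runJob(job2);
	return 0;
}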
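
The wiring that matters in a sequential chain is that each job reads what the previous job wrote. The sketch below is plain Java, not Hadoop code: the class SequentialChainSketch and its methods countWords and invertCounts are hypothetical stand-ins for two jobs, and in-memory lists stand in for the directories a real driver would pass from one job to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of mapreduce-1 | mapreduce-2 (no Hadoop classes involved).
class SequentialChainSketch {

    // Stand-in for job 1: counts words and emits "word<TAB>count" lines.
    static List<String> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    // Stand-in for job 2: reads job 1's output and groups words by their count.
    static List<String> invertCounts(List<String> lines) {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(parts[1]), k -> new ArrayList<>())
                   .add(parts[0]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            out.add(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b a", "c b a");
        List<String> intermediate = countWords(input);     // output of job 1
        List<String> result = invertCounts(intermediate);  // job 2 consumes it
        result.forEach(System.out::println);
    }
}
```

In a real driver the intermediate list would be a temporary output directory: job1 writes to it, job2 reads from it, and it can be deleted once job2 finishes.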

Chaining MapReduce jobs with complex dependencies

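  • Managed through the Job and JobControl classes
// For Job objects x and y
x.addDependingJob(y)	// adds a dependency: x will not start until y has finished

jobControl.addJob(x)	// Job objects x and y are managed by the JobControl object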
jobControl.addJob(y)	


jobControl.allFinished()	// monitoring methods of the JobControl object
jobControl.getFailedJobs()
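
To make the ordering rule concrete, here is a minimal plain-Java simulation of dependency-driven scheduling. It does not use Hadoop: SimJob and SimJobControl are hypothetical stand-ins whose method names only mirror the calls listed above, and "running" a job just records its name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation of dependency-ordered execution (no Hadoop classes involved).
class DependencySketch {

    // Hypothetical stand-in for a Job: a name plus the jobs it must wait for.
    static class SimJob {
        final String name;
        final List<SimJob> dependsOn = new ArrayList<>();
        SimJob(String name) { this.name = name; }
        void addDependingJob(SimJob other) { dependsOn.add(other); }
    }

    // Hypothetical stand-in for JobControl: holds jobs and runs the ready ones.
    static class SimJobControl {
        private final List<SimJob> waiting = new ArrayList<>();
        private final Set<SimJob> finished = new LinkedHashSet<>();

        void addJob(SimJob job) { waiting.add(job); }
        boolean allFinished() { return waiting.isEmpty(); }

        // Repeatedly "runs" every job whose dependencies have all finished.
        List<String> run() {
            List<String> order = new ArrayList<>();
            while (!allFinished()) {
                List<SimJob> ready = new ArrayList<>();
                for (SimJob job : waiting) {
                    if (finished.containsAll(job.dependsOn)) {
                        ready.add(job);
                    }
                }
                if (ready.isEmpty()) {
                    throw new IllegalStateException("cyclic or unregistered dependency");
                }
                for (SimJob job : ready) {
                    order.add(job.name);
                }
                finished.addAll(ready);
                waiting.removeAll(ready);
            }
            return order;
        }
    }

    // Example graph: x waits for y, and z waits for both x and y.
    static List<String> demo() {
        SimJob x = new SimJob("x");
        SimJob y = new SimJob("y");
        SimJob z = new SimJob("z");
        x.addDependingJob(y);
        z.addDependingJob(x);
        z.addDependingJob(y);

        SimJobControl control = new SimJobControl();
        control.addJob(z);  // registration order does not matter
        control.addJob(x);
        control.addJob(y);
        return control.run();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The jobs are registered in the "wrong" order on purpose: the dependencies alone decide that y runs first, then x, then z.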

Chaining preprocessing and postprocessing steps

Pattern: MAP+ | REDUCE | MAP*

  • ChainMapper/ChainReducer: run the whole chain inside a single job, which reduces the intermediate output written between steps

  • The addMapper/setReducer interface

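    • job, mapperConf: the global and local JobConf objects
    • klass: the Mapper class
    • inputKeyClass, inputValueClass, outputKeyClass, outputValueClass: the types of the input and output keys and values
    • byValue: whether the mapper's output key and value are passed to the next step by value
      • true: pass by value (the key and value are copied)
      • false: pass by reference
public static <K1, V1, K2, V2> void
        addMapper(JobConf job,
                  Class<? extends Mapper<K1, V1, K2, V2>> klass,
                  Class<? extends K1> inputKeyClass,
                  Class<? extends V1> inputValueClass,
                  Class<? extends K2> outputKeyClass,
                  Class<? extends V2> outputValueClass,
                  boolean byValue,
                  JobConf mapperConf)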
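
Why byValue matters is easiest to see with a step that reuses one mutable object for every record it emits. The sketch below is plain Java and only illustrates the general aliasing problem, not how Hadoop implements the flag: ByValueSketch, passAlong, and asStrings are hypothetical names, and a StringBuilder stands in for a reusable key or value object.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of by-value vs. by-reference hand-off (no Hadoop classes involved).
class ByValueSketch {

    // The upstream step reuses ONE mutable buffer for every record it emits.
    // The downstream step keeps everything it receives in a list.
    static List<StringBuilder> passAlong(String[] records, boolean byValue) {
        StringBuilder reused = new StringBuilder();
        List<StringBuilder> received = new ArrayList<>();
        for (String record : records) {
            reused.setLength(0);
            reused.append(record);
            // byValue=true hands over a copy; byValue=false hands over the shared object.
            received.add(byValue ? new StringBuilder(reused) : reused);
        }
        return received;
    }

    // Snapshot of the current contents of each received object.
    static List<String> asStrings(List<StringBuilder> items) {
        List<String> out = new ArrayList<>();
        for (StringBuilder item : items) {
            out.add(item.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"r1", "r2", "r3"};
        System.out.println(asStrings(passAlong(records, true)));   // each copy keeps its own contents
        System.out.println(asStrings(passAlong(records, false)));  // every entry aliases the same buffer
    }
}
```

Passing by reference avoids the copy, but it is only safe when a step neither reuses the objects it has emitted nor holds on to the objects it receives. When in doubt, true is the conservative choice.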
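Example: a MapReduce driver with preprocessing and postprocessing
  • Map1 | Map2 | Reduce | Map3 | Map4
    • ChainMapper.addMapper: adds every step before the Reduce
    • ChainReducer.addMapper: adds the steps after the Reduce
    • Settings in a local JobConf object take precedence over the global one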
    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), getClass());

        job.setJobName("ChainJob");
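        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        // Assumption: args[0] is the input path and args[1] is the output path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));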

        JobConf map1Conf = new JobConf(false);  // loadDefaults=false: creates an empty local configuration object
        ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
                Text.class, Text.class, true, map1Conf);
        JobConf map2Conf = new JobConf(false);
        ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
                LongWritable.class, Text.class, true, map2Conf);

        JobConf reduceConf = new JobConf(false);    
        ChainReducer.setReducer(job, ReducerClass.class, LongWritable.class, Text.class,
                Text.class, Text.class, true, reduceConf);

        JobConf map3Conf = new JobConf(false);
        ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
                LongWritable.class, Text.class, true, map3Conf);
        JobConf map4Conf = new JobConf(false);
        ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
                LongWritable.class, Text.class, true, map4Conf);
        
        JobClient.runJob(job);
        return 0;
    }
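
The data flow of such a chain can be imitated in a few lines of plain Java. This is a simulation under stated assumptions, not Hadoop code: ChainSketch and runChain are hypothetical names, records are "key:value" strings, each mapper is a simple string transformation, and the reduce step just joins the values of a key with "+".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Plain-Java simulation of MAP+ | REDUCE | MAP* in one pass (no Hadoop classes involved).
class ChainSketch {

    // Applies the pre-mappers to each record, groups records by the text before ':',
    // reduces each group by joining its values, then applies the post-mappers.
    static List<String> runChain(List<String> input,
                                 List<UnaryOperator<String>> preMappers,
                                 List<UnaryOperator<String>> postMappers) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String record : input) {
            String current = record;
            for (UnaryOperator<String> mapper : preMappers) {
                current = mapper.apply(current);  // handed on in memory, nothing written out
            }
            String[] kv = current.split(":", 2);  // assumes every record is "key:value"
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            String reduced = group.getKey() + ":" + String.join("+", group.getValue());
            for (UnaryOperator<String> mapper : postMappers) {
                reduced = mapper.apply(reduced);
            }
            output.add(reduced);
        }
        return output;
    }

    static List<String> demo() {
        List<UnaryOperator<String>> pre = new ArrayList<>();
        pre.add(String::trim);            // stand-in for Map1
        pre.add(s -> s.toLowerCase());    // stand-in for Map2
        List<UnaryOperator<String>> post = new ArrayList<>();
        post.add(s -> "[" + s + "]");     // stand-in for Map3
        return runChain(Arrays.asList(" A:1 ", "b:2", "a:3"), pre, post);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each record passes through all pre-mappers before the next record is touched, and only the grouped data and the final output are ever materialized. That is the saving the chain classes aim for compared with running every step as its own job.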
Original post: https://www.cnblogs.com/vvlj/p/14101858.html