Submitting a MapReduce Job from IDEA to Run on YARN in a Hadoop Cluster

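Contents
1. Setting up the environment
2. Creating WordCount V1.0
3. Pitfalls
1. Setting up the environment
Set up a Hadoop cluster first; see "Hadoop 3.1.2: standalone mode, single-node and multi-node pseudo-distributed installation and usage".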

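On the Windows machine, create an environment variable that sets the Hadoop user name (typically HADOOP_USER_NAME) to the user name used on the cluster.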
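To confirm which user name a job would be submitted under, a small check like the one below can help. It is only a sketch: it approximates the lookup order that Hadoop's simple authentication is understood to use (the HADOOP_USER_NAME environment variable, then a JVM system property of the same name, then the OS login name). The class name HadoopUserCheck and the method resolveUser are hypothetical names made up for this illustration.

```java
import java.util.Map;
import java.util.Properties;

public class HadoopUserCheck {

    // Approximates the lookup order (an assumption of this sketch):
    // environment variable, then system property, then OS login name.
    public static String resolveUser(Map<String, String> env, Properties props) {
        String user = env.get("HADOOP_USER_NAME");
        if (user == null) {
            user = props.getProperty("HADOOP_USER_NAME");
        }
        if (user == null) {
            user = props.getProperty("user.name");
        }
        return user;
    }

    public static void main(String[] args) {
        System.out.println("Jobs would be submitted as: "
                + resolveUser(System.getenv(), System.getProperties()));
    }
}
```

IDEA usually needs to be restarted after a new Windows environment variable is added, because a running process keeps the environment it was started with.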


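2. Creating WordCount V1.0
Add the Maven dependencies. Although hadoop-client already pulls in hadoop-mapreduce-client-jobclient, the IDEA console does not print the job logs unless that artifact is also declared explicitly.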

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>3.1.2</version>
</dependency>
 
Add log4j.properties to the resources folder:

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%m%n
 
Copy core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml from the Hadoop cluster into the resources folder.
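The configuration files only take effect if they end up on the classpath after the build. The following sketch checks that using the standard class loader; SiteConfigCheck and onClasspath are hypothetical names for this example, and the check says nothing about whether the file contents are correct.

```java
public class SiteConfigCheck {

    static final String[] SITE_FILES = {
        "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"
    };

    // Returns true if the named resource can be found on the classpath.
    public static boolean onClasspath(String name) {
        return SiteConfigCheck.class.getClassLoader().getResource(name) != null;
    }

    public static void main(String[] args) {
        for (String file : SITE_FILES) {
            System.out.println(file + (onClasspath(file) ? ": found" : ": MISSING"));
        }
    }
}
```

If a file is reported as missing even though it is in the resources folder, rebuilding the project so that the resources are copied to the output directory is the first thing to try.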

Mapper

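import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper1 extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read one line of input
        String line = value.toString();
        // Split the line on whitespace
        StringTokenizer stringTokenizer = new StringTokenizer(line);
        // Emit (token, 1) for every token in the line
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            context.write(word, one);
        }
    }
}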
 
Reducer

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts received for this key
        int sum = 0;
        for (IntWritable intWritable : values) {
            sum += intWritable.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
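Before submitting to the cluster, it can be useful to see what the mapper and reducer compute together. The sketch below reproduces the same logic in plain Java with no Hadoop classes: the same StringTokenizer split as the mapper and the same per-key summation as the reducer. LocalWordCount is a hypothetical helper for illustration, not part of the job.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // In-memory equivalent of the map step (tokenize, emit 1 per token)
    // followed by the reduce step (sum the 1s per key).
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, yarn=1}
        System.out.println(count(Arrays.asList("hello hadoop", "hello yarn")));
    }
}
```

Because the summation is associative and commutative, the same reducer class can also serve as the combiner, which is how the driver uses it.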

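WordCount V1.0 driver. Two settings are needed here: one that enables cross-platform submission, and the path of the jar to execute, that is, the jar produced by Maven's Package command.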

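import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount1 {

    public static void main(String[] args) {
        // Reads hdfs-site.xml and core-site.xml from the classpath
        Configuration conf = new Configuration();
        // Enable cross-platform submission; otherwise the job is submitted successfully but fails during execution
        conf.set("mapreduce.app-submission.cross-platform", "true");
        try {
            Job job = Job.getInstance(conf, "WordCount V1.0");

            job.setJarByClass(WordCount1.class);
            // Path of the jar to execute: the jar produced by Maven's Package command.
            // Backslashes must be escaped in a Java string literal.
            job.setJar("E:\\IDEA_workspace\\mapreduce-test\\target\\mapreduce-test-1.0-SNAPSHOT.jar");

            job.setMapperClass(WordCountMapper1.class);
            job.setCombinerClass(WordCountReducer1.class);
            job.setReducerClass(WordCountReducer1.class);

            // Output key/value types of the job; sufficient when the mapper and reducer output types are the same
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // HDFS paths
            FileInputFormat.addInputPath(job, new Path("/hdfsTest/input"));
            FileOutputFormat.setOutputPath(job, new Path("/hdfsTest/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}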
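Since the driver points at a jar file on disk, a missing or stale jar is an easy mistake to make. The sketch below builds the expected path from the project directory and checks that the file exists before it is handed to job.setJar. JobJarLocator and jarPath are hypothetical names, and the target/artifactId-version.jar layout is assumed from Maven's default conventions and the path used above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JobJarLocator {

    // Builds <projectDir>/target/<artifactId>-<version>.jar (Maven default layout, assumed).
    public static Path jarPath(String projectDir, String artifactId, String version) {
        return Paths.get(projectDir, "target", artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Assumes the program is started from the project root, as IDEA does by default.
        Path jar = jarPath(System.getProperty("user.dir"), "mapreduce-test", "1.0-SNAPSHOT");
        if (!Files.isRegularFile(jar)) {
            System.err.println("Jar not found, run Maven Package first: " + jar);
            return;
        }
        System.out.println("Value for job.setJar(...): " + jar.toAbsolutePath());
    }
}
```

Building the path with Paths.get also avoids hand-written backslashes in string literals.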

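Run Maven Clean and Package, then Rebuild Project, and run the main method. The job logs are printed in the console.

The YARN web UI also shows that the job ran successfully.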


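3. Pitfalls
HDFS paths and Windows paths can be misidentified in IDEA. Running Maven Clean and Package and then Rebuild Project resolves this.

IDEA runs on Windows, so Hadoop picks up the Windows user name; if it differs from the cluster user, an error is reported. Either add the environment variable in Windows, or disable permission checking in hdfs-site.xml.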

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property> 
Cross-platform submission must be enabled. Set it directly in code, or in mapred-site.xml.

<property>
<name>mapreduce.app-submission.cross-platform</name>
<value>true</value>
</property> 
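Why the setting matters can be illustrated roughly as follows. As far as I understand it, the client writes environment variable references into the container launch command: without cross-platform mode it uses the client OS syntax (%VAR% on Windows), which the Linux shell on the cluster cannot interpret, while with cross-platform mode it writes a neutral placeholder that the NodeManager expands on its own side. The sketch below only models that idea; LaunchVarSketch and envRef are hypothetical names, and the exact placeholder syntax is an assumption, not taken from the article.

```java
public class LaunchVarSketch {

    // Models how an environment variable reference might be written into the
    // launch command, depending on the client OS and the cross-platform flag.
    public static String envRef(String name, boolean crossPlatform, boolean clientIsWindows) {
        if (crossPlatform) {
            return "{{" + name + "}}";          // neutral placeholder (assumed syntax)
        }
        return clientIsWindows ? "%" + name + "%" : "$" + name;
    }

    public static void main(String[] args) {
        System.out.println(envRef("JAVA_HOME", false, true));  // Windows syntax, breaks under bash
        System.out.println(envRef("JAVA_HOME", true, true));   // placeholder expanded on the cluster
    }
}
```

This would be consistent with the "fg: no job control" message, since bash treats a leading % as a job specifier.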
References:
Developing a MapReduce program in local IDEA and submitting it to a remote Hadoop cluster for execution
Exception message: /bin/bash: line 0: fg: no job control
————————————————
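Copyright notice: this article is an original work by CSDN blogger "shpunishment", licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_36160730/article/details/101292584

Source: https://www.cnblogs.com/javalinux/p/14927051.html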