[Hadoop Offline Basics Summary] Scheduling MapReduce Jobs with Oozie


  • 1. Prepare the input data for the MR job

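    The MR program can be one you wrote yourself or one that ships with Hadoop. Here we use the wordcount example bundled with Hadoop.
    Prepare the following data and upload it to the /oozie/input path on HDFS: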

    hdfs dfs -mkdir -p /oozie/input
    vim wordcount.txt
    
    hello   world   hadoop
    spark   hive    hadoop
    

    hdfs dfs -put wordcount.txt /oozie/input    # upload the data to the corresponding HDFS directory
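
For a repeatable setup, the sample file can also be created without an interactive editor. The sketch below is a minimal, local-only illustration: it writes the two sample lines with printf (tab-separated, which is an assumption, since the post does not say which whitespace it used) and previews the counts a wordcount run should report. The helper name expected_counts is hypothetical.

```shell
# Write the two sample lines without opening vim.
# Assumption: words are separated by tabs; any whitespace works for wordcount.
printf 'hello\tworld\thadoop\nspark\thive\thadoop\n' > wordcount.txt

# Hypothetical helper: print "word<TAB>count" for each distinct word in a file,
# which mirrors the shape of the MapReduce wordcount output.
expected_counts() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
}

expected_counts wordcount.txt
```

With the sample data, hadoop should be counted twice and every other word once, which is a handy reference when checking the job output later.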

  • 2. Run the official test example

    yarn jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output
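
Two practical details about this step: the job fails if the output directory already exists, and the reducer output lands in part-r-* files. Below is a minimal sketch for inspecting the result and cleaning up before a manual rerun, assuming the same paths as above. The guard only exists so the snippet exits cleanly on a machine without the Hadoop client.

```shell
OUTPUT_DIR=/oozie/output

if command -v hdfs >/dev/null 2>&1; then
    # List the job output; a successful run normally leaves a _SUCCESS marker
    # next to the part-r-* files.
    hdfs dfs -ls "$OUTPUT_DIR"
    # Print the word counts written by the reducer(s). The glob is quoted so
    # that the HDFS shell expands it, not the local shell.
    hdfs dfs -cat "$OUTPUT_DIR/part-r-*"
    # Remove the output directory before rerunning the example by hand;
    # MapReduce refuses to write into an existing output directory.
    hdfs dfs -rm -r -f "$OUTPUT_DIR"
else
    echo "hdfs client not found on PATH; run this on a cluster node" >&2
fi
```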

  • 3. Prepare the resources to schedule

    Gather everything to be scheduled into a single folder: the jar, job.properties, and workflow.xml.
    Copy the MR job template:

    cd /export/servers/oozie-4.1.0-cdh5.14.0
    cp -ra examples/apps/map-reduce/ oozie_works/
    

    Delete the jar that ships in the template's lib directory:

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
    rm -rf oozie-examples-4.1.0-cdh5.14.0.jar
    

    Copy the jar to the corresponding directory.
    As the deletion in the previous step shows, jars to be scheduled live under /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib, so place the jar you want to schedule in that same directory:

    cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/
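
Before wiring class names into the workflow, it can help to confirm that the copied jar really contains the WordCount classes. This is a small sketch, assuming unzip is installed (jar tf would work as well). It only lists archive entries and changes nothing.

```shell
LIB_DIR=/export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
JAR="$LIB_DIR/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar"

if [ -f "$JAR" ] && command -v unzip >/dev/null 2>&1; then
    # Inner classes show up as WordCount$TokenizerMapper.class and
    # WordCount$IntSumReducer.class in the listing.
    unzip -l "$JAR" | grep 'org/apache/hadoop/examples/WordCount' \
        || echo "no WordCount classes found in $JAR" >&2
else
    echo "jar or unzip not available here; skipping the check" >&2
fi
```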

  • 4. Edit the configuration files

    Edit job.properties

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
    vim job.properties
    
    nameNode=hdfs://node01:8020
    jobTracker=node01:8032
    queueName=default
    examplesRoot=oozie_works
    
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
    outputDir=/oozie/output
    inputdir=/oozie/input
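
To see what these properties resolve to, the substitution Oozie performs can be mimicked with plain shell variables. This is only an illustration, not how Oozie evaluates its EL expressions. user_name=root is an assumption: it stands in for ${user.name}, the user who submits the job.

```shell
nameNode=hdfs://node01:8020
examplesRoot=oozie_works
outputDir=/oozie/output
user_name=root   # assumption: the job is submitted as root

# Mirrors oozie.wf.application.path from job.properties.
app_path="${nameNode}/user/${user_name}/${examplesRoot}/map-reduce/workflow.xml"
echo "$app_path"

# Mirrors ${nameNode}/${outputDir} as used in workflow.xml. Because outputDir
# already starts with "/", the result contains a doubled slash after the port.
output_uri="${nameNode}/${outputDir}"
echo "$output_uri"
```

The first value is the HDFS location the workflow has to be uploaded to in step 5. The doubled slash in the second value is usually harmless, since Hadoop normalizes repeated slashes in paths, but it is worth knowing about when reading logs.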
    

    Edit workflow.xml

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
    vim workflow.xml
    
    <?xml version="1.0" encoding="UTF-8"?>
    <!--
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at
      
           http://www.apache.org/licenses/LICENSE-2.0
      
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/${outputDir}"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <!-- Comment out these original settings -->
    				<!--  
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>org.apache.oozie.example.SampleMapper</value>
                    </property>
                    <property>
                        <name>mapred.reducer.class</name>
                        <value>org.apache.oozie.example.SampleReducer</value>
                    </property>
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                    </property>
    				-->
    				
                    <!-- Enable the new MapReduce API for configuration -->
                    <property>
                        <name>mapred.mapper.new-api</name>
                        <value>true</value>
                    </property>
    
                    <property>
                        <name>mapred.reducer.new-api</name>
                        <value>true</value>
                    </property>
    
                    <!-- Output key class of the MR job -->
                    <property>
                        <name>mapreduce.job.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>
    
                    <!-- Output value class of the MR job -->
                    <property>
                        <name>mapreduce.job.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>
    
                    <!-- Input path -->
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${nameNode}/${inputdir}</value>
                    </property>
    
                    <!-- Output path -->
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${nameNode}/${outputDir}</value>
                    </property>
    
                    <!-- Mapper class to run -->
                    <property>
                        <name>mapreduce.job.map.class</name>
                        <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                    </property>
    
                    <!-- Reducer class to run -->
                    <property>
                        <name>mapreduce.job.reduce.class</name>
                        <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                    </property>
                    <!-- Number of map tasks -->
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
    
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
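
A malformed workflow.xml typically only surfaces when the job is submitted, so a quick local check can save a round trip. The sketch below uses xmllint for a well-formedness check, assuming it is installed. The function name check_xml is hypothetical. The Oozie client also offers a validate subcommand that checks the file against the workflow schema.

```shell
# Hypothetical helper: succeed if the file is well-formed XML,
# or if xmllint is not installed (the check is then skipped).
check_xml() {
    if command -v xmllint >/dev/null 2>&1; then
        xmllint --noout "$1"
    else
        echo "xmllint not found; skipping well-formedness check for $1" >&2
    fi
}

WORKFLOW=/export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/workflow.xml
if [ -f "$WORKFLOW" ]; then
    check_xml "$WORKFLOW"
    # Schema validation through the Oozie client, run from the Oozie home directory:
    #   bin/oozie validate oozie_works/map-reduce/workflow.xml
fi
```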
    
  • 5. Upload the workflow to the corresponding HDFS directory

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works
    hdfs dfs -put map-reduce/ /user/root/oozie_works/
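
Here is a short sketch for checking the upload and for re-uploading after edits, assuming the job is submitted as root so that the target matches oozie.wf.application.path. Note that hdfs dfs -put does not overwrite existing files unless -f is given.

```shell
APP_DIR=/user/root/oozie_works/map-reduce

if command -v hdfs >/dev/null 2>&1; then
    # Expect job.properties, workflow.xml and lib/<examples jar> in the listing.
    hdfs dfs -ls -R "$APP_DIR"
    # After editing workflow.xml locally, overwrite only that file on HDFS.
    # Run this from the local oozie_works directory.
    hdfs dfs -put -f map-reduce/workflow.xml "$APP_DIR/"
else
    echo "hdfs client not found on PATH; run this on a cluster node" >&2
fi
```

The copy of job.properties on HDFS is not what the submit command reads; the -config option in the next step points at the local file.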
    
  • 6. Run the scheduled job

    Submit the workflow, then check the job result in the Oozie web console on port 11000:

    cd /export/servers/oozie-4.1.0-cdh5.14.0
    bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/map-reduce/job.properties -run
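
On success the submit command prints a line of the form job: <job-id>. The sketch below shows one way to capture that ID and query the job from the command line instead of the web console. The helper name extract_job_id and the sample ID are made up for illustration; -info and -log are regular options of the oozie job subcommand, and OOZIE_URL is the environment variable the client reads when -oozie is omitted.

```shell
# Hypothetical helper: pull the ID out of the "job: <id>" line printed by -run.
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Demonstration with a made-up ID in the usual Oozie format.
echo 'job: 0000000-200424101500000-oozie-root-W' | extract_job_id

# On the cluster, the same helper can be combined with the real commands:
#   export OOZIE_URL=http://node03:11000/oozie
#   JOB_ID=$(bin/oozie job -config oozie_works/map-reduce/job.properties -run | extract_job_id)
#   bin/oozie job -info "$JOB_ID"   # workflow and action status
#   bin/oozie job -log  "$JOB_ID"   # log output for the job
```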
    
Original article: https://www.cnblogs.com/zzzsw0412/p/12772457.html