Installing Oozie 4.0.0

Once Hadoop is installed, you can install Oozie and run your own workflows:

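1. Download the Oozie tarball, oozie-4.0.0-cdh5.0.0.tar.gz, from http://archive.cloudera.com/cdh5/cdh/5/

2. Download ext-2.2.zip: http://extjs.com/deploy/ext-2.2.zip

3. Download Tomcat and extract it.

4. Download Maven. (The Oozie tarball downloaded here is already compiled; an uncompiled source release must first be built with Maven before it can be installed.)

5. Extract Oozie into the installation directory and set the environment variables as follows: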

export MAVEN_HOME=/export/servers/apache-maven-3.0.5

export TOMCAT_HOME=/export/servers/apache-tomcat-6.0.26

export OOZIE_HOME=/export/servers/oozie-4.0.0-cdh5.0.0
export OOZIE_CONFIG=/export/servers/oozie-4.0.0-cdh5.0.0/conf
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$OOZIE_HOME/bin:$TOMCAT_HOME/bin:$PATH

Apply the environment variables: source /etc/profile
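Before going further it can help to confirm that each exported variable points at a real directory. The following is a minimal sketch, not part of Oozie itself; the function names check_dir_var and check_oozie_env are made up for this example, and it assumes JAVA_HOME and HADOOP_HOME were already exported when Hadoop was installed.

```shell
#!/bin/sh
# Report whether an environment variable is set and points to an existing directory.
# Returns 0 when the directory exists, 1 otherwise.
check_dir_var() {
    var_name=$1
    # Indirect lookup of the variable named by $var_name (POSIX-compatible).
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
        echo "$var_name is not set"
        return 1
    fi
    if [ ! -d "$var_value" ]; then
        echo "$var_name points to a missing directory: $var_value"
        return 1
    fi
    echo "$var_name=$var_value"
    return 0
}

# Check every variable used in the PATH line above; non-zero if any check fails.
check_oozie_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME MAVEN_HOME TOMCAT_HOME OOZIE_HOME OOZIE_CONFIG; do
        check_dir_var "$name" || status=1
    done
    return $status
}
```

Calling check_oozie_env after source /etc/profile prints one line per variable and returns non-zero if any of them is missing.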

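6. Edit the Oozie configuration files in the conf directory:

The action-conf directory contains a single file, hive.xml. Modify it as follows (note that in this listing both properties still sit inside the XML comment; they take effect only once moved outside it):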

<configuration>
<!-- An example of setting default properties for Hive action.
This could be useful with Hadoop versions that have deprecated
HADOOP_HOME that Hive still relies on.

<property>
<name>hadoop.bin.path</name>
<value>/export/servers/hadoop-2.2.0/bin/hadoop</value>
</property>

<property>
<name>hadoop.config.dir</name>
<value>/export/servers/hadoop-2.2.0/etc/hadoop</value>
</property>
-->
</configuration>

hadoop-conf/core-site.xml:

<configuration>

<property>
<name>mapreduce.jobtracker.kerberos.principal</name>
<value>mapred/_HOST@LOCALREALM</value>
</property>

<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/_HOST@LOCALREALM</value>
</property>

<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@LOCALREALM</value>
</property>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>

hadoop-config.xml has the same content as hadoop-conf/core-site.xml and needs no changes.

oozie-default.xml: there are two main changes to this file:

1.

<property>
<name>oozie.services</name>
<value>
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.StoreService,
org.apache.oozie.service.CoordinatorStoreService,
org.apache.oozie.service.SLAStoreService,
org.apache.oozie.service.DBLiteWorkflowStoreService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.XLogStreamingService,
org.apache.oozie.service.JobsConcurrencyService
</value>
<description>
All services to be created and managed by Oozie Services singleton.
Class names must be separated by commas.
</description>
</property>

Move org.apache.oozie.service.JobsConcurrencyService to the first line of this property's value, as follows:

<property>
<name>oozie.services</name>
<value>
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.SchedulerService,

...

2. Remove the following property. (Removing it is optional; decide based on your own use case.)

<property>
<name>oozie.service.coord.check.maximum.frequency</name>
<value>true</value>
<description>
When true, Oozie will reject any coordinators with a frequency faster than 5 minutes. It is not recommended to disable
this check or submit coordinators with frequencies faster than 5 minutes: doing so can cause unintended behavior and
additional system stress.
</description>
</property>

oozie-site.xml: the main changes are as follows:

1.

<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.HiveActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor,
org.apache.oozie.action.hadoop.DistcpActionExecutor
</value>
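</property>

Change this section as follows. The executor list stays the same, and two properties (oozie.subworkflow.classpath.inheritance and oozie.servlet.CallbackServlet.max.data.len) are added before it: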

<property>
<name>oozie.subworkflow.classpath.inheritance</name>
<value>true</value>
</property>
<property>
<name>oozie.servlet.CallbackServlet.max.data.len</name>
<value>1048576</value>
</property>

<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.HiveActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor,
org.apache.oozie.action.hadoop.DistcpActionExecutor
</value>
</property>

2.

<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
<description>
JDBC driver class.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://192.168.157.92:3358/oozie4</value>
<description>
JDBC URL.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
<description>
DB user name.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>123456</value>
<description>
DB user password.

IMPORTANT: if password is empty leave a 1 space string, the service trims the value,
if empty Configuration assumes it is NULL.
</description>
</property>

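These properties configure the database connection. Oozie ships with a default Derby database that stores its job information; to use your own MySQL database instead, configure it as in the example above.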
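Because the host, port and database name are all packed into the JDBC URL, it is easy for them to drift from what is later created in MySQL. As a rough sketch, the helper below splits a URL of the form jdbc:mysql://HOST:PORT/DB into its parts so they can be checked; the function name parse_mysql_jdbc_url and the JDBC_* variable names are invented for this example.

```shell
#!/bin/sh
# Split a MySQL JDBC URL of the form jdbc:mysql://HOST:PORT/DB into its parts.
# Sets JDBC_HOST, JDBC_PORT and JDBC_DB; returns 1 if the URL has another shape.
parse_mysql_jdbc_url() {
    url=$1
    case $url in
        jdbc:mysql://*:*/*) ;;
        *) echo "unexpected JDBC URL: $url" >&2; return 1 ;;
    esac
    rest=${url#jdbc:mysql://}     # HOST:PORT/DB[?params]
    hostport=${rest%%/*}          # HOST:PORT
    JDBC_HOST=${hostport%%:*}
    JDBC_PORT=${hostport##*:}
    db=${rest#*/}                 # DB[?params]
    JDBC_DB=${db%%\?*}            # drop optional ?key=value parameters
}

# Usage (needs the mysql client and a reachable server, so it is left commented out):
#   parse_mysql_jdbc_url 'jdbc:mysql://192.168.157.92:3358/oozie4'
#   mysql -h "$JDBC_HOST" -P "$JDBC_PORT" -u root -p -e "SHOW DATABASES LIKE '$JDBC_DB'"
```

With the URL from the configuration above, the helper yields host 192.168.157.92, port 3358 and database oozie4.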

3.

<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/export/servers/hadoop-2.2.0/etc/hadoop</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative is looked within
the Oozie configuration directory; though the path can be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem.
</description>
</property>

This property sets the Hadoop configuration directory.

4.

<!-- Proxyuser Configuration -->

<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
<value>*</value>
<description>
List of hosts the '#USER#' user is allowed to perform 'doAs'
operations.

The '#USER#' must be replaced with the username of the user who is
allowed to perform 'doAs' operations.

The value can be the '*' wildcard or a list of hostnames.

For multiple users copy this property and replace the user name
in the property name.
</description>
</property>

<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
<value>*</value>
<description>
List of groups the '#USER#' user is allowed to impersonate users
from to perform 'doAs' operations.

The '#USER#' must be replaced with the username of the user who is
allowed to perform 'doAs' operations.

The value can be the '*' wildcard or a list of groups.

For multiple users copy this property and replace the user name
in the property name.
</description>
</property>

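Uncomment these two properties, replacing #USER# with the actual user name as their descriptions require.

Because MySQL is used, copy the MySQL driver jar mysql-connector-java-5.1.20.jar into Oozie's lib and libtools directories.

That completes the changes to the Oozie configuration files. Next, do some preparation before starting Oozie: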
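The copy has to land in two places, which is easy to get half right. A small sketch of doing it in one call follows; the function name install_jdbc_driver is made up for this example, and it assumes both directories live directly under the Oozie home directory.

```shell
#!/bin/sh
# Copy a JDBC driver jar into both lib/ and libtools/ under an Oozie home directory.
# Returns 1 if the jar is missing or a copy fails.
install_jdbc_driver() {
    jar_path=$1
    oozie_home=$2
    if [ ! -f "$jar_path" ]; then
        echo "driver jar not found: $jar_path" >&2
        return 1
    fi
    for dir in lib libtools; do
        mkdir -p "$oozie_home/$dir" || return 1
        cp "$jar_path" "$oozie_home/$dir/" || return 1
    done
}

# Usage:
#   install_jdbc_driver mysql-connector-java-5.1.20.jar "$OOZIE_HOME"
```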

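1. Log in to MySQL and create the database specified in oozie-site.xml. The database name must match the one in the JDBC URL: the commands below use oozie, while the JDBC URL shown earlier points at oozie4, so adjust one of the two.

    create database oozie;    (creates a database named oozie)
    grant all privileges on oozie.* to 'root'@'localhost' identified by '123456';    (grants the root user, with password 123456, full access to the oozie database from localhost)
    grant all privileges on oozie.* to 'root'@'%' identified by '123456';    (grants the same access from any host)
    FLUSH PRIVILEGES;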

2. In the $OOZIE_HOME/bin directory, run the following command to generate the script that creates the database tables:

 sh ooziedb.sh create -sqlfile oozie.sql

3. Run the database script to create the tables:

 sh oozie-setup.sh db create -run  -sqlfile oozie.sql

The database setup is now complete.

4. Build the oozie.war package:

Run the following command, again from the bin directory, to build oozie.war:

sh addtowar.sh -inputwar $OOZIE_HOME/oozie.war -outputwar $OOZIE_HOME/oozie-server/webapps/oozie.war -hadoop 2.2.0 $HADOOP_HOME -extjs ext-2.2.zip

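5. The generated WAR may not include mysql-connector-java-5.1.20.jar. Add that jar to the WAR as well (under WEB-INF/lib); otherwise Oozie will report an error at startup.

6. In the $OOZIE_HOME/bin directory, run the following to upload the Oozie share lib to HDFS. The share lib holds the jars that workflow actions such as Hive and Sqoop need at run time. Adjust the tarball name to the file actually present in your $OOZIE_HOME: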

sh oozie-setup.sh sharelib create -fs hdfs://hadoop-master:8020 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.2.0-yarn.tar.gz

(For Hadoop 2 with multiple HDFS clusters, where hdfs://cluster1 is the fs.defaultFS name from core-site.xml:)

sh oozie-setup.sh sharelib create -fs hdfs://cluster1 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.0.0-beta-2-yarn.tar.gz
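The two commands above name different sharelib tarballs, so the exact file name clearly varies between builds. One way to avoid typing it by hand is to look the file up; the sketch below assumes the tarball sits directly in the Oozie home directory and follows the oozie-sharelib-*-yarn.tar.gz naming seen above. The function name find_sharelib_tarball is invented for this example.

```shell
#!/bin/sh
# Print the path of the first oozie-sharelib-*-yarn.tar.gz found in an Oozie home
# directory. Returns 1 (with a message on stderr) when no such file exists.
find_sharelib_tarball() {
    oozie_home=$1
    for f in "$oozie_home"/oozie-sharelib-*-yarn.tar.gz; do
        # If the glob matched nothing, $f is the literal pattern and -f is false.
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no oozie-sharelib-*-yarn.tar.gz found in $oozie_home" >&2
    return 1
}

# Usage (needs a running HDFS, so it is left commented out):
#   tarball=$(find_sharelib_tarball "$OOZIE_HOME") &&
#       sh oozie-setup.sh sharelib create -fs hdfs://hadoop-master:8020 -locallib "$tarball"
```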

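7. Start Oozie:

Run Oozie in the foreground:

sh oozied.sh run

Run Oozie in the background:

sh oozied.sh start

After startup, open http://hadoop-master:11000/oozie to check the web console.

8. Prepare the Oozie workflow:

An Oozie workflow is run from an application directory (oozieTest in this example) that holds the workflow definition and its job.properties file.

9. Upload that directory to HDFS, for example to /user/root/oozie/workflow/oozieTest.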
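The source does not list the contents of the application directory. As an illustration only, the sketch below creates a commonly used layout (job.properties, workflow.xml and a lib/ directory). The function name create_workflow_skeleton, the one-action workflow and the use of oozie.wf.application.path are assumptions for this example; the author's real application may well be a coordinator with different files.

```shell
#!/bin/sh
# Create a local skeleton for a workflow application directory.
# The layout (job.properties, workflow.xml, lib/) is an assumed, typical one.
create_workflow_skeleton() {
    app_dir=$1
    mkdir -p "$app_dir/lib" || return 1

    # Read locally by the oozie CLI via -config; nameNode is supplied with -D.
    cat > "$app_dir/job.properties" <<'EOF'
oozie.wf.application.path=${nameNode}/user/root/oozie/workflow/oozieTest
EOF

    # Hypothetical one-action workflow that only creates an HDFS directory.
    cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app name="oozieTest" xmlns="uri:oozie:workflow:0.4">
    <start to="make-dir"/>
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/user/root/oozie/workflow/oozieTest/output"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
}

# Usage (the upload needs a running HDFS, so it is left commented out):
#   create_workflow_skeleton oozieTest
#   hadoop fs -mkdir -p /user/root/oozie/workflow
#   hadoop fs -put oozieTest /user/root/oozie/workflow/
```

The heredocs use a quoted delimiter so that ${nameNode} is written to the files literally and resolved later by Oozie rather than by the shell.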

Run the oozie command as shown below. You can put the command in a shell script so that next time you only need to run the script with sh:

run_oozie.sh:

oozie job -oozie http://hadoop-master:11000/oozie -config $1 -D nameNode=hdfs://hadoop-master:8020 -D jobTracker=hadoop-master:8032 -D queueName=root -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run

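(Note: with Hadoop 2 and a single HDFS cluster, the command is much the same as above. With multiple HDFS clusters it differs: hdfs://cluster1 is the fs.defaultFS name from core-site.xml and carries no port number, and the jobTracker port is 8032. Use the following format:)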

oozie job -oozie http://hadoop-kf105.jd.com:11000/oozie -config $1 -D nameNode=hdfs://cluster1 -D jobTracker=hadoop-kf100.jd.com:8032 -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run

To run a workflow: sh run_oozie.sh oozieTest/job.properties

kill_oozie.sh:

oozie job -oozie http://hadoop-master:11000/oozie -kill $1 

To kill a workflow: sh kill_oozie.sh jobId
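The job ID passed to kill_oozie.sh comes from the submission step. Assuming the oozie CLI prints the new ID on a line of the form "job: <id>" when a job is submitted with -run, a small filter can capture it; the function name extract_job_id is made up for this example.

```shell
#!/bin/sh
# Read "oozie job ... -run" output on stdin and print the job ID,
# assuming the output contains a line of the form "job: <id>".
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Usage (needs a running Oozie server, so it is left commented out):
#   job_id=$(sh run_oozie.sh oozieTest/job.properties | extract_job_id)
#   sh kill_oozie.sh "$job_id"
```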

That's all on Oozie for now; I'll add more as new material comes up.

Original post: https://www.cnblogs.com/zhli/p/4823354.html