ApplicationMaster Startup Process and Services Overview

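Summary: YARN splits the functionality of the JobTracker into the ResourceManager and the ApplicationMaster. This article briefly introduces how the ApplicationMaster starts up and which services it provides.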

ApplicationMaster startup process

The ApplicationMaster is launched from the main method of org.apache.hadoop.mapreduce.v2.app.MRAppMaster. The flow is as follows:

1. First, parse the startup parameters, such as containerID and NM_HOST.

2. Instantiate the MRAppMaster object from the parsed parameters and set up a shutdown hook through ShutdownHookManager.

3. Initialize the AppMaster: obtain the UGI from the conf (the NM's local conf), then initialize and start the appMaster thread.

4. Initializing appMaster involves:

a) Performing security authentication.

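b) Setting the retry count of the current app to 1, configurable through yarn.resourcemanager.am.max-retries. This is only a hack here (it is unclear whether this feature is actually implemented).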

c) Reading the configuration and initializing the job name, jobId, whether the new API is used, the committer, numReduceTasks, recoveryEnabled, recoverySupportedByCommitter, and the list of services: recoveryServ, dispatcher, taskAttemptListener, taskCleaner, clientService, historyService, jobEventDispatcher, TaskEventDispatcher, TaskAttemptEventDispatcher, speculatorEventDispatcher, containerAllocator, containerLauncher, and the service created by createStagingDirCleaningService.

5. Start appMaster.
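The startup steps above can be sketched in Python. In Hadoop, MRAppMaster is built on CompositeService. A CompositeService initializes and starts its child services in the order they were added, and stops them in reverse order. This sketch models only that ordering. The Service and CompositeService classes, the parse_am_params and launch helpers, and the CONTAINER_ID/NM_HOST key names are hypothetical stand-ins. Security and UGI handling (steps 3 and 4a) are omitted.

```python
import atexit


class Service:
    """Hypothetical minimal service with an init/start/stop lifecycle."""

    def __init__(self, name, log):
        self.name = name
        self.log = log  # shared list that records lifecycle calls in order

    def init(self, conf):
        self.log.append(f"init {self.name}")

    def start(self):
        self.log.append(f"start {self.name}")

    def stop(self):
        self.log.append(f"stop {self.name}")


class CompositeService(Service):
    """Inits and starts children in registration order; stops them in reverse order."""

    def __init__(self, name, log):
        super().__init__(name, log)
        self.children = []

    def add_service(self, service):
        self.children.append(service)

    def init(self, conf):
        for child in self.children:
            child.init(conf)

    def start(self):
        for child in self.children:
            child.start()

    def stop(self):
        for child in reversed(self.children):
            child.stop()


def parse_am_params(env):
    # Step 1: the key names CONTAINER_ID and NM_HOST are assumptions for this sketch.
    return {"container_id": env["CONTAINER_ID"], "nm_host": env["NM_HOST"]}


def launch(env, conf, log, register_hook=atexit.register):
    params = parse_am_params(env)              # step 1: parse parameters
    am = CompositeService("MRAppMaster", log)  # step 2: instantiate
    register_hook(am.stop)                     # step 2: stop all services on process exit
    for name in ("dispatcher", "taskAttemptListener", "clientService"):
        am.add_service(Service(name, log))     # step 4c: register (part of) the service list
    am.init(conf)                              # step 4: init every child service
    am.start()                                 # step 5: start every child service
    return params, am
```

The reverse stop order means a service is shut down before the services it was built on, such as the dispatcher registered first.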

Service list

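1. taskAttemptListener implements the TaskUmbilicalProtocol protocol. It starts an RPC server with 30 handler threads by default and receives heartbeats from containers. Its statusUpdate method processes task status information, creates a TaskAttemptStatusUpdateEvent, and puts the event on the queue.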

2. taskCleaner starts a thread pool of 5 threads to process cleanup tasks. Its main work is calling Committer().abortTask.

3. clientService uses the MRClientProtocol protocol to serve JobClient requests. It starts an RPC server with one handler thread by default and also starts the web app service (MWebApp).

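4. historyService handles jobhistory events. It writes them to the staging directory on DFS and finally moves them to the done directory. It first takes a JobHistoryEvent from the eventQueue. It then handles the event according to its type (for example AM_STARTED or JOB_SUBMITTED) by writing the corresponding jobhistory record to the output stream.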

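5. speculator initializes the class configured by yarn.mapreduce.job.speculator.class (DefaultSpeculator by default). It starts a speculationBackgroundCore thread that periodically works out which map and reduce tasks should be speculatively executed. The actual speculation algorithm lives in the maybeScheduleASpeculation and speculationValue() methods. The speculative-execution algorithm has changed and now works as follows: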

a) In maybeScheduleASpeculation, bestSpeculationValue starts at -1, and speculationValue() is called for each task.

b) speculationValue returns ON_SCHEDULE if thresholdRuntime(taskID) indicates that the task should not be considered for speculation.

c) speculationValue returns ALREADY_SPECULATING if the task is already being speculated. This check takes priority.

d) speculationValue returns TOO_NEW if the companion task has not yet reported any information.

e) speculationValue returns PROGRESS_IS_GOOD if the task is sailing through, that is, progressing well.

f) speculationValue returns NOT_RUNNING if the task is not running.

g) All of the values above are negative. Only a value of 0 or greater makes a task a candidate for speculation.

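h) Otherwise, speculationValue returns the estimated end time of the running attempt minus the estimated end time of a new (speculative) attempt. The task can be speculated only when this value is greater than or equal to 0.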

i) On each pass, maybeScheduleASpeculation calls speculationValue on every task. It picks the task with the best value (bestSpeculationValue, i.e. the task most worth speculating) and adds it to mayHaveSpeculated.

6. containerAllocator starts a ContainerAllocator thread (LocalContainerAllocator or RMContainerAllocator). When this thread is initialized, it reads parameters such as reduceSlowStart, maxReduceRampupLimit, maxReducePreemptionLimit, and retryInterval. Note that reduceSlowStart is now configured per job through mapreduce.job.reduce.slowstart.completedmaps; this used to be implemented in the scheduler.

7. containerLauncher starts a ContainerLauncher thread (LocalContainerLauncher or ContainerLauncherImpl).

8. The event handler classes include jobEventDispatcher, JobHistoryEventHandler, TaskEventDispatcher, TaskAttemptEventDispatcher, and speculatorEventDispatcher.
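Item 4 in the service list describes historyService as a queue consumer. It writes events to a staging location and then moves the file to a done directory. The sketch below shows that flow on the local filesystem instead of DFS. The function run_history_writer, the WRITABLE_TYPES set, the .jhist file name, and the JSON-lines format are assumptions made for illustration. None of them is the real job history format.

```python
import json
import os
import queue
import shutil
import tempfile

# Hypothetical subset of event types that this sketch writes; others are skipped.
WRITABLE_TYPES = {"AM_STARTED", "JOB_SUBMITTED", "JOB_FINISHED"}


def run_history_writer(events: queue.Queue, root=None, job_id="job_0"):
    """Drain (event_type, fields) tuples until a None sentinel, write them to a
    staging file, then move the finished file into the done directory."""
    root = root or tempfile.mkdtemp(prefix="jhist-")  # throwaway root if none given
    staging_dir = os.path.join(root, "staging")
    done_dir = os.path.join(root, "done")
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(done_dir, exist_ok=True)

    staging_path = os.path.join(staging_dir, f"{job_id}.jhist")
    with open(staging_path, "w", encoding="utf-8") as out:
        # iter(get, None) keeps taking events until the None sentinel arrives.
        for event_type, fields in iter(events.get, None):
            if event_type in WRITABLE_TYPES:
                # One JSON record per event, chosen by event type.
                out.write(json.dumps({"type": event_type, **fields}) + "\n")

    done_path = os.path.join(done_dir, os.path.basename(staging_path))
    shutil.move(staging_path, done_path)  # publish only the complete file
    return done_path
```

Writing to staging first and moving at the end means readers of the done directory never see a half-written history file.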
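The speculation rules in item 5 of the service list (steps a to i) can be sketched as follows. TaskInfo, its fields, and the concrete sentinel values are hypothetical. The real DefaultSpeculator gets its estimates from a runtime estimator and uses its own negative constants. The sketch keeps the property the text relies on: every sentinel is negative and bestSpeculationValue starts at -1, so only a value of 0 or more can be selected.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel values; for this sketch they only need to be negative.
ON_SCHEDULE = -1
ALREADY_SPECULATING = -2
TOO_NEW = -3
PROGRESS_IS_GOOD = -4
NOT_RUNNING = -5


@dataclass
class TaskInfo:
    task_id: str
    over_threshold: bool              # stand-in for the thresholdRuntime(taskID) check
    speculating: bool                 # a speculative attempt already exists
    running: bool
    start_time: int                   # all times are integer milliseconds
    estimated_runtime: Optional[int]  # None = no progress information yet
    new_attempt_runtime: int          # estimated runtime of a fresh attempt


def speculation_value(task: TaskInfo, now: int) -> int:
    if not task.over_threshold:
        return ON_SCHEDULE
    if task.speculating:
        return ALREADY_SPECULATING
    if not task.running:
        return NOT_RUNNING
    if task.estimated_runtime is None:
        return TOO_NEW
    estimated_end = task.start_time + task.estimated_runtime
    if estimated_end < now:
        # Sketch assumption: an estimate already in the past counts as "sailing through".
        return PROGRESS_IS_GOOD
    replacement_end = now + task.new_attempt_runtime
    # Positive when a new attempt is expected to finish before the current one.
    return estimated_end - replacement_end


def maybe_schedule_a_speculation(tasks, now, may_have_speculated):
    best_value, best_task = -1, None
    for task in tasks:
        value = speculation_value(task, now)
        # For integers, "> -1" means ">= 0", so no sentinel can be selected.
        if value > best_value:
            best_value, best_task = value, task.task_id
    if best_task is not None:
        may_have_speculated.add(best_task)
    return best_task
```

Only one task (the best) is chosen per pass, which limits how many speculative attempts are launched at once.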
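Item 6 of the service list mentions the per-job reduce slow start. Below is a minimal sketch of that gate, assuming the documented default of 0.05 for mapreduce.job.reduce.slowstart.completedmaps. The function name is hypothetical. The real RMContainerAllocator also applies ramp-up and preemption limits (maxReduceRampupLimit, maxReducePreemptionLimit), which are not modeled here.

```python
def reduces_may_start(completed_maps: int, total_maps: int, slowstart: float = 0.05) -> bool:
    """Hypothetical gate modeled on mapreduce.job.reduce.slowstart.completedmaps:
    reduce containers are requested only after this fraction of maps has completed."""
    if total_maps == 0:
        return True  # sketch assumption: a job without maps can schedule reduces at once
    return completed_maps / total_maps >= slowstart
```

With 100 maps and the default 0.05, reduces become eligible once 5 maps have finished. Raising the value keeps reducers from holding containers while they wait for map output.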
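Items 1 and 8 of the service list describe an event-driven design. The RPC handler in taskAttemptListener only creates a TaskAttemptStatusUpdateEvent and queues it, and the dispatcher classes route events to their handlers. Hadoop's AsyncDispatcher roughly follows this pattern: one queue drained by one thread, with a handler registered per event-type enum. The Python sketch below illustrates the pattern only. MiniAsyncDispatcher and the two enum classes are hypothetical.

```python
import queue
import threading
from enum import Enum


class JobEventType(Enum):            # hypothetical event-type enums
    JOB_INIT = 1


class TaskAttemptEventType(Enum):
    TA_UPDATE = 1


class MiniAsyncDispatcher:
    """One FIFO queue drained by one thread; handlers keyed by event-type enum class."""

    def __init__(self):
        self._queue = queue.Queue()
        self._handlers = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, event_type_class, handler):
        self._handlers[event_type_class] = handler

    def handle(self, event_type, payload):
        # Producers (e.g. an RPC handler thread) only enqueue and return immediately.
        self._queue.put((event_type, payload))

    def start(self):
        self._thread.start()

    def stop(self):
        self._queue.put(None)  # sentinel: events queued before it are still processed
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            event_type, payload = item
            # Route by the enum class the event type belongs to.
            self._handlers[type(event_type)](event_type, payload)
```

Because producers only enqueue, the RPC threads are not held up by event processing, and a single consumer thread handles events in arrival order.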