Spark Master source code: schedule

The full source of the schedule method:

 /**
   * Schedule the currently available resources among waiting apps. This method will be called
   * every time a new app joins or resource availability changes.
   */
  private def schedule(): Unit = {
    // First check that the master is ALIVE; if not, do nothing
    if (state != RecoveryState.ALIVE) { return }
    // Drivers take strict precedence over executors
    // Random.shuffle randomly reorders the elements of the given collection
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
    val numWorkersAlive = shuffledAliveWorkers.size
    var curPos = 0
    // Schedule drivers first. A driver is registered with this master only when an
    // application is submitted in standalone cluster deploy mode; in client mode the
    // driver starts locally and never registers with the master (and YARN modes bypass
    // the standalone master entirely). On registration the driver is queued in waitingDrivers.
    for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
      // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
      // start from the last worker that was assigned a driver, and continue onwards until we have
      // explored all alive workers.
      var launched = false
      var isClusterIdle = true
      var numWorkersVisited = 0
      // Keep looking while the driver has not been launched and not all alive workers were visited
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = shuffledAliveWorkers(curPos)
        isClusterIdle = worker.drivers.isEmpty && worker.executors.isEmpty
        numWorkersVisited += 1
        // Check that the worker has the resources (memory, CPU, etc.) the driver needs
        if (canLaunchDriver(worker, driver.desc)) {
          val allocated = worker.acquireResources(driver.desc.resourceReqs)
          driver.withResources(allocated)
          // Launch the driver on this worker
          launchDriver(worker, driver)
          // Remove the launched driver from the waiting queue
          waitingDrivers -= driver
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
      if (!launched && isClusterIdle) {
        logWarning(s"Driver ${driver.id} requires more resource than any of Workers could have.")
      }
    }
    startExecutorsOnWorkers()
  }
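
The round-robin placement is easier to see in isolation. Below is a minimal, self-contained sketch of the same idea, using made-up Worker and Driver case classes rather than Spark's real WorkerInfo/DriverInfo: shuffle the workers once, then walk them circularly and launch each waiting driver on the first worker with enough resources.

    import scala.util.Random

    // Hypothetical stand-ins for Spark's WorkerInfo / DriverInfo, for illustration only
    case class Worker(id: String, var coresFree: Int, var memoryFree: Int)
    case class Driver(id: String, cores: Int, memory: Int)

    def scheduleDrivers(workers: Seq[Worker], waiting: Seq[Driver]): Unit = {
      val shuffled = Random.shuffle(workers) // randomize the start order, as schedule() does
      val n = shuffled.size
      if (n == 0) return
      var curPos = 0
      for (driver <- waiting) {
        var launched = false
        var visited = 0
        while (visited < n && !launched) {
          val w = shuffled(curPos)
          visited += 1
          if (w.coresFree >= driver.cores && w.memoryFree >= driver.memory) {
            w.coresFree -= driver.cores   // mimic acquireResources + launchDriver
            w.memoryFree -= driver.memory
            println(s"launch ${driver.id} on ${w.id}")
            launched = true
          }
          curPos = (curPos + 1) % n       // continue round-robin from the next worker
        }
      }
    }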

The source of the launchDriver method:

  private def launchDriver(worker: WorkerInfo, driver: DriverInfo): Unit = {
    logInfo("Launching driver " + driver.id + " on worker " + worker.id)
    // addDriver puts the driver into the worker's in-memory bookkeeping and also adds
    // the driver's memory and cores to the worker's used totals
    worker.addDriver(driver)
    // Record on the DriverInfo which worker it runs on
    driver.worker = Some(worker)
    // Send a LaunchDriver message to the worker's RPC endpoint so that the worker
    // actually starts the driver process
    worker.endpoint.send(LaunchDriver(driver.id, driver.desc, driver.resources))
    // Mark the driver as RUNNING
    driver.state = DriverState.RUNNING
  }
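
For context on the addDriver comment above: in Spark's WorkerInfo, adding a driver both caches it and bumps the worker's used resources. The sketch below paraphrases that behavior with hypothetical WorkerInfoSketch/DriverInfoSketch classes; field shapes may differ slightly across Spark versions.

    import scala.collection.mutable

    case class DriverInfoSketch(id: String, cores: Int, mem: Int)

    class WorkerInfoSketch(val cores: Int, val memory: Int) {
      val drivers = mutable.HashMap[String, DriverInfoSketch]()
      var coresUsed = 0
      var memoryUsed = 0

      // Roughly what WorkerInfo.addDriver does (simplified)
      def addDriver(driver: DriverInfoSketch): Unit = {
        drivers(driver.id) = driver   // cache the driver on this worker
        memoryUsed += driver.mem      // account for the driver's memory
        coresUsed += driver.cores     // account for the driver's cores
      }
    }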

The source for starting executors on workers:

  /**
   * Schedule and launch executors on workers
   */
  private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    // Iterate over the ApplicationInfos in waitingApps, scheduling only apps that still need cores
    for (app <- waitingApps) {
      val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
      // If the cores left is less than coresPerExecutor, the cores left will not be allocated
      if (app.coresLeft >= coresPerExecutor) {
        // Filter out workers that don't have enough resources to launch an executor:
        // keep ALIVE workers that this application can use, then sort them by free
        // cores in descending order
        val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
          .filter(canLaunchExecutor(_, app.desc))
          .sortBy(_.coresFree).reverse
        if (waitingApps.length == 1 && usableWorkers.isEmpty) {
          logWarning(s"App ${app.id} requires more resource than any of Workers could have.")
        }
        // assignedCores(pos) is the number of cores to allocate on usableWorkers(pos)
        val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
        // Now that we've decided how many cores to allocate on each worker, let's allocate
        // them: spread the executors to be launched across the usable workers
        for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
          allocateWorkerResourceToExecutors(
            app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
        }
      }
    }
  }
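
scheduleExecutorsOnWorkers itself is not shown above, but its core idea is simple: with spreadOutApps = true (the default) it hands out cores one executor's worth at a time, round-robin across the usable workers; with false it fills up each worker before moving to the next. The following is a simplified sketch of that core-assignment logic only, not the real method, which also checks memory, executor limits, and app.coresLeft at every step.

    // Simplified sketch of the spread-out vs. fill-up core assignment
    def assignCores(coresFree: Array[Int], coresToAssign: Int,
                    coresPerExecutor: Int, spreadOut: Boolean): Array[Int] = {
      val assigned = Array.fill(coresFree.length)(0)
      var left = coresToAssign
      var pos = 0
      def canUse(i: Int) = coresFree(i) - assigned(i) >= coresPerExecutor
      while (left >= coresPerExecutor && assigned.indices.exists(canUse)) {
        if (canUse(pos)) {
          assigned(pos) += coresPerExecutor  // grant one executor's worth of cores
          left -= coresPerExecutor
          // spread out: move on to the next worker; otherwise stay until this one is full
          if (spreadOut) pos = (pos + 1) % coresFree.length
        } else {
          pos = (pos + 1) % coresFree.length
        }
      }
      assigned
    }

For example, with coresFree = Array(8, 8, 8), coresToAssign = 12 and coresPerExecutor = 4, spreadOut = true yields Array(4, 4, 4), while spreadOut = false yields Array(8, 4, 0).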

Reference: Lecture 48 of 中华石杉's course Spark从入门到精通 (Spark from Beginner to Mastery).

To understand the source above, you need to know how the pieces relate:

A Spark standalone cluster has a Master node (possibly with standby Masters for HA, but only one is ALIVE at a time) and multiple Worker nodes. The Master manages the Workers, and each Worker manages its Executors.

A Worker node can host multiple Executors, and each Executor holds some CPU cores and a certain amount of memory.
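
These relationships map directly onto submission settings. As a hedged illustration (host name and sizes are made up), the configuration below shows how a standalone cluster-mode application relates to the code above: cluster deploy mode is what puts a driver into waitingDrivers, spark.executor.cores maps to coresPerExecutor, and spark.cores.max bounds the app's total core demand (roughly, coresLeft = requested cores - cores already granted).

    import org.apache.spark.SparkConf

    // Hypothetical standalone-cluster submission settings
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")
      .set("spark.submit.deployMode", "cluster") // register the driver with the master
      .set("spark.executor.cores", "4")          // coresPerExecutor
      .set("spark.executor.memory", "8g")        // memory per executor
      .set("spark.cores.max", "16")              // total cores for this app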

Further reading: the relationship between Worker, Executor, and CPU core.

Original article: https://www.cnblogs.com/parent-absent-son/p/11743337.html