Spark2.x(五十九):yarn-cluster模式提交Spark任务,如何关闭client进程?

问题:

最近现场反馈采用yarn-cluster方式提交spark application后,在提交节点机上依然会存在一个yarn的client进程不关闭,又由于spark application都是spark structured streaming程序(application常年累月的执行),最终导致spark application提交节点服务器资源被占满,当执行其他操作时,会出现以下错误:

[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes to committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$ 

现场对spark application提交节点进行分析发现占用进程主要是(yarn client集成占用):

[dx@my-linux-01 bin]$ top
PID     USER  PR  NI    VIRT     RES  SHR   S  %CPU   %MEM   TIME+    COMMAND
122236  dx    20  0  20.629g  1.347g  3520  S   0.3    2.1   7:02.42     java
122246  dx    20  0  20.629g  1.311g  3520  S   0.3    2.0   7:03.42     java
122236  dx    20  0  20.629g  1.288g  3520  S   0.3    2.2   7:05.83     java
122346  dx    20  0  20.629g  1.344g  3520  S   0.3    2.1   7:10.42     java
121246  dx    20  0  20.629g  1.343g  3520  S   0.3    2.3   7:01.42     java
122346  dx    20  0  20.629g  1.341g  3520  S   0.3    2.4   7:03.39     java
112246  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.42     java
............
112260  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.02     java
112260  dx    20  0  113116      200     0  S   0.0    0.0   0:00.00     sh
............

Yarn提交Spark任务分析:

yarn方式提交spark application包含两种:

1)yarn-client(spark-submit --master yarn --deploy-mode client ...):

这种方式spark提交application任务之后,driver运行在提交服务器节点,且driver运行yarn的client进程中,因此如果关闭了提交服务器节点上client进程会导致driver被关闭,进而导致application被关闭。

2)yarn-cluster(spark-submit --master yarn --deploy-mode cluster):

这种方式spark提交application任务之后,driver运行yarn分配container内,container内分配一个AM(Application Master)进程,SparkContext(driver)运行在该AM内,在yarn提交时,在提交节点上也会启动一个yarn的client进程,默认yarn-client方式提交完application后会等待任务结束(failed,finished等),否则会一直运行。

解决方案:

yarn.client的参数

spark.yarn.submit.waitAppCompletion

如果设置这个参数为true 的话,client将会一直运行并且报告application的状态直到application退出(无论何种原因);

如果设置这个参数为false的话,client的进程将会在application提交后退出。

在spark-submit 参数添加参数

./bin/spark-submit.sh 
--master yarn 
--deploy-mode cluster 
--conf spark.yarn.submit.waitAppCompletion=false
....

对应yarn.client类中代码位置:

  /**
   * Submit an application to the ResourceManager.
   * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
   * reporting the application's status until the application has exited for any reason.
   * Otherwise, the client process will exit after submission.
   * If the application finishes with a failed, killed, or undefined status,
   * throw an appropriate SparkException.
   */
  def run(): Unit = {
    this.appId = submitApplication()
    if (!launcherBackend.isConnected() && fireAndForget) {
      val report = getApplicationReport(appId)
      val state = report.getYarnApplicationState
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
原文地址:https://www.cnblogs.com/yy3b2007com/p/11302886.html