org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

先来看一下报错内容

20/07/17 10:20:07 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/07/17 10:20:07 WARN yarn.YarnAllocator: Cannot find executorId for container: container_1594881950724_0016_01_000264
20/07/17 10:20:07 INFO yarn.YarnAllocator: Completed container container_1594881950724_0016_01_000264 (state: COMPLETE, exit status: -100)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Container marked as failed: container_1594881950724_0016_01_000264. Exit status: -100. Diagnostics: Container released by application.
20/07/17 10:20:08 INFO yarn.YarnAllocator: Launching container container_1594881950724_0016_01_000265 on host mip-test-hdp134 for executor with ID 264
20/07/17 10:20:08 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: Opening proxy : mip-test-hdp134:23855
20/07/17 10:20:08 ERROR yarn.YarnAllocator: Failed to launch executor 264 on container container_1594881950724_0016_01_000265
org.apache.spark.SparkException: Exception while starting container container_1594881950724_0016_01_000265 on host mip-test-hdp134
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$2.run(YarnAllocator.scala:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
20/07/17 10:20:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
20/07/17 10:20:10 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
20/07/17 10:20:10 INFO util.ShutdownHookManager: Shutdown hook called

重点是

Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

一番搜索之后得到的解决方案是

在yarn-site.xml中添加如下配置

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

之后重启yarn。

然而,只做这两个操作是不够的,需要检查一下${HADOOP_HOME}/share/hadoop/yarn/lib目录下是否有spark-*-yarn-shuffle.jar,其中*代表spark版本号,如果没有需要从spark的安装目录下拷贝过来。

spark-*-yarn-shuffle.jar在spark的yarn目录下(也有人说是在jar目录下,可能不同的spark版本有差别吧,未深究)


参考:

1. spark提交至yarn的的动态资源分配

  https://www.cnblogs.com/hejunhong/p/12335258.html

2. Spark任务异常The auxService spaark_shuffle does not exist

https://blog.csdn.net/weixin_39588015/article/details/79365277?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase

原文地址:https://www.cnblogs.com/144823836yj/p/13328770.html