Spark on YARN: wrong vcore count after submission

1. Symptom

For example, take this submit command:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --num-executors 6 \
    --executor-cores 3 \
    --queue thequeue \
    lib/spark-examples*.jar 10

In theory, vcores used = executor-cores * num-executors + 1 = 3 * 6 + 1 = 19 (the extra 1 is the ApplicationMaster's vcore).
In practice, however, the YARN monitoring UI will likely show only 7 vcores in use: the scheduler counts one vcore per container (6 executors + 1 ApplicationMaster = 7 containers), so --executor-cores appears to have no effect.
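Besides the ResourceManager web UI, you can also inspect an application's allocation from the command line; a rough check (the application id below is a placeholder for your own):

    yarn application -status application_XXXXXXXXXXXXX_XXXX

On Hadoop 2.x the output should include an "Aggregate Resource Allocation" line reporting MB-seconds and vcore-seconds, which reflects the same per-container accounting described above.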
2. Solution
This is actually not a Spark problem but a behavior of the YARN Capacity Scheduler. Edit the capacity-scheduler.xml file and change the property yarn.scheduler.capacity.resource-calculator:
its value should go from org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, as shown in the sketch below.
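A sketch of the resulting property stanza, using the standard Hadoop configuration syntax:

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>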
You may need to restart YARN (at least the ResourceManager) for the change to take effect.
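On some versions, reloading the scheduler configuration may be enough instead of a full restart; whether this picks up the resource-calculator property is worth verifying on your own cluster:

    yarn rmadmin -refreshQueues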
3. Reference
Source: http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
The documentation for yarn.scheduler.capacity.resource-calculator explains:
The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. A Java ResourceCalculator class name is expected.
In short: the default calculator accounts for memory only, while DominantResourceCalculator accounts for memory, CPU cores, and other resource dimensions, so executor core requests show up in the scheduler's bookkeeping.
Original post: https://www.cnblogs.com/yesecangqiong/p/10125333.html