Problems when running the Yarn SLS environment

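I ran into several problems while running the stock Hadoop SLS (Scheduler Load Simulator) environment. These notes record the pitfalls so that others can refer to them later.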

Exception 1: ERROR: output job file is existing

Error output:
# ./rumen2sls.sh --rumen-file=/opt/cloudera/parcels/CDH/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json  --output-dir=/root/cdh/sls/
./rumen2sls.sh: line 78: /bin/../libexec/hadoop-config.sh: No such file or directory

ERROR: output job file is existing

SLS simulation files available at: /root/cdh/sls/
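Cause & fix: the script cannot find the Hadoop libexec directory it depends on (see the hadoop-config.sh "No such file or directory" message above). Set the libexec path before running rumen2sls.sh: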
export HADOOP_LIBEXEC_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/libexec/
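
Before re-running rumen2sls.sh it can help to confirm that the directory really contains hadoop-config.sh, since that is the file the script failed to source. The following is a minimal sketch; check_libexec is a hypothetical helper name, and the default path is simply the one from the export line above, so adjust it to your installation.

```shell
# Default taken from the export line above; an assumption for other installations.
HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-/opt/cloudera/parcels/CDH/lib/hadoop/libexec/}"

# check_libexec (hypothetical helper): succeeds only if DIR contains hadoop-config.sh.
check_libexec() {
  libexec_dir="${1%/}"   # drop one trailing slash, if any
  if [ -f "$libexec_dir/hadoop-config.sh" ]; then
    echo "ok: $libexec_dir/hadoop-config.sh"
    return 0
  fi
  echo "missing: $libexec_dir/hadoop-config.sh" >&2
  return 1
}

# Export the variable only when the directory looks valid.
if check_libexec "$HADOOP_LIBEXEC_DIR"; then
  export HADOOP_LIBEXEC_DIR
fi
```

If the check prints "missing", the exported path is probably wrong for this machine and rumen2sls.sh is likely to fail in the same way again.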

Exception 2: NullPointerException because sls-runner.xml cannot be found

Error output:
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
        at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(SLSRunner.java:313)
        at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:248)
        at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
        at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
Caused by: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
        ... 4 more

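Cause & fix: sls-runner.xml cannot be found. Only XML configuration files under the /hadoop/etc/hadoop folder are discovered, but in this Hadoop version sls-runner.xml lives in $HADOOP_HOME/share/hadoop/tools/sls/sample-conf. Copying sls-runner.xml to /etc/hadoop/conf solves the problem.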
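
The copy step could be scripted roughly as follows. This is only a sketch: install_sls_runner_conf is a hypothetical helper name, and the two directories in the usage note are the ones mentioned above, which may differ on other installations.

```shell
# install_sls_runner_conf (hypothetical helper): copy sls-runner.xml from
# SAMPLE_CONF_DIR into CONF_DIR unless a copy is already there.
install_sls_runner_conf() {
  sls_src="${1%/}/sls-runner.xml"
  sls_dest_dir="${2%/}"
  if [ ! -f "$sls_src" ]; then
    echo "not found: $sls_src" >&2
    return 1
  fi
  if [ -f "$sls_dest_dir/sls-runner.xml" ]; then
    # Leave an existing file untouched so local edits are not overwritten.
    echo "already present: $sls_dest_dir/sls-runner.xml"
    return 0
  fi
  cp "$sls_src" "$sls_dest_dir/" && echo "copied to $sls_dest_dir/sls-runner.xml"
}
```

With the paths from this post, the call would be install_sls_runner_conf "$HADOOP_HOME/share/hadoop/tools/sls/sample-conf" /etc/hadoop/conf, run as a user with write access to the configuration directory.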

Exception 3: class UnManagedMRAMSimulatorForRealRM cannot be found

Error output:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.yarn.sls.appmaster.UnManagedMRAMSimulatorForRealRM
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.hadoop.yarn.sls.SLSRunner.<init>(SLSRunner.java:134)
        at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:526)

Cause & fix: the stock SLS environment does not depend on the UnManagedMRAMSimulatorForRealRM class. Change the class configured in sls-runner.xml to the MRAMSimulator implementation:

  <property>
    <name>yarn.sls.am.type.mapreduce</name>
    <value>org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator</value>
    <!-- <value>org.apache.hadoop.yarn.sls.appmaster.UnManagedMRAMSimulatorForRealRM</value> -->
  </property>
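
To see which class is currently in effect without opening the file, something like the sketch below may be enough. am_simulator_class is a hypothetical helper name, and the sketch assumes the <name> and <value> elements sit on separate lines, as in the fragment above; it is a plain text search, not a real XML parser.

```shell
# am_simulator_class (hypothetical helper): print the first uncommented <value>
# found within three lines after the yarn.sls.am.type.mapreduce <name> line.
am_simulator_class() {
  grep -A 3 '<name>yarn.sls.am.type.mapreduce</name>' "$1" \
    | grep -v '<!--' \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}
```

Running am_simulator_class /etc/hadoop/conf/sls-runner.xml (the location used earlier in this post) should print the MRAMSimulator class name once the change has been made.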

Exception 4: failure loading the html folder

Error output:
20/04/23 10:04:16 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
java.lang.NullPointerException
        at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:86)
        at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:478)
        at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:177)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:306)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:506)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1081)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:270)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:167)
        at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
        at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
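Cause & fix: the error is raised by the web component (SLSWebApp), which indicates that the html folder failed to load. Printing the SLS_HTML variable in slsrun.sh showed that it pointed to a wrong html path. Adding 'SLS_HTML=/opt/cloudera/parcels/CDH/share/hadoop/tools/sls/html/' to calculateClasspath() in slsrun.sh, to specify the html directory by hand, solved the problem.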
calculateClasspath() {
  ……
  SLS_HTML="${HADOOP_PREFIX}/share/hadoop/tools/sls/html"
  SLS_HTML=/opt/cloudera/parcels/CDH/share/hadoop/tools/sls/html/    # override the html path
  ……
}
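
A quick sanity check on the overridden path can catch a typo before the simulator starts. The sketch below only verifies that the directory exists and is not empty; check_sls_html is a hypothetical helper name, and the sketch does not know which files SLSWebApp actually reads.

```shell
# check_sls_html (hypothetical helper): succeeds if DIR exists and has content.
check_sls_html() {
  html_dir="${1%/}"   # drop one trailing slash, if any
  if [ ! -d "$html_dir" ]; then
    echo "SLS_HTML is not a directory: $html_dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$html_dir")" ]; then
    echo "SLS_HTML is empty: $html_dir" >&2
    return 1
  fi
  echo "SLS_HTML looks usable: $html_dir"
}
```

For the setup in this post the call would be check_sls_html /opt/cloudera/parcels/CDH/share/hadoop/tools/sls/html/.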
 
[References]
Original post: https://www.cnblogs.com/lemonu/p/13331255.html