Problems encountered when installing multiple Hadoop clusters on a set of virtual machines

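Background: my virtual machine cluster (nn1, nn2) originally ran cdh23502. I later upgraded it to cdh26550 as an experiment, but because production uses cdh23502, I switched back again.

I ran into a few problems during the switch and am recording them here. Both versions still share the original ZooKeeper.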

1. Starting the nodemanager fails with the following exception:

FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize mapreduce_shuffle
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.mapred.ShuffleHandler not found

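Cause analysis:
    This is a class-not-found exception: no jar containing org.apache.hadoop.mapred.ShuffleHandler could be found.
    When I rebuilt nn1, I first copied out everything under /app, and some symlinks were lost during the copy. As a result, the nodemanager could not find the class when it started.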
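Before recreating anything, it can help to confirm that the shuffle jar really is unreachable. The following is a minimal sketch, assuming the class ships in a jar named hadoop-mapreduce-client-shuffle-*.jar (the usual name in Hadoop 2.x distributions) and that the install root is the /app/cdh23502 path used in this post. The function name find_shuffle_jar is made up for illustration.

```shell
# List shuffle jars reachable under a directory, following symlinks (-L).
# A missing or broken "mapreduce" link shows up as an empty result.
find_shuffle_jar() {
    find -L "$1" -name 'hadoop-mapreduce-client-shuffle*.jar' 2>/dev/null
}

HADOOP_SHARE=/app/cdh23502/share/hadoop   # path from this post; adjust to your install
if [ -d "$HADOOP_SHARE" ]; then
    # No output from this call means the jar is not reachable via mapreduce/.
    find_shuffle_jar "$HADOOP_SHARE/mapreduce" || true
    # Show which mapreduce* entries exist and whether they are links or directories.
    ls -ld "$HADOOP_SHARE"/mapreduce* || true
fi
```

If the jar only turns up when you point the function at mapreduce2/, the link is the likely culprit.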
Solution:
cd /app/cdh23502/share/hadoop
ln -s mapreduce2/ mapreduce
Then run yarn-daemon.sh start nodemanager again.
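Since the root cause was a copy that dropped symlinks, the same problem can probably be avoided at copy time. The sketch below uses throwaway directories created with mktemp (not the real /app tree) to show that cp -a keeps a link such as mapreduce -> mapreduce2 as a link. The directory layout is illustrative only.

```shell
# Build a tiny stand-in for share/hadoop in a scratch directory.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/hadoop/mapreduce2"
ln -s mapreduce2 "$src/hadoop/mapreduce"

# -a (archive) copies recursively and keeps symlinks, permissions and timestamps.
cp -a "$src/hadoop" "$dst/"

# Prints the link target if the symlink survived the copy.
readlink "$dst/hadoop/mapreduce"
```

If readlink prints nothing for a path that was a link in the source tree, the copy method turned the link into something else, and the link needs to be recreated as shown in the solution above.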

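2. Starting the two resourcemanagers produces the following exception:

After switching back to cdh23502, I started the resourcemanagers and found that both of them were in standby state.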
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
        ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: RMActiveServices cannot enter state STARTED from state STOPPED
        at org.apache.hadoop.service.ServiceStateModel.checkStateTransition(ServiceStateModel.java:129)
        at org.apache.hadoop.service.ServiceStateModel.enterState(ServiceStateModel.java:111)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:190)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
        ... 5 more

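Solution:
    I suspected ZooKeeper, since the same ZooKeeper had been shared by both versions. Clean up the YARN-related znodes in ZooKeeper, then restart YARN.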
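The post does not say which znodes were involved, so the following is only a sketch under assumptions: that ZooKeeper listens on nn1:2181, that leader election uses the default base path /yarn-leader-election, and that the ZooKeeper state store (if enabled) uses the default parent path /rmstore. Check yarn-site.xml before trusting either path. rmr is the recursive delete in the ZooKeeper 3.4 CLI; newer releases call it deleteall. Because deleting znodes is destructive, the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Assumed values, not taken from the original post.
ZK_SERVER="${ZK_SERVER:-nn1:2181}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_yarn_znodes() {
    # The stop/start steps act on the local host only; repeat them on each RM host.
    run yarn-daemon.sh stop resourcemanager
    run zkCli.sh -server "$ZK_SERVER" ls /
    run zkCli.sh -server "$ZK_SERVER" rmr /yarn-leader-election
    run zkCli.sh -server "$ZK_SERVER" rmr /rmstore
    run yarn-daemon.sh start resourcemanager
}

reset_yarn_znodes
```

Removing /rmstore discards any saved application state, which is acceptable on a test cluster like this one but deserves a second thought anywhere else.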

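3. If both of your namenodes are also stuck in standby and cannot transition to active, the cause may be the zkfc process (DFSZKFailoverController). Check its logs first.

If nothing else works, you can re-register with ZooKeeper:

hdfs zkfc -formatZK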

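I actually ran into this in an earlier experiment and recorded it on the blog at the time. I am repeating it here because the situation is similar to the first one.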
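To confirm the symptom before reformatting anything, the HA state of both namenodes can be queried with hdfs haadmin -getServiceState. This is a rough sketch, assuming the NameNode IDs configured under dfs.ha.namenodes.<nameservice> are nn1 and nn2 (the post only gives these as host names, so check hdfs-site.xml). The helper both_standby is a made-up name. The script only prints suggested steps and does not run formatZK itself.

```shell
# True only when both reported states are "standby".
both_standby() {
    [ "$1" = "standby" ] && [ "$2" = "standby" ]
}

# Assumed NameNode IDs; adjust to match hdfs-site.xml.
NN1_ID="${NN1_ID:-nn1}"
NN2_ID="${NN2_ID:-nn2}"

# Only query the cluster when the hdfs command is actually available.
if command -v hdfs >/dev/null 2>&1; then
    s1=$(hdfs haadmin -getServiceState "$NN1_ID" 2>/dev/null || true)
    s2=$(hdfs haadmin -getServiceState "$NN2_ID" 2>/dev/null || true)
    if both_standby "$s1" "$s2"; then
        echo "Both NameNodes are standby; check the zkfc logs, then consider:"
        echo "  hadoop-daemon.sh stop zkfc     (on both NameNode hosts)"
        echo "  hdfs zkfc -formatZK            (on one NameNode host)"
        echo "  hadoop-daemon.sh start zkfc    (on both NameNode hosts)"
    else
        echo "NameNode states: $s1 / $s2"
    fi
fi
```

Note that the option is written with a plain ASCII hyphen. A typographic dash pasted from a web page will not be recognized by the command.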

Original article: https://www.cnblogs.com/huaxiaoyao/p/5128401.html