Mesos agent re-registration leaves containers stuck in the staged state

I. Problem description

    1. In Marathon, five containers were in the staged state, and all five were on the same host. Mesos version: 1.3.1.

    2. On mesos-master, this host can be seen to have re-registered, so the next step is to find out why mesos-slave re-registered at that point in time.
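    One way to spot the re-registration without reading the web UI is to look at the agent list the master reports. The sketch below assumes the JSON has the shape returned by the master's `/slaves` endpoint, with a `hostname` and an optional `reregistered_time` (epoch seconds) per agent; treat those field names as assumptions to verify against your Mesos version. The sample data is hypothetical.

```python
from datetime import datetime, timezone


def reregistered_agents(slaves_payload):
    """Return (hostname, re-registration time) pairs, newest first.

    Assumes a payload shaped like {"slaves": [{"hostname": ...,
    "reregistered_time": <epoch seconds>}, ...]}. Agents without a
    "reregistered_time" field are treated as never having re-registered.
    """
    result = []
    for agent in slaves_payload.get("slaves", []):
        ts = agent.get("reregistered_time")
        if ts is None:
            continue
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        result.append((agent.get("hostname", "<unknown>"), when))
    # Most recent re-registration first.
    result.sort(key=lambda item: item[1], reverse=True)
    return result


# Hypothetical payload for illustration only, not output from this cluster.
SAMPLE_SLAVES = {
    "slaves": [
        {"hostname": "host-a", "registered_time": 1000.0},
        {"hostname": "host-b", "registered_time": 1000.0,
         "reregistered_time": 2000.0},
    ]
}

for hostname, when in reregistered_agents(SAMPLE_SLAVES):
    print(hostname, when.isoformat())
```

    The timestamp printed for the affected host is the point in time to search for in the mesos-slave log.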

                

    3. By default, the logs are written to /var/log/messages

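        Because of a network problem, the slave node lost contact with the master node, so once the network recovered the slave re-registered with the master. In Mesos 1.3, however, when mesos-slave cannot reach mesos-master, it treats the Marathon framework as having failed, and the tasks that Marathon retries are simply ignored. This logic is rather odd. (I have not fully worked out why it behaves this way myself; pointers from anyone who knows are welcome.) Only restarting mesos-slave on the node fixes it.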

Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.839169 15410 slave.cpp:4826] No pings from master received within 75secs
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840070 15412 slave.cpp:913] Re-detecting master
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840323 15412 slave.cpp:959] Detecting new master
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840324 15415 status_update_manager.cpp:177] Pausing sending status updates
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840929 15414 status_update_manager.cpp:177] Pausing sending status updates
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840983 15402 slave.cpp:924] New master detected at master@10.X.X.X:5050
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.841060 15402 slave.cpp:948] No credentials provided. Attempting to register without authentication
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.841145 15402 slave.cpp:959] Detecting new master
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.460063 15413 slave.cpp:1235] Re-registered with master master@10.X.X.X:5050
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.460983 15411 slave.cpp:3088] Shutting down framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: W0418 22:21:56.462837 15411 slave.cpp:3230] Ignoring info update for framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000 because it is terminating
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.542073 15407 slave.cpp:1619] Got assigned task 'hps1000000020_5-1000000020_commservice-0.d95b7825-4313-11e8-85c9-024214286ecf' for framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: W0418 22:21:56.543298 15407 slave.cpp:1793] Ignoring running task 'hps1000000020_5-1000000020_commservice-0.d95b7825-4313-11e8-85c9-024214286ecf' of framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000 because the framework is terminating
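
The pattern in the excerpt above is a "Shutting down framework" line followed by "Ignoring running task ... because the framework is terminating" for the same framework ID. A rough sketch for scanning a log for that pattern is below; the regular expressions are written against the wording of these particular log lines and may need adjusting for other Mesos versions. The function name `find_ignored_tasks` is made up for this example.

```python
import re

# Patterns derived from the mesos-slave 1.3.1 log lines quoted in this post.
SHUTDOWN_RE = re.compile(r"Shutting down framework (\S+)")
IGNORED_RE = re.compile(
    r"Ignoring running task '([^']+)' of framework (\S+) "
    r"because the framework is terminating"
)


def find_ignored_tasks(lines):
    """Return {framework_id: [task_id, ...]} for tasks that were ignored
    after the agent logged a shutdown of the same framework."""
    shutting_down = set()
    ignored = {}
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            shutting_down.add(match.group(1))
            continue
        match = IGNORED_RE.search(line)
        if match and match.group(2) in shutting_down:
            ignored.setdefault(match.group(2), []).append(match.group(1))
    return ignored


# Two lines copied from the excerpt above.
SAMPLE = [
    "Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.460983 "
    "15411 slave.cpp:3088] Shutting down framework "
    "967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000",
    "Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: W0418 22:21:56.543298 "
    "15407 slave.cpp:1793] Ignoring running task "
    "'hps1000000020_5-1000000020_commservice-0."
    "d95b7825-4313-11e8-85c9-024214286ecf' of framework "
    "967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000 because the framework "
    "is terminating",
]

print(find_ignored_tasks(SAMPLE))
```

On a real host the input would be the lines of /var/log/messages, for example by passing an open file object to `find_ignored_tasks`. A non-empty result suggests the agent is in the state described here.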

    4. Based on these symptoms, this turns out to be a bug in Mesos 1.3

       https://issues.apache.org/jira/browse/MESOS-7215

    

    5. Temporary workaround: restart mesos-slave to recover for now; upgrade Mesos to 1.5 or later.
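
    Before restarting anything, it can help to confirm which host actually holds the stuck tasks. The sketch below groups staged tasks by host. It assumes a payload shaped like the response of Marathon's `/v2/tasks` endpoint, where each task has a `host` and a `state` such as `TASK_STAGING`; those field names and the state string are assumptions to check against your Marathon version, and the sample data is hypothetical.

```python
from collections import Counter


def staged_tasks_by_host(tasks_payload, staged_state="TASK_STAGING"):
    """Count staged tasks per host.

    Assumes a payload shaped like {"tasks": [{"host": ..., "state": ...}]}.
    Tasks without a "host" field are counted under "<unknown>".
    """
    counts = Counter(
        task.get("host", "<unknown>")
        for task in tasks_payload.get("tasks", [])
        if task.get("state") == staged_state
    )
    return dict(counts)


# Hypothetical payload for illustration only.
SAMPLE_TASKS = {
    "tasks": [
        {"id": "app-a.1", "host": "host-b", "state": "TASK_STAGING"},
        {"id": "app-b.1", "host": "host-b", "state": "TASK_STAGING"},
        {"id": "app-c.1", "host": "host-a", "state": "TASK_RUNNING"},
    ]
}

print(staged_tasks_by_host(SAMPLE_TASKS))
```

    If all staged tasks sit on a single host, as in the case described in this post, that host's mesos-slave is the one to restart.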

Original article: https://www.cnblogs.com/brownyangyang/p/8991773.html