Troubleshooting MongoDB "failed to create thread after accepting new connection, closing connection"

Incident

All MongoDB data nodes on the LXC host 10.11.164.28 reported the same error at the same moment: failed to create thread after accepting new connection, closing connection

Host OS: Oracle Linux 6.5; LXC version: 1.0.6

Database version: MongoDB 3.2.11

Error log:

2017-07-26T10:23:18.734+0800 I NETWORK  [initandlisten] failed to create thread after accepting new connection, closing connection
2017-07-26T10:23:23.781+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:44334 #5874 (14 connections now open)
2017-07-26T10:23:23.781+0800 I NETWORK  [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
2017-07-26T10:23:23.781+0800 I NETWORK  [initandlisten] failed to create thread after accepting new connection, closing connection
2017-07-26T10:23:26.670+0800 I NETWORK  [initandlisten] connection accepted from 192.168.4.206:54601 #5875 (14 connections now open)
2017-07-26T10:23:26.670+0800 I NETWORK  [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable

Confirming the problem

Source files that emit these messages:

./mongodb-src-r3.2.16/src/mongo/util/net//message_server_port.cpp:            log() << "failed to create thread after accepting new connection, closing connection";

./mongodb-src-r3.2.16/src/mongo/util/net//message_server_port.cpp:            log() << "pthread_create failed: " << errnoWithDescription(failed) << endl;

    virtual void accepted(std::shared_ptr<Socket> psocket, long long connectionId) {
        ScopeGuard sleepAfterClosingPort = MakeGuard(sleepmillis, 2);
        std::unique_ptr<MessagingPortWithHandler> portWithHandler(
            new MessagingPortWithHandler(psocket, _handler, connectionId));
        if (!Listener::globalTicketHolder.tryAcquire()) {
            log() << "connection refused because too many open connections: "
                  << Listener::globalTicketHolder.used() << endl;
            return;
        }
        try {
#ifndef __linux__  // TODO: consider making this ifdef _WIN32
            {
                stdx::thread thr(stdx::bind(&handleIncomingMsg, portWithHandler.get()));
                thr.detach();
            }
#else
            pthread_attr_t attrs;  // thread attributes object
            pthread_attr_init(&attrs);  // initialize attrs with default values
            pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);  // detached: the thread releases its own resources when it exits
            static const size_t STACK_SIZE =  // per-thread stack size in bytes (size_t is an unsigned integer type)
                1024 * 1024;  // if we change this we need to update the warning
            struct rlimit limits;  // receives the soft and hard limits; the struct is explained below
            verify(getrlimit(RLIMIT_STACK, &limits) == 0);  // read the maximum stack size (RLIMIT_STACK) into limits; the call must succeed
            if (limits.rlim_cur > STACK_SIZE) {  // soft stack limit above 1 MB: cap the thread stack at STACK_SIZE (1 MB)
                size_t stackSizeToSet = STACK_SIZE;
#if !__has_feature(address_sanitizer)
                if (kDebugBuild)  // debug builds use half of that
                    stackSizeToSet /= 2;
#endif
                pthread_attr_setstacksize(&attrs, stackSizeToSet);  // store the stack size stackSizeToSet in attrs
            } else if (limits.rlim_cur < 1024 * 1024) {  // soft stack limit below 1 MB: only log a warning
                warning() << "Stack size set to " << (limits.rlim_cur / 1024)
                          << "KB. We suggest 1MB" << endl;
            }
 
            pthread_t thread;  // thread handle
            int failed = pthread_create(&thread, &attrs, &handleIncomingMsg, portWithHandler.get());  // create the thread (handle, attributes, start routine, argument); returns 0 on success and an error number (not -1) on failure
            pthread_attr_destroy(&attrs);  // release the attrs object
            if (failed) {  // creation failed: log it
                log() << "pthread_create failed: " << errnoWithDescription(failed) << endl;  // here errnoWithDescription(failed) yields "errno:11 Resource temporarily unavailable" (EAGAIN)
                throw std::system_error(
                    std::make_error_code(std::errc::resource_unavailable_try_again));
            }
#endif  // __linux__
            portWithHandler.release();  // the new thread now owns the port object, so the unique_ptr gives up ownership
            sleepAfterClosingPort.Dismiss();  // cancel the 2 ms sleep guard set up at the top
        } catch (...) {  // any exception: return the connection ticket and log the failure
            Listener::globalTicketHolder.release();
            log() << "failed to create thread after accepting new connection, closing connection";
        }
    }

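Here limits is a struct rlimit, defined as shown below. rlimit is a Linux system structure that describes the maximum amount of a given resource a process may consume, expressed as a soft limit and a hard limit.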

struct rlimit {
rlim_t rlim_cur;                       //soft limit
rlim_t rlim_max;                       //hard limit
};

Two functions read and set these limits:

int getrlimit(int resource, struct rlimit *rlim);                                 // read the current soft and hard limits for resource
int setrlimit(int resource, const struct rlimit *rlim);
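As a minimal sketch of how the two calls above are used, the following reads the limits that matter in this incident from inside a process. The helper names formatLimit and describeLimit are made up for this illustration and are not part of MongoDB.

```cpp
#include <sys/resource.h>
#include <string>

// Hypothetical helper: render one limit value, printing RLIM_INFINITY as "unlimited".
std::string formatLimit(rlim_t value) {
    if (value == RLIM_INFINITY) return "unlimited";
    return std::to_string(static_cast<unsigned long long>(value));
}

// Hypothetical helper: query one resource limit of the calling process and
// describe it as "soft=<n> hard=<n>". Returns an empty string if getrlimit fails.
std::string describeLimit(int resource) {
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0) return "";
    return "soft=" + formatLimit(lim.rlim_cur) + " hard=" + formatLimit(lim.rlim_max);
}
```

Calling describeLimit(RLIMIT_STACK) and describeLimit(RLIMIT_NPROC) reports the stack size limit in bytes and the limit on the number of processes. These are the values the process actually runs with, which may differ from what the configuration files currently say.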

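The error means that mongod could not create a new thread to serve the newly accepted connection. pthread_create returns EAGAIN (errno 11) when there are not enough resources for another thread or when a system-imposed limit on the number of threads, such as the RLIMIT_NPROC soft limit, has been reached.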

Identifying the cause

Stack-related settings inside the LXC container:

ulimit -s    
stack size                        The maximum stack size
 
cat /etc/security/limits.conf
mongodb           soft    nproc   4096            max number of processes
mongodb           hard    nproc   16384           max number of processes
mongodb           soft    nofile  131072          max number of open file descriptors
mongodb           hard    nofile  131072          max number of open file descriptors
mongodb           soft    stack   1024            max stack size (KB)
mongodb           hard    stack   1024            max stack size (KB)

Stack-related settings on the host:

stack size              (kbytes, -s) 8192
 
# End of file
*           soft    nproc   65536
*           hard    nproc   65536
*           soft    nofile  131072
*           hard    nofile  131072

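Changing the container's nproc limits shown above had no effect.

In addition, the 6.x releases (such as the Oracle Linux 6.5 used here) ship another configuration file: /etc/security/limits.d/90-nproc.conf

Host configuration: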

# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     65536
root       soft    nproc     unlimited

Container configuration:

# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     1024
root       soft    nproc     unlimited

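After raising the container's soft nproc limit for * from 1024 to 10240, MongoDB accepted new connections normally again.

Conclusion: thread creation is limited by two configuration files, /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf, because on Linux the nproc limit counts threads as well as processes. It is of course also subject to the host's configuration.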
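Since two files can set nproc, it helps to check which value a running mongod actually received. One way, sketched here under the assumption that /proc/<pid>/limits is available, is to read the "Max processes" row of that file. ProcLimit, parseLimit and readProcLimit are hypothetical names for this illustration.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Soft and hard values of one row in /proc/<pid>/limits, kept as text
// because a value may be the word "unlimited".
struct ProcLimit {
    bool found;
    std::string soft;
    std::string hard;
};

// Find the row whose name column equals `name` (for example "Max processes")
// in text that follows the /proc/<pid>/limits layout. `name` must be the
// full row name, since the match is done on the start of the line.
ProcLimit parseLimit(std::istream& in, const std::string& name) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, name.size(), name) != 0) continue;
        std::istringstream rest(line.substr(name.size()));
        ProcLimit result{false, "", ""};
        if (rest >> result.soft >> result.hard) result.found = true;
        return result;
    }
    return ProcLimit{false, "", ""};
}

// Read one limit of a running process; `pid` is a process ID as text, or "self".
ProcLimit readProcLimit(const std::string& pid, const std::string& name) {
    std::ifstream in("/proc/" + pid + "/limits");
    return parseLimit(in, name);
}
```

For example, readProcLimit("self", "Max processes") returns the limits of the calling process, and passing the mongod PID instead shows what that instance is running with. A process normally keeps the limits it was started with, so an edit to the configuration files is only expected to show up here after mongod has been restarted from a new session.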

Original article: https://www.cnblogs.com/wyett/p/7458651.html