wait/waitpid functions, zombie processes, and forking twice

1. Zombie processes

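When a child process exits, the kernel sends the SIGCHLD signal to its parent. A child's exit is an asynchronous event: the child can terminate at any moment while the parent is running.
When the child exits, the kernel puts it into the zombie state, and the process is then called a zombie process. The kernel keeps only a minimal set of data structures for it, so that the parent can later query the child's exit status.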

From the wait(2) man page:

A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(8), which automatically performs a wait to remove the zombies.

The parent queries a child's exit status with the wait/waitpid functions.

2. How to avoid zombie processes

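When a child finishes running, its association with the parent persists until the parent either terminates as well or calls wait/waitpid.
The child's entry in the process table is not released right away. Although the child is no longer active, it remains in the system because its exit code must be kept for the parent's later wait/waitpid call. Such a process is called a "zombie process".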

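Calling wait or waitpid to query the child's exit status suspends the parent (waitpid can be told not to block).

If you do not want the parent to block, add the statement signal(SIGCHLD,SIG_IGN); to the parent. This tells the parent to ignore SIGCHLD, the signal sent to it when a child exits. On Linux, and as specified by POSIX.1-2001, explicitly setting SIGCHLD to SIG_IGN also means exited children do not become zombies.

Alternatively, keep SIGCHLD and call wait/waitpid inside its signal handler.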

// Have exited children reaped automatically, so they never become zombies and the parent never needs to call wait.
struct sigaction sat_cld = { .sa_handler = SIG_IGN, .sa_flags = SA_NOCLDWAIT };
sigaction(SIGCHLD, &sat_cld, NULL);

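In day-to-day operations, a common remedy is to kill the parent process. The zombie children are then adopted by the init process, which cleans up their state.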

3. The wait function

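Headers: <sys/types.h> and <sys/wait.h>
Purpose: when we start a process with fork, the child has a life of its own and runs independently. Sometimes we need to know whether a child has finished. wait makes the parent pause until a child has terminated.
Prototype:
pid_t wait(int *status)
Parameter:
status: receives information about the child that was waited for
Return value:
On success, the PID of the child that was waited for; on error, -1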

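The wait system call suspends the parent until one of its children terminates.
It returns the PID of that child, which is normally a child that has terminated.
The status information lets the parent determine how the child exited, i.e. the value returned from the child's main function or the code passed to exit in the child.
If status is not a null pointer, the status information is written to the location it points to.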

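The following macros extract the child's exit status from status:

WIFEXITED(status) nonzero if the child terminated normally
WEXITSTATUS(status) if WIFEXITED is nonzero, the child's exit code (the low-order 8 bits of the value passed to exit or returned from main)
WIFSIGNALED(status) nonzero if the child was terminated by a signal it did not catch
WTERMSIG(status) if WIFSIGNALED is nonzero, the number of the signal that terminated the child
WIFSTOPPED(status) nonzero if the child is currently stopped (only reported when waitpid is called with WUNTRACED, or for a traced child)
WSTOPSIG(status) if WIFSTOPPED is nonzero, the number of the signal that stopped the child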

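4. The waitpid function

Purpose: waits for a specific child process to change state, typically to terminate

Prototype:
pid_t waitpid(pid_t pid, int *status, int options)
Parameters:
pid: which child or children to wait for (interpreted as described below)
status: if not NULL, the status information is written to the location it points to
options: changes how waitpid behaves. The most useful option is WNOHANG, which prevents waitpid from suspending the caller (return immediately if no child has exited.)
Return value: on success, the PID of the child that was waited for; 0 if WNOHANG was given and no selected child has changed state yet; -1 on failure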

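The meaning of waitpid's pid argument depends on its value:
pid == -1 wait for any child. In this respect waitpid is equivalent to wait.
pid > 0 wait for the child whose process ID equals pid.
pid == 0 wait for any child whose process group ID equals that of the calling process, i.e. any child in the caller's process group.
pid < -1 wait for any child whose process group ID equals the absolute value of pid.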

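5. Differences between wait and waitpid

Both functions wait for a child to change state. This covers normal exit and termination by a signal. It also covers being stopped or resumed by a signal, but only when waitpid is given the WUNTRACED or WCONTINUED option.

Until a child terminates, wait blocks its caller, whereas waitpid has an option (WNOHANG) that keeps the caller from blocking.
waitpid is not limited to the first child that terminates: its pid argument and options control which particular process it waits for.
In fact wait is a special case of waitpid: wait(&status) is equivalent to waitpid(-1, &status, 0).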

RETURN VALUE

wait(): on success, returns the process ID of the terminated child; on error, -1 is returned.

waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned. On error, -1 is returned.

Example program:


/*************************************************************************
    > File Name: process_.c
    > Author: Simba
    > Mail: dameng34@163.com
    > Created Time: Sat 23 Feb 2013 02:34:02 PM CST
************************************************************************/
#include<sys/types.h>
#include<sys/stat.h>
#include<unistd.h>
#include<fcntl.h>
#include<stdio.h>
#include<stdlib.h>
#include<errno.h>
#include<string.h>
#include<sys/wait.h>

#define ERR_EXIT(m) \
    do { \
        perror(m); \
        exit(EXIT_FAILURE); \
    } while(0)

int main(int argc, char *argv[])
{
    pid_t pid;
    pid = fork();
    if (pid == -1)
        ERR_EXIT("fork error");

    if (pid == 0)
    {
        sleep(3);
        printf("this is child\n");
        //      exit(100);
        abort();    /* terminate the child with SIGABRT */
    }

    printf("this is parent\n");
    int status;
    int ret;
    ret = wait(&status); // block until a child exits
    //  ret = waitpid(-1, &status, 0);
    //  ret = waitpid(pid, &status, 0);
    /* waitpid can wait for a specific process rather than just the first child
     * to exit, and options can be set to WNOHANG for a non-blocking wait */
    if (ret == -1)
        ERR_EXIT("wait error");
    printf("ret=%d, pid=%d\n", ret, pid);
    if (WIFEXITED(status))
        printf("child exited normal exit status=%d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("child exited abnormal signal number=%d\n", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        printf("child stopped signal number=%d\n", WSTOPSIG(status));

    return 0;
}

Output:

simba@ubuntu:~/Documents/code/linux_programming/APUE/process$ ./wait
this is parent
this is child
ret=7156, pid=7156
child exited abnormal signal number=6

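This shows that the child was terminated abnormally by a signal. Calling abort() raises SIGABRT, which kills the child, and SIGABRT is signal number 6 on Linux. If the child called exit(100) instead of abort(), the output would be "child exited normal exit status=100", i.e. a normal exit.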

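6. Forking twice

This is the so-called double fork. The main process does not create the target child directly. Instead it creates a Son, which in turn creates the actual target child, Grandson. Son exits right after creating Grandson and is reaped by the main process with waitpid.

Grandson, the process we actually wanted, loses its "biological father" Son and is adopted by init. On newer Linux systems it may instead be adopted by the nearest subreaper. Either way, its new parent reaps it as soon as it exits. So neither process lingers as a zombie, even though the main process never waits for Grandson.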


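#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void create_child(void)
{
    pid_t son = fork();
    if (son == 0)
    {
        /* Son: create the real worker (Grandson), then exit at once */
        pid_t grandson = fork();
        if (grandson == 0)
        {
            /* Grandson: the parent printed here may still be Son, or already
             * init (or a subreaper) if Son has exited first */
            printf("child: %d, parent: %d\n", getpid(), getppid());
            exit(EXIT_SUCCESS);
        }
        else if (grandson == -1)
        {
            perror("fork");
        }
        exit(EXIT_SUCCESS);
    }
    else if (son > 0)
    {
        /* main process: reap Son right away so it never lingers as a zombie */
        waitpid(son, NULL, 0);
    }
    else
    {
        perror("fork");
    }
}

int main(int argc, char *argv[])
{
    for (int i = 0; i < 10; i++)
    {
        create_child();
    }
    /* stay alive: no zombies pile up even though we never wait for the Grandsons */
    for (;;)
        pause();
    return EXIT_SUCCESS;
}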

Reference: APUE (Advanced Programming in the UNIX Environment)