Redis-bio

redis bio.

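1 What is redis bio?

　　The opening comment of redis bio.c says it clearly: redis bio is the "Background I/O service for Redis". The service creates separate threads to run different kinds of jobs, taking load off the main thread.

2 What is it used for?

　　The Redis main thread handles commands. When an operation might block the main thread or put heavy pressure on it, Redis hands that operation to bio so that it runs on another thread, keeping the main thread as efficient as possible.

　　The operations currently executed through bio are: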

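　　/* Background job opcodes */
　　#define BIO_CLOSE_FILE 0 /* Deferred close(2) syscall. */
　　　　Runs the close(2) system call. When the current process is the last owner of a file, calling close() on it means actually unlinking (deleting) the file, which is slow and would block the main thread.
　　#define BIO_AOF_FSYNC 1 /* Deferred AOF fsync. */
　　　　Writing to a file only puts the data in the OS output buffer; when it is flushed to disk is normally decided by the OS. An explicit fsync() forces the buffered data to disk, but calling it on the main thread hurts the main thread's throughput.
　　#define BIO_LAZY_FREE 2 /* Deferred objects freeing. */
　　　　Runs Redis lazy free operations, i.e. frees objects in the background.
　　#define BIO_NUM_OPS 3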
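
　　To make the write-versus-fsync distinction concrete, here is a minimal sketch using plain POSIX calls. The function name demo_write_and_sync is hypothetical and is not part of Redis; Redis itself goes through its own redis_fsync wrapper. The point is only that write() returns once the data is in the kernel's buffers, while fsync() blocks until the data has been pushed to the storage device, which is why Redis prefers to run it off the main thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a Redis function): write a string to a file and
 * force it to stable storage. Returns 0 on success, -1 on error. */
int demo_write_and_sync(const char *path, const char *data) {
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    size_t len = strlen(data);
    /* write() normally returns as soon as the data is in the kernel cache. */
    ssize_t n = write(fd, data, len);
    if (n < 0 || (size_t) n != len) {
        close(fd);
        return -1;
    }

    /* fsync() blocks the calling thread until the data reaches the device. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

　　In bio the fsync step is the part that gets queued as a BIO_AOF_FSYNC job, so the thread that pays the blocking cost is the background thread rather than the command-processing thread.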

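3 How does it work?

　　In the current implementation (6.2), Redis creates three threads to handle the three job types above, one thread per job type.

　　1 Redis defines a bio_job type to represent a background job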

　　struct bio_job {
　　　　time_t time; 　　　　　　/* Time at which the job was created. */
　　　　/* Job specific arguments.*/
　　　　int fd; 　　　　　　　　　/* Fd for file based background jobs */
　　　　lazy_free_fn *free_fn; 　　/* Function that will free the provided arguments */
　　　　void *free_args[]; 　　　　/* List of arguments to be passed to the free function */
　　};

　　2 Each of the three operation types above has its own job queue:

　　static list *bio_jobs[BIO_NUM_OPS];

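　　3 The current model creates three threads for the three job types. Each thread handles exactly one job type, that is, each thread only processes jobs from its own queue.

　　For example, if only the close queue currently holds jobs, only the thread that handles close jobs is running; the other two threads are blocked, waiting for jobs of their own type to arrive.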

　　static pthread_t bio_threads[BIO_NUM_OPS];

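　　4 Each thread works like this: while its queue is empty it blocks; when a job is appended to the queue, the thread wakes up and processes jobs until the queue is empty again.

　　This is the classic producer-consumer model, and the Redis implementation is the classic one as well: a mutex and a condition variable drive the whole mechanism.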

　　static pthread_mutex_t bio_mutex[BIO_NUM_OPS];
　　static pthread_cond_t bio_newjob_cond[BIO_NUM_OPS];

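　　Each job type has its own mutex and condition variable, which are used together with the corresponding pthread functions to implement the producer-consumer mechanism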
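
　　As an illustration of this mechanism, here is a minimal producer-consumer sketch with one mutex and one condition variable. All names (q_submit, q_worker, run_demo, QCAP) are hypothetical; it uses a small fixed array instead of the Redis list type and a negative value as a stop marker, neither of which is how bio.c does it.

```c
#include <assert.h>
#include <pthread.h>

#define QCAP 16 /* capacity chosen arbitrarily for this sketch */

static int q_items[QCAP];
static int q_len = 0;
static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_newjob = PTHREAD_COND_INITIALIZER;

/* Producer side: append under the lock, then wake the worker. */
static void q_submit(int v) {
    pthread_mutex_lock(&q_mutex);
    assert(q_len < QCAP);
    q_items[q_len++] = v;
    pthread_cond_signal(&q_newjob);
    pthread_mutex_unlock(&q_mutex);
}

/* Consumer side: sleep while the queue is empty, work outside the lock. */
static void *q_worker(void *arg) {
    long *sum = arg;
    assert(sum != NULL);

    pthread_mutex_lock(&q_mutex);
    for (;;) {
        while (q_len == 0) pthread_cond_wait(&q_newjob, &q_mutex);

        /* Pop the oldest item. */
        int v = q_items[0];
        for (int i = 1; i < q_len; i++) q_items[i - 1] = q_items[i];
        q_len--;
        if (v < 0) break; /* negative value: stop marker */

        pthread_mutex_unlock(&q_mutex);
        *sum += v; /* the "job" runs without holding the lock */
        pthread_mutex_lock(&q_mutex);
    }
    pthread_mutex_unlock(&q_mutex);
    return NULL;
}

/* Submit the jobs 1..5 plus a stop marker and return what the worker summed. */
long run_demo(void) {
    long sum = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, q_worker, &sum) != 0) return -1;
    for (int i = 1; i <= 5; i++) q_submit(i);
    q_submit(-1);
    pthread_join(t, NULL);
    return sum;
}
```

　　The structure mirrors bio: the lock is held only while touching the queue, and the job itself runs with the lock released so that the producer is never blocked behind a slow job.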

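　　5 That is the whole flow of the redis bio service. In short, bio creates a dedicated thread for each job type; a thread wakes up and runs when a job arrives and blocks when there is none.

　　ps: before reading the analysis below, I strongly recommend reading this document: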

　　https://docs.oracle.com/cd/E19455-01/806-5257/6je9h032r/index.html

4 Code walkthrough:

　　1 Preparing the job threads:

void bioInit(void) {
    pthread_attr_t attr;
    pthread_t thread;
    size_t stacksize;
    int j;

    /* Initialization of state vars and objects */
    for (j = 0; j < BIO_NUM_OPS; j++) {
        pthread_mutex_init(&bio_mutex[j],NULL);
        pthread_cond_init(&bio_newjob_cond[j],NULL);
        pthread_cond_init(&bio_step_cond[j],NULL);
        bio_jobs[j] = listCreate();
        bio_pending[j] = 0;
    }

    /* Set the stack size as by default it may be small in some system */
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr,&stacksize);
    if (!stacksize) stacksize = 1; /* The world is full of Solaris Fixes */
    while (stacksize < REDIS_THREAD_STACK_SIZE) stacksize *= 2;
    pthread_attr_setstacksize(&attr, stacksize);

    /* Ready to spawn our threads. We use the single argument the thread
     * function accepts in order to pass the job ID the thread is
     * responsible of. */
    for (j = 0; j < BIO_NUM_OPS; j++) {
        void *arg = (void*)(unsigned long) j;
        if (pthread_create(&thread,&attr,bioProcessBackgroundJobs,arg) != 0) {
            serverLog(LL_WARNING,"Fatal: Can't initialize Background Jobs.");
            exit(1);
        }
        bio_threads[j] = thread;
    }
}

void *bioProcessBackgroundJobs(void *arg) {
    struct bio_job *job;
    unsigned long type = (unsigned long) arg;
    sigset_t sigset;

    /* Check that the type is within the right interval. */
    if (type >= BIO_NUM_OPS) {
        serverLog(LL_WARNING,
            "Warning: bio thread started with wrong type %lu",type);
        return NULL;
    }

    switch (type) {
    case BIO_CLOSE_FILE:
        redis_set_thread_title("bio_close_file");
        break;
    case BIO_AOF_FSYNC:
        redis_set_thread_title("bio_aof_fsync");
        break;
    case BIO_LAZY_FREE:
        redis_set_thread_title("bio_lazy_free");
        break;
    }

    redisSetCpuAffinity(server.bio_cpulist);

    makeThreadKillable();

    pthread_mutex_lock(&bio_mutex[type]);
    /* Block SIGALRM so we are sure that only the main thread will
     * receive the watchdog signal. */
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGALRM);
    if (pthread_sigmask(SIG_BLOCK, &sigset, NULL))
        serverLog(LL_WARNING,
            "Warning: can't mask SIGALRM in bio.c thread: %s", strerror(errno));

    while(1) {
        listNode *ln;

        /* The loop always starts with the lock hold. */
        if (listLength(bio_jobs[type]) == 0) {
            pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]);
            continue;
        }
        /* Pop the job from the queue. */
        ln = listFirst(bio_jobs[type]);
        job = ln->value;
        /* It is now possible to unlock the background system as we know have
         * a stand alone job structure to process.*/
        pthread_mutex_unlock(&bio_mutex[type]);

        /* Process the job accordingly to its type. */
        if (type == BIO_CLOSE_FILE) {
            close(job->fd);
        } else if (type == BIO_AOF_FSYNC) {
            redis_fsync(job->fd);
        } else if (type == BIO_LAZY_FREE) {
            job->free_fn(job->free_args);
        } else {
            serverPanic("Wrong job type in bioProcessBackgroundJobs().");
        }
        zfree(job);

        /* Lock again before reiterating the loop, if there are no longer
         * jobs to process we'll block again in pthread_cond_wait(). */
        pthread_mutex_lock(&bio_mutex[type]);
        listDelNode(bio_jobs[type],ln);
        bio_pending[type]--;

        /* Unblock threads blocked on bioWaitStepOfType() if any. */
        pthread_cond_broadcast(&bio_step_cond[type]);
    }
}

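　　As the code shows, the mutexes and condition variables are initialized first, then three job queues are created, one per job type. Three threads are then created, and each thread receives the value of its job type (0, 1, or 2) as its arg.

　　When each thread starts running bioProcessBackgroundJobs, the queue it is responsible for is empty, so it blocks in

　　pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]); that is, while the queue is empty every thread stays blocked, waiting for the producer to create jobs.

　　2 The producer creates jobs: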
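
　　The two techniques bioInit relies on, growing the stack size attribute and passing a small integer through the void * thread argument, can be sketched in isolation as follows. The names demo_worker, demo_spawn, DEMO_WORKERS and DEMO_STACK_SIZE are hypothetical, and the 4 MB minimum is an assumption made for the example rather than a value taken from this article.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_WORKERS 3
#define DEMO_STACK_SIZE (1024 * 1024 * 4) /* assumed minimum for this sketch */

static unsigned long seen[DEMO_WORKERS]; /* one slot per worker, no sharing */

static void *demo_worker(void *arg) {
    /* Recover the integer id that was smuggled through the pointer. */
    unsigned long id = (unsigned long) arg;
    assert(id < DEMO_WORKERS);
    seen[id] = id + 1;
    return NULL;
}

/* Start DEMO_WORKERS threads, wait for them, return 0 on success. */
int demo_spawn(void) {
    pthread_attr_t attr;
    pthread_t threads[DEMO_WORKERS];
    size_t stacksize;
    unsigned long started = 0;
    int rc = 0;

    /* Double the default stack size until it reaches the minimum we want. */
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stacksize);
    if (!stacksize) stacksize = 1;
    while (stacksize < DEMO_STACK_SIZE) stacksize *= 2;
    pthread_attr_setstacksize(&attr, stacksize);

    for (; started < DEMO_WORKERS; started++) {
        void *arg = (void *) started; /* the id travels as the pointer value */
        if (pthread_create(&threads[started], &attr, demo_worker, arg) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created. */
    for (unsigned long j = 0; j < started; j++) pthread_join(threads[j], NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

　　Unlike this sketch, the bio threads are never joined during normal operation: they loop forever and sleep in pthread_cond_wait whenever their queue is empty.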

void bioSubmitJob(int type, struct bio_job *job) {
    job->time = time(NULL);
    pthread_mutex_lock(&bio_mutex[type]);
    listAddNodeTail(bio_jobs[type],job);
    bio_pending[type]++;
    pthread_cond_signal(&bio_newjob_cond[type]);
    pthread_mutex_unlock(&bio_mutex[type]);
}

void bioCreateLazyFreeJob(lazy_free_fn free_fn, int arg_count, ...) {
    va_list valist;
    /* Allocate memory for the job structure and all required
     * arguments */
    struct bio_job *job = zmalloc(sizeof(*job) + sizeof(void *) * (arg_count));
    job->free_fn = free_fn;

    va_start(valist, arg_count);
    for (int i = 0; i < arg_count; i++) {
        job->free_args[i] = va_arg(valist, void *);
    }
    va_end(valist);
    bioSubmitJob(BIO_LAZY_FREE, job);
}

void bioCreateCloseJob(int fd) {
    struct bio_job *job = zmalloc(sizeof(*job));
    job->fd = fd;

    bioSubmitJob(BIO_CLOSE_FILE, job);
}

void bioCreateFsyncJob(int fd) {
    struct bio_job *job = zmalloc(sizeof(*job));
    job->fd = fd;

    bioSubmitJob(BIO_AOF_FSYNC, job);
}

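　　As the code shows, the producer calls bioCreateLazyFreeJob, bioCreateCloseJob, and bioCreateFsyncJob to create jobs of type BIO_LAZY_FREE, BIO_CLOSE_FILE, and BIO_AOF_FSYNC for the consumers to consume.

　　Currently these functions are called from the main thread, so the main thread is the producer. It creates jobs at the appropriate times, and the job threads consume them in bioProcessBackgroundJobs.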
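
　　The way bioCreateLazyFreeJob packs a variable number of pointers into a single allocation combines a C99 flexible array member with stdarg. A minimal sketch of that pattern, using hypothetical names (demo_lazy_job, demo_make_job, demo_free_two) and plain malloc in place of zmalloc:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

typedef void demo_free_fn(void *args[]);

/* Hypothetical stand-in for struct bio_job, reduced to the lazy free part. */
struct demo_lazy_job {
    demo_free_fn *free_fn;
    void *free_args[]; /* flexible array member: must be the last field */
};

/* Allocate the header and arg_count pointer slots in one block, then copy
 * the variadic arguments into the slots. Every variadic argument must be
 * passed as a void *. */
struct demo_lazy_job *demo_make_job(demo_free_fn *fn, int arg_count, ...) {
    assert(arg_count >= 0);
    struct demo_lazy_job *job =
        malloc(sizeof(*job) + sizeof(void *) * (size_t) arg_count);
    if (job == NULL) return NULL;
    job->free_fn = fn;

    va_list ap;
    va_start(ap, arg_count);
    for (int i = 0; i < arg_count; i++) {
        job->free_args[i] = va_arg(ap, void *);
    }
    va_end(ap);
    return job;
}

/* Example callback: knows by convention that it received two heap pointers. */
void demo_free_two(void *args[]) {
    free(args[0]);
    free(args[1]);
}
```

　　Note that the job does not record how many arguments it holds; as in the original struct, the callback has to know the count by convention.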

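5 How the job threads run when there is work and block when there is none

　　　1 As mentioned earlier, the job threads use a mutex and a condition variable to run when jobs exist and block when none exist. The library calls involved are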

　　　　pthread_mutex_lock(&bio_mutex[type]);

　　　　pthread_mutex_unlock(&bio_mutex[type]);

　　　　pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]);

　　　　pthread_cond_signal(&bio_newjob_cond[type]);

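　　　　First, see how these functions work

　　　　On mutexes: https://www.geeksforgeeks.org/mutex-lock-for-linux-thread-synchronization/

　　　　On condition variables:

　　　　1) Right after it is created, the thread first acquires the lock in bioProcessBackgroundJobs, then calls pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]); inside the while loop.

　　　　　　At this point (1) the thread atomically releases the mutex it acquired earlier, and (2) the calling thread blocks on the condition variable. The thread is put to sleep rather than spin-polling the condition variable, and it stays blocked until a call to pthread_cond_signal() or pthread_cond_broadcast() wakes it, after which the function returns.

　　　　2) When the main thread creates a job, it ends up calling bioSubmitJob, shown below: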

void bioSubmitJob(int type, struct bio_job *job) {
    job->time = time(NULL);
    pthread_mutex_lock(&bio_mutex[type]); 
    listAddNodeTail(bio_jobs[type],job);
    bio_pending[type]++;
    pthread_cond_signal(&bio_newjob_cond[type]);
    pthread_mutex_unlock(&bio_mutex[type]);
}

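　　　　This function is a textbook example of using pthread_cond_signal. It first acquires the lock that the job thread released when it called pthread_cond_wait, appends the job to the queue, and then calls pthread_cond_signal, which wakes one thread blocked in pthread_cond_wait, namely the thread responsible for this job type. Once pthread_cond_signal has run, the job thread is woken inside pthread_cond_wait. Before pthread_cond_wait returns, however, it calls __pthread_mutex_cond_lock (in the glibc implementation) to re-acquire the mutex, and pthread_cond_wait only really returns once the thread holds the lock. If the main thread has already executed pthread_mutex_unlock by the time the job thread reaches __pthread_mutex_cond_lock, the job thread gets the lock immediately. Otherwise the job thread blocks in __pthread_mutex_cond_lock until the main thread calls pthread_mutex_unlock; only then does it acquire the lock, truly return from pthread_cond_wait, and continue with the logic that follows.

　　　　These steps are how the threads implement the producer-consumer model with a mutex and a condition variable.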

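6 Details to watch for when implementing the producer-consumer model with a mutex and a condition variable.

　　1) With the consumer code unchanged, what problem arises if the calls to pthread_cond_signal and pthread_mutex_unlock in the producer's bioSubmitJob are swapped?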

pthread_cond_signal(3THR)
Use pthread_cond_signal(3THR) to unblock one thread that is blocked on the condition variable pointed to by cv. (For Solaris threads, see "cond_signal(3THR)".)

Prototype:
int    pthread_cond_signal(pthread_cond_t *cv);
#include <pthread.h>

pthread_cond_t cv;
int ret;

/* one condition variable is signaled */
ret = pthread_cond_signal(&cv); 
Call pthread_cond_signal() under the protection of the same mutex used with the condition variable being signaled. Otherwise, the condition variable could be signaled between the test of the associated condition and blocking in pthread_cond_wait(), which can cause an infinite wait.

The scheduling policy determines the order in which blocked threads are awakened. For SCHED_OTHER, threads are awakened in priority order.

When no threads are blocked on the condition variable, calling pthread_cond_signal() has no effect.

Return Values
pthread_cond_signal() returns zero after completing successfully. Any other returned value indicates that an error occurred. When the following condition occurs, the function fails and returns the corresponding value.

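　　The sentence in the documentation quoted above that begins with "Call pthread_cond_signal() under the protection of the same mutex" (set in bold in the original) describes the problem caused by the swap. I cannot yet tell whether the infinite wait is caused by logic inside pthread_cond_wait or by the calling code outside it.

　　2) The right way to call pthread_cond_wait: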
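
　　One possible reading of the quoted warning, offered as an assumption rather than a definitive answer: the lost wakeup can happen when the state that the waiter tests is changed without holding the mutex. If the state change itself is made under the mutex and the waiter re-tests it in a loop, POSIX permits pthread_cond_signal to be called after the mutex has been released (at the cost of less predictable scheduling). The sketch below, with hypothetical names h_waiter and demo_signal_after_unlock, exercises that swapped order.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t h_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t h_cond = PTHREAD_COND_INITIALIZER;
static int h_ready = 0; /* the predicate; only touched with h_mutex held */

static void *h_waiter(void *arg) {
    int *woken = arg;
    assert(woken != NULL);

    pthread_mutex_lock(&h_mutex);
    /* Either h_ready is already 1, or we are inside pthread_cond_wait when
     * the signal is sent; there is no window in between, because the test
     * and the wait both happen under the mutex. */
    while (!h_ready) pthread_cond_wait(&h_cond, &h_mutex);
    *woken = 1;
    pthread_mutex_unlock(&h_mutex);
    return NULL;
}

/* Returns 1 if the waiter was released, -1 if the thread could not start. */
int demo_signal_after_unlock(void) {
    int woken = 0;
    pthread_t t;

    pthread_mutex_lock(&h_mutex);
    h_ready = 0;
    pthread_mutex_unlock(&h_mutex);

    if (pthread_create(&t, NULL, h_waiter, &woken) != 0) return -1;

    pthread_mutex_lock(&h_mutex);
    h_ready = 1;                  /* state change made under the mutex */
    pthread_mutex_unlock(&h_mutex);
    pthread_cond_signal(&h_cond); /* signal sent after unlocking */

    pthread_join(t, NULL);
    return woken;
}
```

　　If this reading is right, swapping the two calls in bioSubmitJob would not by itself lose a job, since the queue is still modified under bio_mutex; signaling while holding the lock, as Redis does, remains the conventional and safest form.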

Because the condition can change before an awakened thread returns from pthread_cond_wait(), the condition that caused the wait must be retested before the mutex lock is acquired. The recommended test method is to write the condition check as a while() loop that calls pthread_cond_wait().

　　Therefore, after pthread_cond_wait returns, the condition that sent the program into pthread_cond_wait must be checked again in the while loop. In this example that is

　　while (1) {
　　　　...

　　　　if (listLength(bio_jobs[type]) == 0) {
　　　　　　pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]);
　　　　　　continue;
　　　　}

　　　　...
　　}

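　　Note how pthread_cond_wait is called: the continue sends control back to the top of the loop, where the queue length is tested again before any job is taken.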
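
　　A case where the re-test visibly matters is a broadcast that wakes more waiters than there are items. The following sketch uses hypothetical names (t_consumer, demo_recheck) and a simple counter in place of a job queue; two consumers are woken by every broadcast, but the while loop guarantees that each one only proceeds when an item is really available.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t t_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t t_cond = PTHREAD_COND_INITIALIZER;
static int t_tokens = 0; /* items currently available */
static int t_taken = 0;  /* items consumed so far */

static void *t_consumer(void *arg) {
    (void) arg;
    pthread_mutex_lock(&t_mutex);
    /* Re-test after every wakeup: another consumer may have taken the item
     * that caused this wakeup, and spurious wakeups are also allowed. */
    while (t_tokens == 0) pthread_cond_wait(&t_cond, &t_mutex);
    assert(t_tokens > 0); /* holds only because of the while loop above */
    t_tokens--;
    t_taken++;
    pthread_mutex_unlock(&t_mutex);
    return NULL;
}

/* Start two consumers, hand out one token per started consumer, and
 * return how many tokens were consumed. */
int demo_recheck(void) {
    pthread_t c[2];
    int started = 0;

    pthread_mutex_lock(&t_mutex);
    t_tokens = 0;
    t_taken = 0;
    pthread_mutex_unlock(&t_mutex);

    for (; started < 2; started++) {
        if (pthread_create(&c[started], NULL, t_consumer, NULL) != 0) break;
    }
    for (int i = 0; i < started; i++) {
        pthread_mutex_lock(&t_mutex);
        t_tokens++;
        pthread_cond_broadcast(&t_cond); /* wakes every waiting consumer */
        pthread_mutex_unlock(&t_mutex);
    }
    for (int i = 0; i < started; i++) pthread_join(c[i], NULL);
    return t_taken;
}
```

　　With an if in place of the while, a consumer woken by the first broadcast could find t_tokens already back at 0 and decrement it anyway; the loop in bioProcessBackgroundJobs protects the job queue in the same way.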

　　These are my notes after reading redis bio; if anything here is wrong, please leave a comment to correct me.