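Linux System Programming Study Notes

Thread Concepts

A thread is the smallest unit that the operating system can schedule for execution. It lives inside a process and is the unit that actually does the process's work.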

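Resources shared by all threads of a process:

File descriptors
The disposition of each signal (SIG_IGN, SIG_DFL, or a user-defined handler)
Current working directory
User ID and group ID

Resources each thread has its own copy of:

Thread ID
Context, including register values, the PC (program counter), and the stack pointer
Stack space
The errno variable
Signal mask
Scheduling priority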

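Threads vs. processes
A thread is the smallest unit of CPU scheduling; a process is the smallest unit of resource allocation.
A process can contain multiple threads; a thread belongs to exactly one process.
Threads in the same process share one address space; different processes cannot share data directly.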

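The thread library is defined by the POSIX standard and is known as POSIX threads, or pthread.
On Linux the thread functions live in the libpthread shared library, so add the -lpthread option when compiling (since glibc 2.34 these functions are part of libc itself, and the option is harmless).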

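Thread Control

Creating a Thread

#include <pthread.h>

// Create a new thread; the calling thread continues after the call returns.
// Returns 0 on success and an error number on failure (errno is not set).
int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void *), void *restrict arg);

After pthread_create returns, the calling thread keeps running; what the new thread executes is determined by the function pointer start_routine.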

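Thread argument
start_routine takes a single argument, passed through arg. Its type is void *, and its meaning is defined by the caller.
Thread return value
When start_routine returns, the new thread exits. Another thread can call pthread_join to obtain the return value of start_routine, much as a parent process calls wait to obtain a child's exit status.
Thread ID
The ID of the new thread is stored in the memory that the thread parameter points to.
A process ID has type pid_t, is unique system-wide, is a positive integer, and is returned by getpid.
A thread ID has type pthread_t and is only guaranteed to be unique within its process. Implementations differ: pthread_t may be an integer, a structure, or an address, so it cannot portably be printed with printf or compared with ==. Call pthread_self to get the calling thread's ID and pthread_equal to compare two IDs.

Note: the thread ID that the creating thread receives through the thread parameter is the same as the ID the new thread gets from pthread_self, because thread IDs are unique within a process.

Thread attributes
attr describes the thread's attributes. The examples here always pass NULL for attr, which selects the defaults.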
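
The points above (passing an argument through arg, collecting the return value with pthread_join, and comparing thread IDs) can be sketched in one small helper. The struct task type and the function names are made up for this illustration; packing a small integer into the returned pointer is a common shortcut rather than a requirement.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical argument block shared between creator and worker. */
struct task {
    int input;         /* set by the creator */
    pthread_t self;    /* filled in by the worker with pthread_self() */
};

/* Worker: records its own ID and returns input squared, packed into the pointer. */
static void *square_worker(void *arg) {
    struct task *t = arg;
    t->self = pthread_self();
    return (void *)(intptr_t)(t->input * t->input);
}

/* Runs one worker; returns its result, or -1 on error or ID mismatch. */
int run_square_worker(int n) {
    struct task t = { .input = n };
    pthread_t tid;
    void *ret;

    if (pthread_create(&tid, NULL, square_worker, &t) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    /* The ID stored by pthread_create must equal the worker's pthread_self(). */
    if (!pthread_equal(tid, t.self))
        return -1;
    return (int)(intptr_t)ret;
}
```

Passing the address of a stack variable is safe here only because run_square_worker joins the worker before t goes out of scope.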

Example: a minimal program that creates a thread

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

pthread_t ntid;

void printids(const char *s) {
  pid_t pid;
  pthread_t tid;

  pid = getpid();       // ID of the current process
  tid = pthread_self(); // ID of the current thread
  // Casting pthread_t to an integer works on Linux but is not portable
  printf("%s pid %u tid %u (0x%x)\n", s, (unsigned int)pid, (unsigned int)tid, (unsigned int)tid);
}

void *thr_fn(void *arg) {
  printids(arg);
  return NULL;
}

int main() {
  int err;
  err = pthread_create(&ntid, NULL, thr_fn, "new thread: ");
  if (err != 0) {
    fprintf(stderr, "can't create thread: %s\n", strerror(err));
    exit(1);
  }

  printids("main thread:");
  sleep(1); // crude way to give the new thread time to run before main returns
  return 0;
}

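Compile and run:

$ gcc main.c -lpthread
$ ./a.out
main thread: pid 7398 tid 3084450496 (0xb7d8fac0)
new thread:  pid 7398 tid 3084446608 (0xb7d8eb90)

Analysis: the thread running main and the new thread belong to the same process, so they report the same process ID but different thread IDs. Because pthread_create does not store its error code in errno, perror cannot be used directly; instead the returned error code is converted to a message with strerror and then printed.
If any thread calls exit or _exit, every thread in the process terminates. Returning from main is equivalent to calling exit, so main sleeps for one second before returning to keep the process alive until the new thread has run.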

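Terminating a Thread

As noted above, terminating the process terminates all of its threads. Is there a way to terminate one thread without terminating the process?
There are three ways:

Return from the thread function. This does not apply to the main thread, because returning from main is equivalent to calling exit.
A thread can call pthread_cancel to terminate another thread in the same process.
A thread can call pthread_exit to terminate itself.

Terminating a thread with pthread_cancel has two cases: synchronous (deferred, the default) and asynchronous.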

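pthread_exit and pthread_join

#include <pthread.h>
void pthread_exit(void *value_ptr);

The memory that the pointer passed to pthread_exit (or returned from the thread function) points to must be global or allocated with malloc. It must not be on the thread function's stack, which is gone by the time another thread reads the value.

#include <pthread.h>

int pthread_join(pthread_t thread, void **value_ptr);

The thread that calls pthread_join is suspended until the thread whose ID is thread terminates. What pthread_join reports depends on how that thread terminated:

If the thread returned from its thread function, the location value_ptr points to holds the function's return value.
If the thread was cancelled by another thread calling pthread_cancel, the location value_ptr points to holds the constant PTHREAD_CANCELED (-1 on Linux).
If the thread terminated itself by calling pthread_exit, the location value_ptr points to holds the argument passed to pthread_exit.
If the termination status is not needed, pass NULL for value_ptr.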

Example: terminate threads in the three ways and collect their return values

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
#include <unistd.h>

void *thr_fn1(void *arg) {
  printf("thread 1 returning\n");
  return (void *)1;
}

void *thr_fn2(void *arg) {
  printf("thread 2 exiting\n");
  pthread_exit((void *)2);
}

void *thr_fn3(void *arg) {
  while(1) {
    printf("thread 3 writing\n");
    sleep(1);
  }
}

int main() {
  pthread_t tid;
  void *tret;

  pthread_create(&tid, NULL, thr_fn1, NULL);
  pthread_join(tid, &tret); // suspend main until thread tid ends; tret receives its return value
  printf("thread 1 exit code %d\n", (int)(intptr_t)tret);

  pthread_create(&tid, NULL, thr_fn2, NULL);
  pthread_join(tid, &tret);
  printf("thread 2 exit code %d\n", (int)(intptr_t)tret);

  pthread_create(&tid, NULL, thr_fn3, NULL);
  sleep(3);
  pthread_cancel(tid);
  pthread_join(tid, &tret);
  printf("thread 3 exit code %d\n", (int)(intptr_t)tret);

  return 0;
}

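Output (the number of "thread 3 writing" lines depends on timing):

thread 1 returning
thread 1 exit code 1
thread 2 exiting
thread 2 exit code 2
thread 3 writing
thread 3 writing
thread 3 exit code -1

The final -1 is the value of PTHREAD_CANCELED: when another thread terminates a thread with pthread_cancel, the cancelled thread's exit status is PTHREAD_CANCELED. On Linux, pthread.h defines it as: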

#include <pthread.h>

#define PTHREAD_CANCELED ((void *) -1)

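After a thread terminates, its termination status is kept until another thread calls pthread_join to collect it.
A thread can instead be put in the detached state; such a thread has all of its resources reclaimed as soon as it terminates, and no termination status is kept.
pthread_join must not be called on a thread that is already detached; on Linux the call fails with EINVAL (errno.h). For a thread that is not yet detached, calling pthread_join waits for it and reclaims its resources, while calling pthread_detach puts it in the detached state.

Note: do not call pthread_join twice on the same thread, and do not call both pthread_detach and pthread_join on the same thread.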

#include <pthread.h>
int pthread_detach(pthread_t tid);
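
Since a detached thread cannot be joined, its creator needs some other way to learn that it has finished. A minimal sketch, assuming a POSIX semaphore is used as the completion signal; the names done, result, and run_detached are made up for this illustration.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t done;      /* posted by the worker when it has finished */
static int result;      /* written by the worker before posting */

static void *detached_worker(void *arg) {
    result = *(int *)arg + 1;
    sem_post(&done);    /* publish completion; nobody will join this thread */
    return NULL;
}

/* Starts a detached worker and waits for it through the semaphore.
 * Returns the worker's result, or -1 on error.
 * Sketch only: meant to be called once, the semaphore is never destroyed. */
int run_detached(int input) {
    pthread_t tid;

    if (sem_init(&done, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, detached_worker, &input) != 0)
        return -1;
    pthread_detach(tid);    /* resources are reclaimed when the thread ends */

    sem_wait(&done);        /* pthread_join is no longer allowed on tid */
    return result;
}
```

The worker reads input before it posts the semaphore, so run_detached does not return (and invalidate input) while the worker still needs it.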

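Thread Synchronization

Mutex

Conflicts can arise when several threads access shared data at the same time.
For example, incrementing a variable by 1 takes three instructions:

Load the variable's value from memory into a register.
Add 1 to the register.
Store the register's value back to memory.

If two threads execute these three instructions concurrently on a multiprocessor, both may load the same old value, so the variable ends up incremented once instead of twice (this is what the figure in the original notes illustrates).

Question: can this problem occur on a single-processor system?
Answer: yes, it can. After the first thread has loaded the value into a register and added 1, but before it stores the result, the scheduler may switch to the other thread, which adds 1 and stores the value. When the first thread resumes and stores its own result, the variable has again been incremented only once.

Example: two threads each increment counter 5000 times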

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define NLOOP 5000

int counter; // globals are zero-initialized

void *doit(void *arg) {
    int val;

    for (int i = 0; i < NLOOP; ++i) {
        val = counter;
        printf("%x: %d\n", (unsigned int)pthread_self(), val + 1);
        counter = val + 1;
    }

    return NULL;
}

// Demonstrates the access conflict between threads
int main() {
    pthread_t thrA, thrB;

    pthread_create(&thrA, NULL, &doit, NULL);
    pthread_create(&thrB, NULL, &doit, NULL);

    // Wait for both threads to finish
    pthread_join(thrA, NULL);
    pthread_join(thrB, NULL);

    return 0;
}

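Ideally counter ends at 10000, but in practice a run may end near 5000 or at some other value below 10000; running more threads makes the effect easier to see. The cause is the access conflict between the threads. The fix is a mutex (mutual exclusion lock). The thread that holds the lock can complete its read-modify-write sequence and then release the lock to other threads; a thread that does not hold the lock has to wait and cannot touch the shared data. The read-modify-write sequence thus becomes atomic with respect to the other threads: none of them can interleave with it.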

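Mutex
A mutex is represented by a variable of type pthread_mutex_t. Initialization and destruction:

#include <pthread.h>

// Destroy a mutex.
// Returns 0 on success, an error number on failure.
// Use it for a mutex that was initialized with pthread_mutex_init.
int pthread_mutex_destroy(pthread_mutex_t *mutex);
// Initialize a mutex.
// attr sets the mutex attributes; NULL selects the defaults.
// Use it to initialize a mutex at run time, e.g. inside a function.
int pthread_mutex_init(pthread_mutex_t *restrict mutex, const pthread_mutexattr_t *restrict attr);
// Static initialization.
// Use it for global or static variables.
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // equivalent to pthread_mutex_init(&mutex, NULL)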

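Locking and unlocking a mutex

#include <pthread.h>
// Block until the mutex can be acquired, i.e. until its current owner unlocks it
int pthread_mutex_lock(pthread_mutex_t *mutex);
// Try to acquire the mutex without blocking; returns EBUSY if it is already locked
int pthread_mutex_trylock(pthread_mutex_t *mutex);
// Release the mutex
int pthread_mutex_unlock(pthread_mutex_t *mutex);

A thread that acquires the mutex with lock goes on to run the code that follows; if the mutex is already held, the thread is suspended until the owner releases it with unlock.
Once the code that must run atomically has finished, release the mutex with unlock.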
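
The difference between lock and trylock shows up when the mutex is already held. A minimal sketch, assuming a default (non-recursive) mutex; the function name trylock_demo is made up for this illustration.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if trylock reports EBUSY while the mutex is held
 * and succeeds again once it has been released, 0 otherwise. */
int trylock_demo(void) {
    int busy, free_again;

    pthread_mutex_lock(&m);
    busy = pthread_mutex_trylock(&m);       /* already locked: expect EBUSY */
    pthread_mutex_unlock(&m);

    free_again = pthread_mutex_trylock(&m); /* unlocked: expect 0 */
    if (free_again == 0)
        pthread_mutex_unlock(&m);

    return busy == EBUSY && free_again == 0;
}
```

Calling pthread_mutex_lock instead of trylock in the second step would suspend the thread forever, which is the self-deadlock case.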

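Two typical deadlock scenarios:

A thread calls lock twice in a row. The first call acquires the mutex; the second finds it held and suspends the thread until some thread unlocks it. But the owner is the thread itself, so it waits forever. This is a deadlock.
Thread A holds lock 1 and waits for lock 2 while thread B holds lock 2 and waits for lock 1.

Programs should avoid holding several locks at once where possible. When that cannot be avoided, follow one rule:
All threads acquire the locks in the same order. If a program uses lock 1, lock 2, and lock 3, then every thread that needs two or three of them acquires them in the order lock 1, lock 2, lock 3.
If a fixed order is hard to establish, prefer pthread_mutex_trylock over pthread_mutex_lock, and back off (release what is already held) when it fails, to avoid deadlock.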
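
One way to get a single global lock order is to sort the locks by address before acquiring them. The sketch below assumes that convention; struct account, transfer, and run_transfers are hypothetical names, and the two threads deliberately transfer in opposite directions, which would risk the A/B deadlock if each locked its source first.

```c
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

static struct account acc_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account acc_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

/* Move amount from src to dst, always locking the lower address first. */
static void transfer(struct account *src, struct account *dst, long amount) {
    struct account *first  = (uintptr_t)src < (uintptr_t)dst ? src : dst;
    struct account *second = first == src ? dst : src;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    src->balance -= amount;
    dst->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

static void *a_to_b(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; ++i) transfer(&acc_a, &acc_b, 1);
    return NULL;
}

static void *b_to_a(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; ++i) transfer(&acc_b, &acc_a, 1);
    return NULL;
}

/* Runs two threads that transfer in opposite directions; returns the total. */
long run_transfers(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acc_a.balance + acc_b.balance;
}
```

Because both threads lock the same account first regardless of transfer direction, neither can hold one lock while waiting for the other in the opposite order.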

The example above, reworked to use a mutex

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#define NLOOP 5000

int counter;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER; // statically initialized global lock

void *doit(void *arg) {
    int val;

    for (int i = 0; i < NLOOP; ++i) {
        pthread_mutex_lock(&counter_mutex);     // block until the lock is acquired
        val = counter;
        printf("%x: %d\n", (unsigned int)pthread_self(), val + 1);
        counter = val + 1;
        pthread_mutex_unlock(&counter_mutex);   // release the lock
    }

    return NULL;
}

// The access conflict is gone: counter always ends at 2 * NLOOP
int main() {
    pthread_t thrA, thrB;

    pthread_create(&thrA, NULL, &doit, NULL);
    pthread_create(&thrB, NULL, &doit, NULL);

    // Wait for both threads to finish
    pthread_join(thrA, NULL);
    pthread_join(thrB, NULL);

    return 0;
}

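Condition Variable

Another synchronization pattern: thread A needs some condition to hold before it can proceed, and blocks while the condition is false; thread B makes the condition true in the course of its work and then wakes thread A.
The pthread library uses a condition variable to block a thread waiting for a condition and to wake up the threads waiting on it.
A condition variable has type pthread_cond_t. Initialization and destruction mirror those of a mutex: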

#include <pthread.h>

int pthread_cond_destroy(pthread_cond_t *cond);
int pthread_cond_init(pthread_cond_t *restrict cond,  const pthread_condattr_t *restrict attr);
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;  // <=> pthread_cond_init(&cond, NULL);

Operations on a condition variable

#include <pthread.h>
// Block until signaled or until the absolute time abstime passes; returns ETIMEDOUT on timeout
int pthread_cond_timedwait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex, const struct timespec *restrict abstime);
// Block until signaled
int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex);
// Wake all threads waiting on cond; they then compete to reacquire the mutex
int pthread_cond_broadcast(pthread_cond_t *cond);
// Wake at least one thread waiting on cond
int pthread_cond_signal(pthread_cond_t *cond);

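A condition variable is always used together with a mutex. A thread calls pthread_cond_wait to block until some condition holds; the function does three things:

Releases the mutex, which is why a condition variable has to be given the mutex.
Blocks the calling thread until another thread wakes it.
On wakeup, reacquires the mutex and returns.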
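
A wakeup does not guarantee that the condition still holds (another thread may have consumed it first, and spurious wakeups are permitted), so the usual pattern rechecks the condition in a while loop. A minimal sketch of that pattern; the names value, setter, and wait_for_value are made up for this illustration.

```c
#include <pthread.h>

static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonzero = PTHREAD_COND_INITIALIZER;
static int value = 0;   /* shared state; the condition is value != 0 */

static void *setter(void *arg) {
    pthread_mutex_lock(&lk);
    value = *(int *)arg;            /* make the condition true ... */
    pthread_mutex_unlock(&lk);
    pthread_cond_signal(&nonzero);  /* ... then wake the waiter */
    return NULL;
}

/* Blocks until setter has stored a nonzero value, then returns it. */
int wait_for_value(int v) {
    pthread_t tid;
    int seen;

    pthread_create(&tid, NULL, setter, &v);

    pthread_mutex_lock(&lk);
    while (value == 0)                      /* recheck after every wakeup */
        pthread_cond_wait(&nonzero, &lk);   /* unlocks, sleeps, relocks */
    seen = value;
    pthread_mutex_unlock(&lk);

    pthread_join(tid, NULL);
    return seen;
}
```

If the setter runs before the waiter takes the mutex, the loop body is skipped entirely, so a signal sent "too early" is not lost.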

Example: producer-consumer using a mutex, a condition variable, and a linked list

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

struct msg {
    struct msg *next;
    int num;
};

struct msg *head;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t has_product = PTHREAD_COND_INITIALIZER;

void *producer(void *arg) {
    struct msg *mp;
    sleep(2);
    // Each new node is inserted at the front of the list, pointed to by head
    for ( ; ; ) {
        mp = malloc(sizeof (struct msg));
        // rand() returns a pseudo-random number in 0..RAND_MAX
        mp->num = rand() % 1000 + 1; // random number in 1..1000
        printf("Produce %d\n", mp->num);

        pthread_mutex_lock(&lock);
        mp->next = head;
        head = mp;
        pthread_mutex_unlock(&lock);

        pthread_cond_signal(&has_product);

        sleep(rand() % 5); // sleep 0 to 4 seconds
    }

}

void *consumer(void *arg) {
    struct msg *mp;

    for ( ; ; ) {
        pthread_mutex_lock(&lock);

        // Recheck in a loop: a wakeup does not guarantee that the list is non-empty
        while (head == NULL)
            pthread_cond_wait(&has_product, &lock); // releases the mutex while blocked, reacquires it on wakeup

        mp = head;
        head = mp->next;
        pthread_mutex_unlock(&lock);

        printf("Consume %d\n", mp->num);
        free(mp);
        sleep(rand() % 5); // sleep 0 to 4 seconds
    }
}

int main() {
    pthread_t thrA, thrB;
    srand(time(NULL));

    pthread_create(&thrA, NULL, producer, NULL);
    pthread_create(&thrB, NULL, consumer, NULL);

    pthread_join(thrA, NULL);
    pthread_join(thrB, NULL);

    return 0;
}

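Semaphore

A mutex is either 0 or 1 and can be viewed as the available count of a resource of which there is exactly one. Initially the count is 1; locking reduces it to 0; unlocking raises it back to 1.
A semaphore is similar: it represents the number of available resources, except that the count may be greater than 1.

The POSIX semaphore functions (see sem_overview(7)) can synchronize threads within one process and also threads in different processes.

#include <semaphore.h>

// Initialize a semaphore. value is the initial number of available resources; pshared == 0 means the semaphore is shared between the threads of one process
int sem_init(sem_t *sem, int pshared, unsigned int value);
// Decrement the count, blocking while it is 0
int sem_wait(sem_t *sem);
// Non-blocking variant: if the count is 0, return -1 with errno set to EAGAIN (use sem_timedwait for a timeout)
int sem_trywait(sem_t *sem);
// Release a resource: increment the count and wake a waiter, if any
int sem_post(sem_t *sem);
// Release the resources associated with the semaphore
int sem_destroy(sem_t *sem);

A semaphore variable has type sem_t.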
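
The counting behaviour can be observed without any extra threads by using sem_trywait, which fails instead of blocking once the count reaches 0. A minimal sketch; the function name sem_counts_slots is made up for this illustration.

```c
#include <errno.h>
#include <semaphore.h>

/* Returns 1 if the semaphore behaves as a counter of `slots` resources. */
int sem_counts_slots(unsigned int slots) {
    sem_t sem;
    int ok = 1;

    if (sem_init(&sem, 0, slots) != 0)
        return 0;

    /* Each successful sem_trywait takes one resource. */
    for (unsigned int i = 0; i < slots; ++i)
        if (sem_trywait(&sem) != 0)
            ok = 0;

    /* The count is now 0: trywait must fail with EAGAIN instead of blocking. */
    if (sem_trywait(&sem) != -1 || errno != EAGAIN)
        ok = 0;

    /* sem_post returns one resource, so trywait succeeds again. */
    sem_post(&sem);
    if (sem_trywait(&sem) != 0)
        ok = 0;

    sem_destroy(&sem);
    return ok;
}
```

Note that the semaphore functions report errors through errno and a -1 return value, unlike the pthread_* functions, which return the error number directly.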

The producer-consumer example above, changed from mutex + condition variable + linked list to semaphores + a ring buffer:

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <unistd.h>

#define NUM    5

sem_t blank_num, product_num; // free slots, filled slots
int queue[NUM];

void *produce(void *arg) {
    int p = 0;
    while (true) {
        sem_wait(&blank_num);       // wait for a free slot
        queue[p] = rand() % 1000 + 1;
        printf("Produce %d\n", queue[p]);
        sem_post(&product_num);     // one more filled slot
        p = (p + 1) % NUM;
        sleep(rand() % 5);
    }
}

void *consume(void *arg) {
    int c = 0;

    while (true) {
        sem_wait(&product_num);     // wait for a filled slot
        printf("Consume %d\n", queue[c]);
        queue[c] = 0;
        sem_post(&blank_num);       // one more free slot
        c = (c + 1) % NUM;
        sleep(rand() % 5);
    }
}

int main() {
    pthread_t thrA, thrB;

    sem_init(&blank_num, 0, NUM);
    sem_init(&product_num, 0, 0);
    pthread_create(&thrA, NULL, produce, NULL);
    pthread_create(&thrB, NULL, consume, NULL);

    pthread_join(thrA, NULL);
    pthread_join(thrB, NULL);

    sem_destroy(&blank_num);
    sem_destroy(&product_num);

    return 0;
}

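As the two versions show, a condition variable has to be paired with a mutex, whereas a semaphore does not necessarily need one. With a condition variable, a thread that holds the mutex but finds the condition false gives the mutex up automatically inside pthread_cond_wait and reacquires it automatically once it is woken.

Mutex vs. Semaphore

Both a mutex and a semaphore can be used for synchronization and mutual exclusion between threads, so how do they differ?
A semaphore represents the number of available resources and is used to guard those resources. A binary semaphore (whose value is only ever 0 or 1) behaves much like a mutex. One difference remains: a mutex has an owner and must be unlocked by the thread that locked it, while any thread may post a semaphore.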
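
To make the correspondence concrete, the counter example can be protected with a semaphore initialized to 1 in place of the mutex. A minimal sketch; guard, total, and count_with_binary_sem are hypothetical names.

```c
#include <pthread.h>
#include <semaphore.h>

#define ROUNDS 5000

static sem_t guard;        /* binary semaphore: 1 = free, 0 = taken */
static int total;

static void *add_rounds(void *arg) {
    (void)arg;
    for (int i = 0; i < ROUNDS; ++i) {
        sem_wait(&guard);  /* plays the role of pthread_mutex_lock */
        total = total + 1;
        sem_post(&guard);  /* plays the role of pthread_mutex_unlock */
    }
    return NULL;
}

/* Two threads each add ROUNDS to total; returns the final value. */
int count_with_binary_sem(void) {
    pthread_t a, b;

    total = 0;
    sem_init(&guard, 0, 1);
    pthread_create(&a, NULL, add_rounds, NULL);
    pthread_create(&b, NULL, add_rounds, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&guard);
    return total;
}
```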

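Other Thread Synchronization Mechanisms

If shared data is read-only, every thread always reads consistent data and no access conflict can occur. As soon as even one thread can modify the data, synchronization has to be considered. This leads to the reader-writer lock.
Readers do not exclude one another, while a writer is exclusive: while a writer modifies the data, no other reader or writer may access it. A reader-writer lock therefore allows more concurrency than a mutex.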
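Suspending a thread is not always the best way to resolve an access conflict, because it reduces concurrency. In some situations the conflict can be resolved without suspending any thread, for example with the Linux kernel's seqlock and RCU (read-copy-update) mechanisms.

See APUE, 2nd edition, for details.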

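Spin Lock

A spin lock is like a mutex, except that instead of blocking by sleeping, the thread busy-waits (spins) until it acquires the lock.
When to use it: the lock is held only briefly, and the thread does not want to pay the cost of being descheduled and rescheduled.
Advantage: spin locks are often useful in non-preemptive kernels, where besides providing mutual exclusion they block interrupts, so that an interrupt handler cannot deadlock the system by trying to take a lock that is already held.
Disadvantage: while a thread spins waiting for the lock, its CPU can do nothing else, so a lot of CPU time can be wasted polling.

The idea resembles the following pseudocode (a real implementation has to make the test and the decrement one atomic step):

s = 1;

In each thread:
while (s <= 0) { ; }
s--; // P operation (acquire)
...
s++; // V operation (release)

Spin lock operations (used much like a mutex)

#include <pthread.h>

int pthread_spin_init(pthread_spinlock_t *lock, int pshared); // initialize the spin lock
int pthread_spin_destroy(pthread_spinlock_t *lock); // destroy the spin lock

/* Return 0 on success, an error number on failure */
int pthread_spin_lock(pthread_spinlock_t *lock);
int pthread_spin_trylock(pthread_spinlock_t *lock);
int pthread_spin_unlock(pthread_spinlock_t *lock);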

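Programming Exercise

The dining philosophers problem: five philosophers share five chopsticks. They sit in a circle with one chopstick between each pair of neighbours. To eat, a philosopher needs both the left and the right chopstick; if a neighbour is using one of them, the philosopher has to wait.
Number the chopsticks 1, 2, 3, 4, 5 and the philosophers A, B, C, D, E, seated in a circle so that A sits between chopsticks 5 and 1, B between 1 and 2, and so on (the figure from the original notes is not reproduced here).

Simulate the dinner in a program that prints output like this: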

Philosopher A fetches chopstick 5
Philosopher B fetches chopstick 1
Philosopher B fetches chopstick 2
Philosopher D fetches chopstick 3
Philosopher B releases chopsticks 1 2
Philosopher A fetches chopstick 1
Philosopher C fetches chopstick 2
Philosopher A releases chopsticks 5 1
......

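Use five mutexes for the five chopsticks and five independent threads for the five philosophers. Each philosopher picks up the left chopstick first and then the right one, waits whenever either is unavailable, eats for rand()%10 seconds once both are in hand, and then puts the chopsticks down.

Analysis:
If all five philosopher threads pick up their left chopstick and then wait for the right one, they deadlock.

Solutions:
Reference: 哲学家进餐问题-3种解决方案 (Dining Philosophers Problem: 3 Solutions) | 博客园 (cnblogs)

Approach 1
Use one extra mutex to make picking up the left and the right chopstick a single atomic step: a philosopher either takes both or takes neither.
Core pseudocode

void *philosopher(void *arg) {
  int id = *(int *)arg; // id is the philosopher's index
  while (true) {
    lock(&mutex);       // the mutex makes taking both chopsticks atomic
    take_forks(id);     // take the left and the right chopstick
    unlock(&mutex);     // release before eating so that others can eat concurrently
    eating();
    putdown_forks(id);
  }
}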

Full source

#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>

#define N     5 // five philosophers

static char names[N] = {'A', 'B', 'C', 'D', 'E'}; // philosopher labels
static pthread_mutex_t cho[N]; // one mutex per chopstick
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void take_forks(int id);
void putdown_forks(int id);

/**
 * names_index  phi_id  cho_id  cho_index
 *      0          A     5,1       4,0
 *      1          B     1,2       0,1
 *      2          C     2,3       1,2
 *      3          D     3,4       2,3
 *      4          E     4,5       3,4
*/
static inline int left(int index) {
    return (index + N - 1) % N;
}

static inline int right(int index) {
    return index;
}

static void eating() {
    sleep(rand() % 2);
}

static void *philosopher(void *arg) {
    const int id = *(int *)arg;
    printf("philosopher : id = %d\n", id);

    while (true) {
        pthread_mutex_lock(&lock);
        take_forks(id);
        pthread_mutex_unlock(&lock);

        eating();
        putdown_forks(id);
    }

    return NULL;
}

void putdown_forks(int id) {
    pthread_mutex_unlock(&cho[left(id)]);
    pthread_mutex_unlock(&cho[right(id)]);

    printf("Philosopher %c releases chopsticks %d %d\n", names[id], left(id) + 1, right(id) + 1);
}

void take_forks(int id) {
    // Report each chopstick only after its mutex has actually been acquired
    pthread_mutex_lock(&cho[left(id)]);
    printf("Philosopher %c fetches chopstick %d\n", names[id], left(id) + 1);
    pthread_mutex_lock(&cho[right(id)]);
    printf("Philosopher %c fetches chopstick %d\n", names[id], right(id) + 1);
}

// One extra mutex, lock, makes taking the left and right chopsticks atomic
void solution1() {
    for (int i = 0; i < N; ++i) {
        pthread_mutex_init(&cho[i], NULL);
    }

    pthread_t thrs[N];

    int ids[N];

    for (int i = 0; i < N; ++i) {
        ids[i] = i;
        pthread_create(&thrs[i], NULL, philosopher, &ids[i]);
    }

    // The philosophers loop forever, so these joins normally never return
    void *tret;
    for (int i = 0; i < N; ++i) {
        pthread_join(thrs[i], &tret);
        printf("thread %d exit with code %d\n", i, (int)(intptr_t)tret);
    }

    for (int i = 0; i < N; ++i) {
        pthread_mutex_destroy(&cho[i]);
    }
}

int main() {
  solution1();
  return 0;
}

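Approach 2
Allow at most four philosophers to reach for their left chopstick at the same time. Then at least one of them can obtain both chopsticks and eat, so no deadlock can form.
This needs a semaphore room with initial value 4, representing the four chances to pick up a left chopstick first. Each chopstick's mutex is also replaced by a semaphore.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

// Reuses N, names, left(), right(), and eating() from the Approach 1 source

static sem_t room;
static sem_t chop[N];

void take_forks2(int id) {
    sem_wait(&chop[left(id)]);
    printf("Philosopher %c fetches chopstick %d\n", names[id], left(id) + 1);
    sem_wait(&chop[right(id)]);
    printf("Philosopher %c fetches chopstick %d\n", names[id], right(id) + 1);
}

void putdown_forks2(int id) {
    sem_post(&chop[left(id)]);
    sem_post(&chop[right(id)]);

    printf("Philosopher %c releases chopsticks %d %d\n", names[id], left(id) + 1, right(id) + 1);
}

void *philosopher2(void *arg) {
    int id = *(int *)arg;

    while (true) {
        sem_wait(&room);     // at most N - 1 philosophers get past this point
        take_forks2(id);
        sem_post(&room);
        eating();
        putdown_forks2(id);
    }
}

// At most four philosophers may reach for chopsticks at the same time
void solution2() {
    const int room_num = N - 1;
    pthread_t thrs[N];
    int ids[N];

    sem_init(&room, 0, room_num);

    for (int i = 0; i < N; ++i) {
        sem_init(&chop[i], 0, 1);
    }

    for (int i = 0; i < N; ++i) {
        ids[i] = i;
        int err = pthread_create(&thrs[i], NULL, philosopher2, &ids[i]);
        if (err != 0) {
            // pthread_create returns the error number and does not set errno
            fprintf(stderr, "can't create thread: %s\n", strerror(err));
            exit(1);
        }
    }

    // The philosophers loop forever, so these joins block indefinitely
    for (int i = 0; i < N; ++i) {
        pthread_join(thrs[i], NULL);
    }

    sem_destroy(&room);
    for (int i = 0; i < N; ++i) {
        sem_destroy(&chop[i]);
    }
}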

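Threads and Signals

Each thread has its own signal mask, but signal dispositions are shared by all threads in the process. In other words, an individual thread can block certain signals, but when a thread changes how a signal is handled, the change applies to every thread.
In short, a thread chooses which signals it blocks, while the disposition of each signal (SIG_DFL (default), SIG_IGN (ignore), or caught) and the handler function are shared.

Signal Delivery

A signal caused by a hardware fault is normally delivered to the thread whose action caused it. Other signals are delivered to a single arbitrary thread; which one depends on the implementation.

pthread_sigmask

sigprocmask changes the signal mask of a (single-threaded) process to block signals; its behaviour in a multithreaded process is unspecified. pthread_sigmask changes the calling thread's signal mask, blocking delivery of signals to that thread. It can also be used to read the thread's current signal mask.

#include <signal.h>
int pthread_sigmask(int how, const sigset_t *restrict set, sigset_t *restrict oset);

pthread_sigmask works like sigprocmask, except that on failure it returns an error number instead of returning -1 and setting errno.
Parameters
how: SIG_BLOCK adds the signals in set to the thread's signal mask; SIG_SETMASK replaces the thread's signal mask with set; SIG_UNBLOCK removes the signals in set from the thread's signal mask.
set: the signal set used to modify the thread's signal mask. When set is NULL the mask is left unchanged, and oset can be used to read the current mask.
oset: if not NULL, the thread's previous signal mask is stored in the sigset_t it points to.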
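sigwait

A thread can call sigwait to wait for one or more signals. The thread is blocked while it waits.

#include <signal.h>

int sigwait(const sigset_t *restrict set, int *restrict signop);

Parameters
set: the set of signals the thread waits for.
signop: on return, the integer it points to holds the number of the signal that was delivered (the signal number, not a count of signals).

If one of the signals in the set is already pending when sigwait is called, sigwait returns without blocking and removes that signal from the set of pending signals before returning. If the implementation supports queued signals, sigwait removes only one instance of the signal; the other instances stay queued.
The thread should block the signals it waits for before calling sigwait. sigwait atomically unblocks the signals in the set and waits until one is delivered; before returning, it restores the thread's signal mask.

If several threads are blocked in sigwait waiting for the same signal, only one of them returns from sigwait when the signal arrives. If one thread catches a signal with a handler while another waits for the same signal in sigwait, it is up to the implementation which of the two happens (the handler runs, or sigwait returns).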

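pthread_kill

kill sends a signal to a process; to send a signal to a thread, use pthread_kill.

#include <signal.h>

int pthread_kill(pthread_t thread, int signo);

Returns 0 on success and an error number on failure.
As with kill, passing signo = 0 performs error checking without sending a signal, which can be used to check whether the thread exists.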

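Threads and fork

Note: mixing multithreading and multiprocessing in the same program is not recommended.

When a thread calls fork, a copy of the entire process address space is made for the child (text segment, initialized data segment, bss segment, heap, stack, and the command-line arguments and environment variables). The child contains only one thread, a copy of the thread that called fork.
Along with the address space, the child inherits the state of every mutex, reader-writer lock, and condition variable from the parent. Unless the child calls exec right away, it has to clean up the lock state, because that state describes what the parent's threads were doing and is meaningless in the child. The difficulty is that the child does not know which locks are held and need to be released.
Without exec, the child may only call async-signal-safe functions (functions that take no locks), which limits what the child can do.

pthread_atfork

To clean up the lock state, call pthread_atfork to install fork handlers.

#include <pthread.h>
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));

Returns 0 on success and an error number on failure.
pthread_atfork installs up to three cleanup helpers:
The prepare handler is called in the parent before fork creates the child; its job is to acquire all locks defined by the parent.
The parent handler is called in the parent's context after fork has created the child but before fork returns; its job is to unlock all locks acquired by the prepare handler.
The child handler is called in the child's context before fork returns; its job is to unlock all locks acquired by the prepare handler.

This may look like locking once and unlocking twice, but it is not: the child's address space is a copy, so the parent and the child each unlock their own copy of every lock, and both start out after fork with all locks released.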
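
A minimal sketch of the three handlers for a single mutex. The names state_lock, prepare, in_parent, in_child, and fork_with_handlers are made up for this illustration; the child only checks that it can take the lock and reports the outcome through its exit status.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void)   { pthread_mutex_lock(&state_lock); }   /* parent, before fork */
static void in_parent(void) { pthread_mutex_unlock(&state_lock); } /* parent, after fork */
static void in_child(void)  { pthread_mutex_unlock(&state_lock); } /* child, after fork */

/* Forks once with the handlers installed. Returns the child's exit status:
 * 0 if the child could take state_lock, 1 if not, -1 on error. */
int fork_with_handlers(void) {
    static int installed = 0;   /* register the handlers only once */
    int status;
    pid_t pid;

    if (!installed) {
        if (pthread_atfork(prepare, in_parent, in_child) != 0)
            return -1;
        installed = 1;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: its copy of the lock was released by in_child. */
        int rc = pthread_mutex_trylock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: its copy of the lock was released by in_parent. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Registering the same handlers twice would make prepare lock the mutex twice on the next fork and deadlock, which is why the sketch guards the registration.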

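Threads and I/O

All threads in a process share its file descriptors, and an open file has a single file offset. Two threads that call lseek, read, and similar functions on the same file descriptor at the same time can therefore interfere with each other:

Thread A                            Thread B
lseek(fd, 300, SEEK_SET);           lseek(fd, 700, SEEK_SET);
read(fd, buf1, 100);                read(fd, buf2, 100);

If thread B's lseek runs between thread A's lseek and read, thread A reads from offset 700 instead of 300. A file read lock (shared lock) does not prevent this, because both threads are only reading. The fix is to use pread and pwrite. pread makes setting the offset and reading the data one atomic operation; pwrite does the same for setting the offset and writing the data.

pread/pwrite: atomic offset + read/write

pread and pwrite read from or write to a file descriptor at a given offset.

#include <unistd.h>

ssize_t pread(int fd, void *buf, size_t count, off_t offset);

ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);

On success they return the number of bytes read or written (0 from pread means end of file); on failure they return -1 and set errno.

Compared with plain read and write, pread and pwrite take the offset as an argument and combine the seek with the read or write into one atomic operation. They do not change the file offset of the descriptor.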
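
A minimal sketch that checks both properties: pread reads at the requested offset, and the descriptor's own file offset stays where it was. The temporary-file template and the function name pread_demo are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes "0123456789" to a temporary file, reads 4 bytes at offset 3 with
 * pread, and checks that the file offset did not move. Returns 1 on success. */
int pread_demo(void) {
    char path[] = "/tmp/pread_demo_XXXXXX";   /* hypothetical temp-file template */
    char buf[5] = { 0 };
    int ok = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return 0;
    unlink(path);   /* the file disappears once fd is closed */

    if (write(fd, "0123456789", 10) == 10) {
        off_t before = lseek(fd, 0, SEEK_CUR);   /* offset after the write */
        ssize_t n = pread(fd, buf, 4, 3);        /* bytes 3..6 of the file */
        off_t after = lseek(fd, 0, SEEK_CUR);
        ok = n == 4 && strcmp(buf, "3456") == 0 && before == after;
    }
    close(fd);
    return ok;
}
```

Because each pread call carries its own offset, threads A and B from the table above could each issue pread(fd, buf, 100, 300) and pread(fd, buf, 100, 700) without any locking around the descriptor.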