POSIX Threads Programming

Original article:

https://computing.llnl.gov/tutorials/pthreads/#Abstract

Another Chinese translation:

https://www.cnblogs.com/mywolrd/archive/2009/02/05/1930707.html

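1. Abstract

In shared memory multiprocessor architectures, threads can be used to implement parallelism. Historically, hardware vendors implemented their own proprietary versions of threads, which made portability a concern for software developers. For UNIX systems, a standardized C language threads programming interface has been specified by the IEEE POSIX 1003.1c standard. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.

The tutorial begins with an introduction to the concepts, motivations, and design considerations for using Pthreads. Each of the three major classes of routines in the Pthreads API is then covered: thread management, mutex variables, and condition variables. Example codes are used throughout to demonstrate how a new Pthreads programmer would use these routines.

The tutorial concludes with a discussion of LLNL specifics and of how to mix MPI with Pthreads. A lab exercise, with numerous example codes (C language), is also included.

This tutorial is well suited to programmers who are new to parallel programming with Pthreads. A basic understanding of parallel programming in C is required. Those who are unfamiliar with parallel programming in general may want to read the material at this address first:

https://computing.llnl.gov/tutorials/parallel_comp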

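2. Pthreads Overview

What is a Thread?

Technically, a thread is defined as an independent stream of instructions that can be scheduled to run as such by the operating system. But what does this mean?

To the software developer, the concept of a "procedure" that runs independently from its main program may best describe a thread.

To go one step further, imagine a main program (a.out) that contains a number of procedures. Then imagine all of these procedures being able to be scheduled to run simultaneously and/or independently by the operating system. That would describe a "multi-threaded" program.

How is this accomplished?

Before understanding a thread, one first needs to understand a UNIX process. A process is created by the operating system and requires a fair amount of "overhead". Processes contain information about program resources and program execution state, including: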

Process ID, process group ID, user ID, and group ID
Environment
Working directory.
Program instructions
Registers
Stack
Heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory).

UNIX PROCESS

THREADS WITHIN A UNIX PROCESS

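Threads use and exist within these process resources, yet they can be scheduled by the operating system and run as independent entities, largely because they duplicate only the bare essential resources that enable them to exist as executable code.

This independent flow of control is possible because a thread maintains its own: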

Stack pointer
Registers
Scheduling properties (such as policy or priority)
Set of pending and blocked signals
Thread specific data.

So, in summary, in the UNIX environment a thread:

Exists within a process and uses the process resources
Has its own independent flow of control as long as its parent process exists and the OS supports it
Duplicates only the essential resources it needs to be independently schedulable
May share the process resources with other threads that act equally independently (and dependently)
Dies if the parent process dies - or something similar
Is "lightweight" because most of the overhead has already been accomplished through the creation of its process.

Because threads within the same process share resources:

Changes made by one thread to shared system resources (such as closing a file) will be seen by all other threads.
Two pointers having the same value point to the same data.
Reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer.

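What are Pthreads?

Historically, hardware vendors implemented their own proprietary versions of threads. These implementations differed substantially from each other, making it difficult for programmers to develop portable threaded applications.

In order to take full advantage of the capabilities provided by threads, a standardized programming interface was required.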

For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995).
Implementations adhering to this standard are referred to as POSIX threads, or Pthreads.
Most hardware vendors now offer Pthreads in addition to their proprietary API's.

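The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.

Pthreads are defined as a set of C language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library, though in some implementations this library may be part of another library, such as libc.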

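Why Pthreads?

Light Weight:

Compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads requires fewer system resources than managing processes.

For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were measured with the time utility, and are given in seconds. No optimization flags were used.

Note: do not expect the sys and user times to add up to the real time, because these are SMP systems with multiple CPUs/cores working on the problem at the same time. At best, these are approximations run on local machines, past and present.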

Platform	`fork()`			`pthread_create()`
Platform	real	user	sys	real	user	sys
Intel 2.6 GHz Xeon E5-2670 (16 cores/node)	8.1	0.1	2.9	0.9	0.2	0.3
Intel 2.8 GHz Xeon 5660 (12 cores/node)	4.4	0.4	4.3	0.7	0.2	0.5
AMD 2.3 GHz Opteron (16 cores/node)	12.5	1.0	12.5	1.2	0.2	1.3
AMD 2.4 GHz Opteron (8 cores/node)	17.6	2.2	15.7	1.4	0.3	1.3
IBM 4.0 GHz POWER6 (8 cpus/node)	9.5	0.6	8.8	1.6	0.1	0.4
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node)	64.2	30.7	27.6	1.7	0.6	1.1
IBM 1.5 GHz POWER4 (8 cpus/node)	104.5	48.6	47.2	2.1	1.0	1.5
INTEL 2.4 GHz Xeon (2 cpus/node)	54.9	1.5	20.8	1.6	0.7	0.9
INTEL 1.4 GHz Itanium2 (4 cpus/node)	54.5	1.1	22.2	2.0	1.2	0.6

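Efficient Communications/Data Exchange:

In a high performance computing environment, the primary motivation for considering Pthreads is to achieve optimum performance. In particular, if an application uses MPI for on-node communications, there is a potential for improving performance by using Pthreads instead.

MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).

For Pthreads there is no intermediate memory copy, because threads share the same address space within a single process. There is no data transfer, per se. Communication can be as efficient as simply passing a pointer.

In the worst case, Pthread communication becomes more of a cache-to-CPU or memory-to-CPU bandwidth issue. These speeds are much higher than MPI shared memory communications.

For example, some local comparisons, past and present, are shown below: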

Platform	MPI Shared Memory Bandwidth (GB/sec)	Pthreads Worst Case Memory-to-CPU Bandwidth (GB/sec)
Intel 2.6 GHz Xeon E5-2670	4.5	51.2
Intel 2.8 GHz Xeon 5660	5.6	32
AMD 2.3 GHz Opteron	1.8	5.3
AMD 2.4 GHz Opteron	1.2	5.3
IBM 1.9 GHz POWER5 p5-575	4.1	16
IBM 1.5 GHz POWER4	2.1	4
Intel 2.4 GHz Xeon	0.3	4.3
Intel 1.4 GHz Itanium 2	1.8	6.4

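Other Common Reasons:

Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways:

Overlapping CPU work with I/O: For example, a program may have sections where it performs a long I/O operation. While one thread waits for an I/O system call to complete, CPU-intensive work can be performed by other threads.
Priority/real-time scheduling: Tasks that are more important can be scheduled to supersede or interrupt lower priority tasks.
Asynchronous event handling: Tasks that service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests.

A good example is a typical web browser, where many interleaved tasks can be happening at the same time, and where tasks can vary in priority.

Another good example is a modern operating system, which makes extensive use of threads. The original tutorial shows a screenshot of the MS Windows operating system and applications using threads at this point.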

Designing Threaded Programs:

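Parallel Programming:

On modern, multi-core machines, Pthreads are ideally suited to parallel programming, and whatever applies to parallel programming in general also applies to parallel Pthreads programs.

There are many considerations for designing parallel programs, such as: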

What type of parallel programming model to use?
Problem partitioning
Load balancing
Communications
Data dependencies
Synchronization and race conditions
Memory issues
I/O issues
Program complexity
Programmer effort/costs/time
...

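Readers interested in these topics can consult the tutorial at https://computing.llnl.gov/tutorials/parallel_comp/

In general, for a program to take advantage of Pthreads, it must be possible to organize it into discrete, independent tasks that can execute concurrently. For example, if routine1 and routine2 can be interchanged, interleaved and/or overlapped in real time, they are candidates for threading.

Programs with the following characteristics may be well suited to Pthreads: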

Work that can be executed, or data that can be operated on, by multiple tasks simultaneously:
Block for potentially long I/O waits
Use many CPU cycles in some places but not others
Must respond to asynchronous events
Some work is more important than other work (priority interrupts)

Several common models for threaded programs exist:

Manager/worker: a single thread, the manager assigns work to other threads, the workers. Typically, the manager handles all input and parcels out work to the other tasks. At least two forms of the manager/worker model are common: static worker pool and dynamic worker pool.
Pipeline: a task is broken into a series of suboperations, each of which is handled in series, but concurrently, by a different thread. An automobile assembly line best describes this model.
Peer: similar to the manager/worker model, but after the main thread creates other threads, it participates in the work.

Shared Memory Model:

All threads have access to the same global, shared memory
Threads also have their own private data
Programmers are responsible for synchronizing access (protecting) globally shared data.

Thread-safeness:

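Thread-safeness, in a nutshell, refers to an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions.

For example, suppose that your application creates several threads, each of which makes a call to the same library routine: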

This library routine accesses/modifies a global structure or location in memory.
As each thread calls this routine it is possible that they may try to modify this global structure/memory location at the same time.
If the routine does not employ some sort of synchronization constructs to prevent data corruption, then it is not thread-safe.

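The implication for users of external library routines is that if you aren't 100% certain a routine is thread-safe, you take your chances with problems that could arise.

Recommendation: Be careful if your application uses libraries or other objects that don't explicitly guarantee thread-safeness. When in doubt, assume that they are not thread-safe until proven otherwise. Calls to an uncertain routine can be made safe by "serializing" them.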

Thread Limits:

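Although the Pthreads API is an ANSI/IEEE standard, implementations can, and usually do, vary in ways not specified by the standard.

Because of this, a program that runs fine on one platform may fail or produce wrong results on another platform.

For example, the maximum number of threads permitted and the default thread stack size are two important limits to consider when designing your program.

Several thread limits are discussed in more detail later in this tutorial.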

The Pthreads API:

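The original Pthreads API was defined in the ANSI/IEEE POSIX 1003.1 - 1995 standard. The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.

Copies of the standard can be purchased from IEEE or downloaded for free from other sites online.

The subroutines that make up the Pthreads API can be informally grouped into four major groups:

Thread management: Routines that work directly on threads: creating, detaching, joining, etc. They also include functions to set/query thread attributes (joinable, scheduling, etc.).

Mutexes: Routines that deal with synchronization through a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking and unlocking mutexes. They are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.

Condition variables: Routines that address communication between threads that share a mutex, based upon programmer-specified conditions. This group includes functions to create, destroy, wait and signal based upon specified variable values. Functions to set/query condition variable attributes are also included.

Synchronization: Routines that manage read/write locks and barriers.

Naming conventions: All identifiers in the threads library begin with pthread_. Some examples are shown below.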

Routine Prefix	Functional Group
pthread_	Threads themselves and miscellaneous subroutines
pthread_attr_	Thread attributes objects
pthread_mutex_	Mutexes
pthread_mutexattr_	Mutex attributes objects.
pthread_cond_	Condition variables
pthread_condattr_	Condition attributes objects
pthread_key_	Thread-specific data keys
pthread_rwlock_	Read/write locks
pthread_barrier_	Synchronization barriers

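The concept of opaque objects pervades the design of the API. The basic calls work to create or modify opaque objects. The opaque objects can be modified by calls to attribute functions, which deal with opaque attributes.

The Pthreads API contains around 100 subroutines. This tutorial focuses on a subset of these, specifically those most likely to be immediately useful to the beginning Pthreads programmer.

For portability, the pthread.h header file should be included in each source file that uses the Pthreads library.

The current POSIX standard is defined only for the C language.

A number of excellent books about Pthreads are available. See the reference list at: https://computing.llnl.gov/tutorials/pthreads/#References

Compiling Threaded Programs

Several examples of compile commands used for Pthreads codes are listed in the table below.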

Compiler / Platform	Compiler Command	Description
INTEL Linux	`icc -pthread`	C
INTEL Linux	`icpc -pthread`	C++
PGI Linux	`pgcc -lpthread`	C
PGI Linux	`pgCC -lpthread`	C++
GNU Linux, Blue Gene	`gcc -pthread`	GNU C
GNU Linux, Blue Gene	`g++ -pthread`	GNU C++
IBM Blue Gene	`bgxlc_r / bgcc_r`	C (ANSI / non-ANSI)
	`bgxlC_r, bgxlc++_r`	C++

Thread Management:

Creating and Terminating Threads:

Routines:

pthread_create (thread,attr,start_routine,arg)

pthread_exit (status)

pthread_cancel (thread)

pthread_attr_init (attr)

pthread_attr_destroy (attr)

Creating Threads:

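Initially, your main() program comprises a single, default thread. All other threads must be explicitly created by the programmer.

pthread_create creates a new thread and makes it executable. This routine can be called any number of times from anywhere within your code.

pthread_create arguments: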

thread: An opaque, unique identifier for the new thread returned by the subroutine.
attr: An opaque attribute object that may be used to set thread attributes. You can specify a thread attributes object, or NULL for the default values.
start_routine: the C routine that the thread will execute once it is created.
arg: A single argument that may be passed to start_routine. It must be passed by reference as a pointer cast of type void. NULL may be used if no argument is to be passed.

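The maximum number of threads that a process may create is implementation dependent. Programs that attempt to exceed the limit can fail or produce wrong results.

Querying and setting your implementation's thread limit: the Linux session below queries the default (soft) limits and the hard limit, then changes the maximum number of user processes (which includes threads), and finally verifies that the limit has been overridden. In this session the new value, 6666, is below the hard limit of 63662.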

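ulimit flags (descriptions may not match every system: in the Linux session that follows, for example, -r reports the real-time priority rather than a thread limit):
Flag    Description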
-a    Lists all of the current resource limits.
-c    Specifies the size of core dumps, in number of 512-byte blocks.
-d    Specifies the size of the data area, in number of K bytes.
-f    Sets the file size limit in blocks when the Limit parameter is used, or reports the file size limit if no parameter is specified. The -f flag is the default.
-H    Specifies that the hard limit for the given resource is set. If you have root user authority, you can increase the hard limit. Anyone can decrease it.
-m    Specifies the size of physical memory (resident set size), in number of K bytes. This limit is not enforced by the system.
-n    Specifies the limit on the number of file descriptors a process may have.
-r    Specifies the limit on the number of threads a process can have.
-s    Specifies the stack size, in number of K bytes.
-S    Specifies that the soft limit for the given resource is set. A soft limit can be increased up to the value of the hard limit. If neither the -H nor -S flags are specified, the limit applies to both.
-t    Specifies the number of seconds to be used by each process.
-u    Specifies the limit on the number of processes a user can create.

[andrew@15:38:41 ~]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63662
max locked memory       (kbytes, -l) 1024
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63662
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[andrew@15:38:46 ~]$ ulimit -Hu
63662

[andrew@15:43:29 ~]$ ulimit -u 6666
[andrew@15:43:43 ~]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63662
max locked memory       (kbytes, -l) 1024
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 6666
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

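Once created, threads are peers, and may create other threads. There is no implied hierarchy or dependency between threads.

Thread Attributes:

By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.

pthread_attr_init and pthread_attr_destroy are used to initialize/destroy the thread attribute object.

Other routines are then used to query/set specific attributes in the thread attribute object. Attributes include: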

Detached or joinable state
Scheduling inheritance
Scheduling policy
Scheduling parameters
Scheduling contention scope
Stack size
Stack address
Stack guard (overflow) size

Some of these attributes will be discussed later.

Thread Binding and Scheduling:

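Question: After a thread has been created, how do you know a) when it will be scheduled to run by the operating system, and b) which processor/core it will run on?

Answer: Unless you are using the Pthreads scheduling mechanism, it is up to the implementation and/or operating system to decide where and when threads will execute. Robust programs should not depend upon threads executing in a specific order or on a specific processor/core.

The Pthreads API provides several routines that may be used to specify how threads are scheduled for execution. For example, threads can be scheduled to run FIFO (first-in first-out), RR (round-robin) or OTHER (operating system determines). It also provides the ability to set a thread's scheduling priority value.

These topics are not covered here. The sched_setscheduler man page is a good place to learn more.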

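The Pthreads API does not provide routines for binding threads to specific CPUs/cores. However, local implementations may include this functionality, for example by providing the non-standard pthread_setaffinity_np routine. Note that "_np" in the name stands for "non-portable".

The local operating system may also provide a way to do this. For example, Linux provides the sched_setaffinity routine.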

Terminating Threads & pthread_exit():

There are several ways in which a thread may be terminated:

The thread returns normally from its starting routine. Its work is done.
The thread makes a call to the pthread_exit subroutine - whether its work is done or not.
The thread is canceled by another thread via the pthread_cancel routine.
The entire process is terminated due to making a call to either the exec() or exit()
If main() finishes first, without calling pthread_exit explicitly itself

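The pthread_exit() routine allows the programmer to specify an optional termination status parameter. This optional parameter is typically returned to threads "joining" the terminated thread (covered later).

In subroutines that execute to completion normally, you can often dispense with calling pthread_exit(), unless, of course, you want to pass the optional status code back.

Cleanup: the pthread_exit() routine does not close files. Any files opened inside the thread remain open after the thread is terminated.

Discussion on calling pthread_exit() from main():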

There is a definite problem if main() finishes before the threads it spawned if you don't call pthread_exit() explicitly. All of the threads it created will terminate because main() is done and no longer exists to support the threads.
By having main() explicitly call pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done.

Example: Pthread Creation and Termination

This simple example code creates 5 threads with the pthread_create() routine. Each thread prints a "Hello World!" message and then terminates with a call to pthread_exit().

 #include <pthread.h>
 #include <stdio.h>
 #include <stdlib.h>   /* for exit() */
 #define NUM_THREADS     5

 /* Start routine: the thread number arrives packed into the void pointer. */
 void *PrintHello(void *threadid)
 {
    long tid;
    tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
 }

 int main (int argc, char *argv[])
 {
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    for(t=0; t<NUM_THREADS; t++){
       printf("In main: creating thread %ld\n", t);
       rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
       if (rc){
          printf("ERROR; return code from pthread_create() is %d\n", rc);
          exit(-1);
       }
    }

    /* Last thing that main() should do */
    pthread_exit(NULL);
 }

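Thread Management:

Passing Arguments to Threads

The pthread_create() routine permits the programmer to pass one argument to the thread start routine. For cases where multiple arguments must be passed, this limitation is easily overcome by creating a structure that contains all of the arguments, and then passing a pointer to that structure in the pthread_create() routine.
All arguments must be passed by reference and cast to (void *).

Question: How can you safely pass data to newly created threads, given that you do not know when they will start or when they will be scheduled?

Answer: Make sure that all passed data is thread safe, that is, that it cannot be changed by other threads. The three examples that follow demonstrate what to do and what not to do.

Example 1: This code fragment demonstrates how to pass a simple integer to each thread. The calling thread uses a unique data structure for each thread, ensuring that each thread's argument remains intact throughout the program.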

long taskids[NUM_THREADS];

for(t=0; t<NUM_THREADS; t++)
{
   taskids[t] = t;
   printf("Creating thread %ld\n", t);
   rc = pthread_create(&threads[t], NULL, PrintHello, (void *) taskids[t]);
   ...
}

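Example 2: This example shows how to set up and pass multiple arguments via a structure. Each thread receives a unique instance of the structure.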

struct thread_data{
   int  thread_id;
   int  sum;
   char *message;
};

struct thread_data thread_data_array[NUM_THREADS];

void *PrintHello(void *threadarg)
{
   struct thread_data *my_data;
   ...
   my_data = (struct thread_data *) threadarg;
   taskid = my_data->thread_id;
   sum = my_data->sum;
   hello_msg = my_data->message;
   ...
}

int main (int argc, char *argv[])
{
   ...
   thread_data_array[t].thread_id = t;
   thread_data_array[t].sum = sum;
   thread_data_array[t].message = messages[t];
   rc = pthread_create(&threads[t], NULL, PrintHello, 
        (void *) &thread_data_array[t]);
   ...
}

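Example 3: This example performs argument passing incorrectly. It passes the address of variable t, which is shared memory space and visible to all threads. As the loop iterates, the value at this memory location changes, possibly before the created threads can access it.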

int rc;
long t;

for(t=0; t<NUM_THREADS; t++) 
{
   printf("Creating thread %ld\n", t);
   /* WRONG: every thread receives the address of the same loop variable */
   rc = pthread_create(&threads[t], NULL, PrintHello, (void *) &t);
   ...
}

Thread Management:

Joining and Detaching Threads

Routines:

pthread_join (threadid,status)

pthread_detach (threadid)

pthread_attr_setdetachstate (attr,detachstate)

pthread_attr_getdetachstate (attr,detachstate)

Joining:

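"Joining" is one way to accomplish synchronization between threads.

The pthread_join() subroutine blocks the calling thread until the specified thread terminates.

The programmer is able to obtain the target thread's termination return status if it was specified in the target thread's call to pthread_exit().

A joining thread can match one pthread_join() call.

It is a logical error to attempt multiple joins on the same thread.

Two other synchronization methods, mutexes and condition variables, will be discussed later.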

Joinable or Not?

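When a thread is created, one of its attributes defines whether it is joinable or detached. Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.

The final draft of the POSIX standard specifies that threads should be created as joinable.

To explicitly create a thread as joinable or detached, the attr argument in the pthread_create() routine is used. The typical 4-step process is: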

Declare a pthread attribute variable of the pthread_attr_t data type
Initialize the attribute variable with pthread_attr_init()
Set the attribute detached status with pthread_attr_setdetachstate()
When done, free library resources used by the attribute with pthread_attr_destroy()

Detaching:

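The pthread_detach() routine can be used to explicitly detach a thread even though it was created as joinable.

There is no converse routine.

Recommendations:

If a thread requires joining, consider explicitly creating it as joinable. This improves portability, since not all implementations may create threads as joinable by default.

If you know in advance that a thread will never need to join with another thread, consider creating it in a detached state. Some system resources may be able to be freed.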

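Example: Pthread Joining

This example demonstrates how to "wait" for thread completions by using the Pthread join routine.

Since some implementations of Pthreads may not create threads in a joinable state, the threads in this example are explicitly created in a joinable state so that they can be joined later.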