Step By Step (Java Threads)

1. A first thread program:

    public static void main(String[] args) {
        // 1. Create a Thread object, passing it an anonymous inner class that implements Runnable.
        // 2. The run() method of the Runnable is the body the thread executes.
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 10000; ++i)
                    System.out.println(i);
            }
        });
        // 3. Start the thread.
        t1.start();
        // 4. join() blocks the main thread until the worker thread has finished.
        // 5. The main thread and the worker thread run concurrently. Without join(),
        //    main() simply returns. A non-daemon worker (the default) keeps the JVM
        //    alive until it finishes; only a worker marked with setDaemon(true) is
        //    abandoned once the last non-daemon thread has exited.
        try {
            t1.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

2. A first thread program that exits on interruption:

    class MyRunnable implements Runnable {
        @Override
        public void run() {
            int i = 0;
            // Every thread maintains its own interrupted flag. When the main thread
            // calls interrupt() on the thread object, that flag is set to true. The
            // worker checks the flag after each pass through the loop body, that is,
            // it tests the boolean set by t1.interrupt(), and leaves the loop once
            // the flag is true.
            // The static method Thread.currentThread() returns the Thread object
            // that is executing this code.
            while (!Thread.currentThread().isInterrupted()) {
                i++;
                if (i > 100000000)
                    break;
            }
            System.out.println("i = " + i);
            // The thread exits here.
        }
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable());
        t1.start();
        try {
            // Put the main thread to sleep for 1000 ms to give the worker time to run.
            Thread.sleep(1000);
            System.out.println("Thread.sleep over.");
            // The main thread interrupts the worker.
            t1.interrupt();
            System.out.println("t1.interrupt over.");
            // The main thread waits for the worker to finish.
            t1.join();
            System.out.println("t1.join over.");
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }

    /*  Output:
        Thread.sleep over.
        t1.interrupt over.
        i = 16561506
        t1.join over. */

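If the worker thread calls sleep, which is an interruptible blocking call, then a call to t1.interrupt() from the main thread during the sleep makes sleep return by throwing InterruptedException. If the main thread has already called interrupt() before sleep is invoked, sleep checks the flag before going to sleep and throws the same exception if the flag is true. So when the worker's run method contains a sleep call, it does not need the isInterrupted() test shown above; catching the exception is enough, as in the following example: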

    class MyRunnable implements Runnable {
        @Override
        public void run() {
            int i = 0;
            try {
                while (true) {
                    i++;
                    if (i > 100000000)
                        break;
                    // sleep() throws InterruptedException once the thread is interrupted.
                    Thread.sleep(0);
                }
            } catch (InterruptedException e) {
                System.out.println("InterruptedException is raised here");
                System.out.println("i = " + i);
            }
        }
    }

    public static void main(String[] args) {
        MyRunnable r = new MyRunnable();
        Thread t1 = new Thread(r);
        t1.start();
        try {
            Thread.sleep(1000);
            System.out.println("Thread.sleep over.");
            t1.interrupt();
            System.out.println("t1.interrupt over.");
            t1.join();
            System.out.println("t1.join over.");
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }

    /*  Output:
        Thread.sleep over.
        t1.interrupt over.
        InterruptedException is raised here
        i = 892137
        t1.join over.    */
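
The claim that sleep checks the flag before it goes to sleep can be tried in isolation. The following is a minimal sketch, not part of the original examples; the class name InterruptBeforeSleep and its method are made up for illustration. The thread sets its own interrupted flag and then calls sleep, which is expected to throw at once and to clear the flag while doing so.

```java
class InterruptBeforeSleep {
    // Returns true if sleep() threw InterruptedException because the
    // interrupted flag was already set, and the flag was cleared afterwards.
    static boolean sleepThrowsWhenFlagAlreadySet() {
        Thread.currentThread().interrupt();      // set our own interrupted flag
        try {
            Thread.sleep(10_000);                // does not actually sleep 10 s
            return false;                        // reached only if no exception
        } catch (InterruptedException e) {
            // Throwing InterruptedException clears the interrupted flag.
            return !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(sleepThrowsWhenFlagAlreadySet());   // prints true
    }
}
```

Because the flag is cleared when the exception is thrown, code that catches InterruptedException and still wants callers to see the interruption usually calls Thread.currentThread().interrupt() again in the catch block.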

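Both approaches let the main thread decide when the worker thread exits. Even so, the two examples share one striking trait: after the main thread has slept for 1000 milliseconds, the worker still has not finished its 100 million increments, which is hard to believe on a modern CPU. In the first example the main cost clearly comes from the while (!Thread.currentThread().isInterrupted()) test on every iteration. The second is slower still, and more obviously so: sleep(0) gives up the rest of the current CPU time slice. Put simply, each time slice the CPU grants the worker is spent on just two operations, i++ and the i > 100000000 comparison, and the remainder is voluntarily handed back. Below is a comparatively efficient approach; with it, the worker has finished all its work by the time the main thread returns from sleep(180):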

    class MyRunnable implements Runnable {
        public void stop() {
            stop = true;
        }

        @Override
        public void run() {
            int i = 0;
            // Test a boolean flag of our own instead of the interrupted flag.
            while (!stop) {
                i++;
                if (i > 100000000)
                    break;
            }
            System.out.println("i = " + i);
        }

        // The volatile keyword should be used here. It marks the variable as one
        // that may change at any time and tells the JVM that every read must see
        // the latest value written by another thread, rather than a snapshot
        // cached after an earlier read from main memory. Reading such a snapshot
        // is faster, which is why non-volatile fields are allowed to do it.
        private volatile boolean stop = false;
    }

    public static void main(String[] args) {
        MyRunnable r = new MyRunnable();
        Thread t1 = new Thread(r);
        t1.start();
        try {
            Thread.sleep(180);
            System.out.println("Thread.sleep over.");
            r.stop();
            System.out.println("r.stop over.");
            t1.join();
            System.out.println("t1.join over.");
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }

    /*  Output:
        Thread.sleep over.
        r.stop over.
        i = 100000001
        t1.join over.    */

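Of the three approaches above, the last is the most efficient, and it is also a common technique in C++ multithreaded development. As for the first example, a C++ program would usually have to call an operating-system API that forcibly terminates a thread, such as TerminateThread on Windows, which the operating system itself discourages. The second example has no counterpart in C++. See the following C++ code: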

    #include <stdio.h>
    #include <conio.h>
    #include <Windows.h>
    static volatile bool stop = false;
    DWORD WINAPI threadFunc(void*) {
        int i = 0;
        // Test a boolean flag of our own.
        while (!stop) {
            ++i;
            if (i > 100000000)
                break;
        }
        printf("i = %d\n", i);
        return 0;
    }
    int main()
    {
        // The creation flags and thread-id arguments are 0 and NULL respectively.
        HANDLE h = ::CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        printf("CreateThread over.\n");
        ::Sleep(180);
        printf("Sleep over.\n");
        stop = true;
        printf("stop over.\n");
        WaitForSingleObject(h, INFINITE);
        printf("WaitForSingleObject over.\n");
        ::CloseHandle(h);
        _getch();
    }
    /*  Output:
        CreateThread over.
        i = 100000001
        Sleep over.
        stop over.
        WaitForSingleObject over. */

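3. Thread scheduling and priority:
Many years ago I read the book 《Win32多线程程序设计》 (on Win32 multithreaded programming, translated by 侯捷). One sentence in it stood out: "a thread is like a runaway horse". In the years of development since, that sentence has come back to me whenever a multithreading problem turned up. Only in very rare cases should an application depend on thread priorities; in most cases multithreaded programs are built on cooperation and synchronization between threads. In Java, thread scheduling relies mostly on the scheduler of the operating system the JVM runs on. The JVM also offers some scheduling policies for different platforms, but even so our applications must not depend on such implementation details. In C++ development, all thread-related operations come from C-interface API functions supplied by the operating system, and different operating systems supply different libraries, such as CreateThread on Windows and pthread_create (from the pthread library) on Linux. The same goes for thread priority: each operating system has its own way of dividing up priorities. Java unifies them to some extent, but the actual effect still depends on the operating system. Thread.setPriority() accepts 10 levels, from the lowest, MIN_PRIORITY (1), to the highest, MAX_PRIORITY (10); the default priority is NORM_PRIORITY (5).
Note: blocking calls such as Thread.sleep() give up the remainder of the current thread's CPU time slice and put the thread into a low-cost sleeping or waiting state. Thread.yield() likewise offers to give up the rest of the time slice (it is only a hint to the scheduler). The main difference from a blocking call is that the thread does not sleep or wait after yield: it stays runnable, and the code after yield continues as soon as the thread is scheduled again.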
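
The priority constants can be checked directly against the Thread class. This is a small illustrative sketch; the class name PriorityDemo and its methods are hypothetical and serve only to print the values and to show setPriority in use.

```java
class PriorityDemo {
    // The three priority constants defined by java.lang.Thread.
    static int[] priorityRange() {
        return new int[] { Thread.MIN_PRIORITY, Thread.NORM_PRIORITY, Thread.MAX_PRIORITY };
    }

    // Sets a priority on a thread that is never started and reads it back.
    static int priorityAfterSet(int requested) {
        Thread t = new Thread(() -> { });
        t.setPriority(requested);   // IllegalArgumentException if outside MIN..MAX
        return t.getPriority();
    }

    public static void main(String[] args) {
        int[] r = priorityRange();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);        // prints 1 5 10
        System.out.println(priorityAfterSet(Thread.MIN_PRIORITY)); // prints 1
    }
}
```

How much a priority actually influences scheduling is left to the JVM and the operating system, so a program should remain correct whatever value is set.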
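4. Synchronization:
1) In most real multithreaded applications, two or more threads need to share access to the same data. What happens if two threads access the same object and each calls a method that modifies the object's state? As you might imagine, the threads step on each other's toes. Depending on the order in which the threads access the data, the object can end up corrupted. This situation is usually called a race condition. See the following code: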

    class MyRunnable implements Runnable {
        @Override
        public void run() {
            for (int i = 0; i < 100000000; ++i)
                ++counter;   // not atomic: read, add one, write back
            System.out.println("counter = " + counter);
        }

        private static int counter = 0;
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable());
        Thread t2 = new Thread(new MyRunnable());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
        }
    }
    /*  Output:
        counter = 118758785
        counter = 124880018 */

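Analysis: counter is a static field of the class, so the two objects share the same counter. The ++counter operation takes three instructions: read the value of counter, add one, and write the result back to counter. So although ++counter is a single Java statement, it is carried out by three instructions, and the same is true in C/C++. Since it is not an atomic operation (counter = 0, by contrast, is atomic on a 32-bit system), it can be interrupted part-way. For example, both threads may read counter at the same time and get the same value, so each computes the same incremented value. The write-back itself is atomic, so the two threads can only perform it one after the other; put simply, an operation that should have added two to counter adds only one. That is why neither thread's output shows counter reaching the expected 200000000.
2) Lock objects: putting a lock around ++counter in the code above guarantees that the statement's three instructions execute atomically. Even if a thread is scheduled out before any one of the instructions, another thread that tries to take the lock finds it already held by the first thread, so it has to block on the lock acquisition and give up the CPU. In the simple case, the first thread is scheduled again, executes the remaining instructions and unlocks. If it is then scheduled out once more, the second thread's blocked call returns; it now holds the lock and executes the three instructions of ++counter. The modified code follows: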

    import java.util.concurrent.locks.ReentrantLock;

    class MyRunnable implements Runnable {
        @Override
        public void run() {
            for (int i = 0; i < 100000000; ++i) {
                lock.lock();
                try {
                    ++counter;
                } finally {
                    // Unlock in finally so the lock is released even on an exception.
                    lock.unlock();
                }
            }
            System.out.println("counter = " + counter);
        }

        private static ReentrantLock lock = new ReentrantLock();
        private static int counter = 0;
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable());
        Thread t2 = new Thread(new MyRunnable());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
        }
    }
    /*  Output:
        counter = 198159169
        counter = 200000000 */

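The thread that finishes last prints 200000000.
C++ (before C++11 added std::thread and std::mutex) offered no such library support. All of these operations rest on the C-interface APIs supplied by the operating system. Open-source frameworks do exist that spare C++ developers these platform-dependent details, but most C++ developers still prefer to call the operating-system APIs directly. On Windows, win32 offers several ways to achieve the synchronization of the Java example above. One is the critical section (CRITICAL_SECTION), which closely resembles Java's ReentrantLock both in how it is used and in being reentrant. Another is the family of Interlocked functions, on which many spin locks are built; for an atomic increment they are clearly more efficient than the other locking mechanisms. See the following code: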

    #include <stdio.h>
    #include <conio.h>
    #include <Windows.h>
    static CRITICAL_SECTION cs;
    static int counter = 0;

    DWORD WINAPI threadFunc(void*) {
        for (int i = 0; i < 100000000; ++i) {
            ::EnterCriticalSection(&cs);
            ++counter;
            ::LeaveCriticalSection(&cs);
        }
        printf("counter = %d.\n", counter);
        return 0;
    }
    int main()
    {
        ::InitializeCriticalSection(&cs);
        HANDLE h[2];
        DWORD dwStart = ::GetTickCount();
        h[0] = ::CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        h[1] = ::CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        printf("CreateThread over.\n");
        WaitForMultipleObjects(2, h, TRUE, INFINITE);
        printf("WaitForMultipleObjects over.\n");
        DWORD elapse = ::GetTickCount() - dwStart;
        printf("The total time is %u.\n", elapse);
        ::DeleteCriticalSection(&cs);
        _getch();
    }
    /*  Output:
        CreateThread over.
        counter = 180875519.
        counter = 200000000.
        WaitForMultipleObjects over.
        The total time is 9578. */

With the Interlocked functions the optimized code runs more than twice as fast. See the following code:

    #include <stdio.h>
    #include <conio.h>
    #include <Windows.h>
    static volatile LONG counter = 0;

    DWORD WINAPI threadFunc(void*) {
        for (int i = 0; i < 100000000; ++i) {
            // Atomic increment; no separate lock is needed.
            ::InterlockedIncrement(&counter);
        }
        printf("counter = %ld.\n", counter);
        return 0;
    }

    int main()
    {
        HANDLE h[2];
        DWORD dwStart = ::GetTickCount();
        h[0] = ::CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        h[1] = ::CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        printf("CreateThread over.\n");
        WaitForMultipleObjects(2, h, TRUE, INFINITE);
        printf("WaitForMultipleObjects over.\n");
        DWORD elapse = ::GetTickCount() - dwStart;
        printf("The total time is %u.\n", elapse);
        _getch();
    }
    /*  Output:
        CreateThread over.
        counter = 195160114.
        counter = 200000000.
        WaitForMultipleObjects over.
        The total time is 4437.*/
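
On the Java side, the closest counterpart to an interlocked increment is probably java.util.concurrent.atomic.AtomicInteger. The sketch below is an illustration rather than a benchmark; the class name AtomicCounterDemo and the per-thread count are assumptions chosen to keep the run short.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Two threads increment one shared AtomicInteger without any explicit lock.
    static int countWithTwoThreads(int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Runnable work = () -> {
            for (int i = 0; i < perThread; ++i)
                counter.incrementAndGet();   // atomic read-modify-write
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(100000));   // prints 200000
    }
}
```

No increment is lost here, unlike the unsynchronized ++counter example earlier in this section.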

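3) Condition objects:
The Condition objects introduced in Java SE 5, like the older thread-cooperation mechanism of Object.wait(), Object.notify() and Object.notifyAll(), closely resemble pthread_mutex and pthread_cond from the PThread library on Linux/Unix. See the following code: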

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    class MyRunnable implements Runnable {
        private int id;
        private static Condition cond;
        private static ReentrantLock lock;
        private static int counter;
        static {
            lock = new ReentrantLock();
            // The condition cond must be created from the ReentrantLock, because the
            // two have to cooperate closely in the synchronization that follows.
            cond = lock.newCondition();
            counter = 0;
        }

        public MyRunnable(int id) {
            this.id = id;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < 100000000; ++i) {
                    // 1. Acquire the lock.
                    lock.lock();
                    try {
                        if (id == 1) {
                            // Re-check in a loop to guard against spurious wakeups.
                            while (counter < 50000000) {
                                // 2. Release the lock.
                                // 3. Enter the waiting state.
                                // 4. Steps 2 and 3 happen atomically.
                                cond.await();
                                // 5. Woken by signalAll().
                                // 6. Block again while trying to reacquire lock.
                                // 7. Steps 5 and 6 happen atomically.
                                // 8. The lock is held again; execution continues.
                            }
                        } else {
                            if (counter == 50000000) {
                                // 2. Wake every thread waiting on cond, that is,
                                // every thread blocked in await(). signal() wakes
                                // only one waiting thread (which one is not
                                // specified), so it is cheaper. If at most one
                                // thread can be waiting, signal() may replace
                                // signalAll(); otherwise signalAll() is safer and
                                // avoids hanging the whole program. Since
                                // signalAll() wakes several threads, each woken
                                // thread must re-check its logic in case another
                                // woken thread has already done the work.
                                cond.signalAll();
                            }
                        }
                        ++counter;
                    } finally {
                        lock.unlock();
                    }
                }
                System.out.println("counter = " + counter);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable(1));
        Thread t2 = new Thread(new MyRunnable(2));
        long start = System.currentTimeMillis();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
        }
        long elapse = System.currentTimeMillis() - start;
        System.out.println("The elapse is " + elapse);
    }
    /*  Output:
        counter = 146402823
        counter = 200000000
        The elapse is 7672    */

Since pthread_mutex and pthread_cond, mentioned above, are used in much the same way as Java's Lock and Condition, only a one-to-one comparison is given here:
    PThread (C/C++)                       Java SE 5
    pthread_mutex_init(&_mutex, 0)        lock = new ReentrantLock()
    pthread_cond_init(&_cond, 0)          cond = lock.newCondition()
    pthread_mutex_lock(&_mutex)           lock.lock()
    pthread_cond_broadcast(&_cond)        cond.signalAll()
    pthread_cond_signal(&_cond)           cond.signal()
    pthread_mutex_unlock(&_mutex)         lock.unlock()
    pthread_cond_wait(&_cond, &_mutex)    cond.await()
    pthread_mutex_destroy(&_mutex)
    pthread_cond_destroy(&_cond)
Note: pthread_cond_wait(&_cond, &_mutex) also ties the mutex to the condition variable when the wait begins; in this respect the pthread mechanism matches Java's Condition exactly.
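
The lock/await/signalAll pattern in the table can be reduced to a small reusable shape. The following is a sketch under assumptions: the Gate class and its demo method are invented for illustration and do not come from the examples above. The waiting side re-checks its condition in a while loop, and both sides release the lock in finally.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class Gate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private boolean open = false;          // guarded by lock

    // Blocks the caller until open() has been called.
    void await() throws InterruptedException {
        lock.lock();
        try {
            while (!open)                  // re-check after every wakeup
                opened.await();            // releases lock while waiting
        } finally {
            lock.unlock();
        }
    }

    // Opens the gate and wakes every waiting thread.
    void open() {
        lock.lock();
        try {
            open = true;
            opened.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Starts one waiter, opens the gate, and reports whether the waiter got through.
    static boolean demo() {
        Gate gate = new Gate();
        boolean[] passed = { false };
        Thread waiter = new Thread(() -> {
            try {
                gate.await();
                passed[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        gate.open();
        try {
            waiter.join();                 // join makes passed[0] visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());        // prints true
    }
}
```

The demo works whether open() runs before or after the waiter reaches await(), because the waiter tests the open flag under the lock before it decides to wait.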
4) The synchronized keyword:
When several threads call into code guarded by synchronized, that code is executed serially. For example:

    public synchronized void syncMethod() {
        // do something
    }

    public static synchronized void syncStaticMethod() {
        // do something
    }

    // These are equivalent, respectively, to:
    public class MyTest {
        public void syncMethod2() {
            // A synchronized instance method locks the monitor of the object itself.
            synchronized (this) {
                // do something
            }
        }
    }

    public class MyTest2 {
        public static void syncStaticMethod2() {
            // A synchronized static method locks the monitor of the Class object.
            synchronized (MyTest2.class) {
                // do something
            }
        }
    }

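The examples above apply the synchronized keyword directly to methods. A synchronized (obj) block can be used instead to narrow the synchronized region and so improve parallelism. See the following code: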

    class MyRunnable implements Runnable {
        private int id;
        private static Object lock = new Object();
        private static int counter = 0;

        public MyRunnable(int id) {
            this.id = id;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < 100000000; ++i) {
                    synchronized (lock) {
                        if (id == 1) {
                            // Re-check in a loop to guard against spurious wakeups.
                            while (counter < 50000000)
                                lock.wait();
                        } else {
                            if (counter == 50000000)
                                lock.notifyAll();
                        }
                        ++counter;
                    }
                }
                System.out.println("counter = " + counter);
            } catch (InterruptedException e) {
            }
        }
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable(1));
        Thread t2 = new Thread(new MyRunnable(2));
        long start = System.currentTimeMillis();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
        }
        long elapse = System.currentTimeMillis() - start;
        System.out.println("The elapse is " + elapse);
    }
    /*  Output:
        counter = 154661540
        counter = 200000000
        The elapse is 33844    */

In my test, synchronizing with synchronized took about 33 s in total, whereas the earlier ReentrantLock and Condition version did the same work in only about 7 s.
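5) The volatile keyword:
If a field is declared volatile, every read and write of that field at run time goes to main memory instead of using the snapshot of the field that the CPU caches for each thread. The cached snapshot is what normally improves run-time efficiency, so volatile gives up some speed in exchange for visibility. For example:
    volatile boolean stop = false;
    void stop(boolean stop) { this.stop = stop; }
    boolean isStopped() { return stop; }
These two functions are usually called from two different threads. Because they read and write a shared resource, they would normally need a lock. However, reading and writing stop are each atomic, so it is enough to add the volatile keyword to tell the JVM to go to main memory on every access. For a non-atomic operation such as stop = !stop, volatile offers no protection at all, so non-atomic operations still require a lock. C++ has a keyword of the same name, and looking at it from the C++ side gives a deeper understanding. Standard C++ volatile only stops the compiler from optimizing accesses away; it guarantees neither atomicity nor ordering between threads, so it should not be relied on the way Java's volatile is. What keeps accesses to a variable consistent across multiple CPUs is a memory barrier instruction in the generated code, and since C++11 the portable way to request one is std::atomic.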
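6) Deadlock:
Deadlock typically occurs between two threads: thread A holds lock A and tries to acquire lock B, while thread B already holds lock B and is trying to acquire lock A. Both threads are then suspended, each waiting for the other. See the following code: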

    import java.util.concurrent.locks.ReentrantLock;

    class MyRunnable implements Runnable {
        private int id;
        private static ReentrantLock lockA;
        private static ReentrantLock lockB;
        private static int counter;
        static {
            lockA = new ReentrantLock();
            lockB = new ReentrantLock();
            counter = 0;
        }

        public MyRunnable(int id) {
            this.id = id;
        }

        @Override
        public void run() {
            for (int i = 0; i < 100000000; ++i) {
                if (id == 1) {
                    // Thread 1 takes lockA first, then lockB.
                    System.out.println("id = 1, before lockA.lock()");
                    lockA.lock();
                    System.out.println("id = 1, After lockA.lock()");
                    lockB.lock();
                    System.out.println("id = 1, After lockB.lock()");
                    ++counter;
                    lockB.unlock();
                    System.out.println("id = 1, After lockB.unlock()");
                    lockA.unlock();
                    System.out.println("id = 1, After lockA.unlock()");
                } else {
                    // Thread 2 takes the locks in the opposite order, which is
                    // what makes the deadlock possible.
                    System.out.println("id = 2, before lockB.lock()");
                    lockB.lock();
                    System.out.println("id = 2, After lockB.lock()");
                    lockA.lock();
                    System.out.println("id = 2, After lockA.lock()");
                    ++counter;
                    lockA.unlock();
                    System.out.println("id = 2, After lockA.unlock()");
                    lockB.unlock();
                    System.out.println("id = 2, After lockB.unlock()");
                }
            }
            System.out.println("counter = " + counter);
        }
    }

    public static void main(String[] args) {
        Thread t1 = new Thread(new MyRunnable(1));
        Thread t2 = new Thread(new MyRunnable(2));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
        }
        System.out.println("Over");
    }
    /*  Output:
        ... ...
        id = 2, After lockA.lock()
        id = 1, After lockA.lock()
        id = 2, After lockA.unlock()
        id = 2, After lockB.unlock()
        id = 2, before lockB.lock()
        id = 2, After lockB.lock() */

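Note: as the output shows, the deadlock appears only after the program has been running for a while, and exactly when depends on how the CPU schedules the threads. There are two main ways to resolve deadlock. The first is to make every thread acquire the locks in the same order: in the example above both threads should call lockA.lock() and then lockB.lock(), or both the reverse; what matters is consistency. The second is to use the timeout of tryLock(long timeout, TimeUnit unit) instead of the unbounded wait of lock(), so that when a deadlock would occur each thread gets a chance to release the lock it already holds. The second method is used only under very special conditions; normally the first is the way to avoid deadlock.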
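
The timeout-based alternative can be sketched as follows. This is an illustration under assumptions, not the original author's code: the class TryLockDemo, the method incrementWithTimeout and the 50 ms timeout are all made up. A thread that cannot get the second lock in time backs off and releases the first one instead of waiting forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
    static final ReentrantLock lockA = new ReentrantLock();
    static final ReentrantLock lockB = new ReentrantLock();
    static int counter = 0;

    // Takes lockA then lockB with a timeout; returns false instead of blocking forever.
    static boolean incrementWithTimeout(long timeoutMs) throws InterruptedException {
        if (!lockA.tryLock(timeoutMs, TimeUnit.MILLISECONDS))
            return false;
        try {
            if (!lockB.tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                return false;              // lockA is released by the outer finally
            try {
                ++counter;
                return true;
            } finally {
                lockB.unlock();
            }
        } finally {
            lockA.unlock();
        }
    }

    // Calls incrementWithTimeout once while another thread holds lockB and once after.
    static boolean[] demo() {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lockB.lock();
            try {
                held.countDown();          // tell the caller that lockB is taken
                release.await();           // keep lockB until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lockB.unlock();
            }
        });
        try {
            holder.start();
            held.await();
            boolean whileHeld = incrementWithTimeout(50);    // lockB busy: times out
            release.countDown();
            holder.join();
            boolean afterRelease = incrementWithTimeout(50); // both locks free
            return new boolean[] { whileHeld, afterRelease };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);               // prints false true
    }
}
```

A caller that gets false back has to decide what to do next, typically retrying after a short pause, which is part of why a consistent lock order is usually the simpler remedy.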
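7) Read-write locks: the concurrent package of Java SE 5 provides a read-write lock (ReentrantReadWriteLock), mainly to improve the concurrency of read operations. Read locks (shared locks) can be held by several threads at once, although a read lock cannot be acquired while a write lock is held. Only one write lock (exclusive lock) can be held at any time; if any other read or write lock is held, an attempt to take the write lock fails or waits. A read-write lock can therefore be regarded as multiple-reader, single-writer. In C++, the PThread library provides read-write locks that work the same way as Java's.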
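
The multiple-reader, single-writer rule can be observed with tryLock. The sketch below is illustrative only; the class ReadWriteLockDemo and its helper tryFromOtherThread are hypothetical names. While the calling thread holds the read lock, a second thread tries the read lock and then the write lock.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockDemo {
    // Runs lock.tryLock() on a second thread and reports whether it succeeded.
    static boolean tryFromOtherThread(Lock lock) {
        boolean[] got = { false };
        Thread other = new Thread(() -> {
            if (lock.tryLock()) {          // returns immediately, never blocks
                got[0] = true;
                lock.unlock();
            }
        });
        other.start();
        try {
            other.join();                  // join makes got[0] visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return got[0];
    }

    // Holds the read lock and lets another thread try each of the two locks.
    static boolean[] demo() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock();
        try {
            return new boolean[] {
                tryFromOtherThread(rw.readLock()),   // readers may share
                tryFromOtherThread(rw.writeLock())   // a writer is excluded
            };
        } finally {
            rw.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);       // prints true false
    }
}
```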
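5. Blocking queues: this kind of queue suits the producer-consumer pattern very well. All the synchronization happens inside the blocking queue, so the producer and the consumer need no further synchronization or coordination between themselves. The BlockingQueue interface provides the following commonly used methods:
    offer    Adds an element at the tail and returns true; returns false at once if the queue is full
    peek     Returns the head element; returns null at once if the queue is empty
    poll     Removes and returns the head element; returns null at once if the queue is empty
    put      Adds an element at the tail; blocks if the queue is full
    take     Removes and returns the head element; blocks if the queue is empty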

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class Worker extends Thread {
        BlockingQueue<Integer> q;

        Worker(BlockingQueue<Integer> q) {
            this.q = q;
        }

        public void run() {
            try {
                while (true) {
                    // take() never returns null: it blocks until an element is
                    // available, so the loop ends only through interruption.
                    Integer x = q.take();
                    System.out.println(x);
                }
            } catch (InterruptedException e) {
                System.out.println("Interrupted.");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int capacity = 10;
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(capacity);
        Worker worker = new Worker(queue);
        worker.start();

        // put() blocks whenever the queue already holds 10 elements.
        for (int i = 0; i < 100; i++)
            queue.put(i);
        Thread.sleep(100);
        worker.interrupt();
        worker.join();
    }
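
The example above uses only the blocking pair put and take. As a complement, here is a minimal single-threaded sketch of the non-blocking methods offer, peek and poll from the table; the class name NonBlockingQueueCalls and the capacity of 2 are assumptions chosen to keep the trace short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class NonBlockingQueueCalls {
    // Records the return value of each call on a queue with room for two elements.
    static String demo() {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(2);
        StringBuilder sb = new StringBuilder();
        sb.append(q.peek()).append(' ');    // null: the queue is empty
        sb.append(q.offer(1)).append(' ');  // true
        sb.append(q.offer(2)).append(' ');  // true
        sb.append(q.offer(3)).append(' ');  // false: the queue is full
        sb.append(q.peek()).append(' ');    // 1, and the element stays queued
        sb.append(q.poll()).append(' ');    // 1, now removed
        sb.append(q.poll()).append(' ');    // 2
        sb.append(q.poll());                // null: empty again
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints null true true false 1 1 2 null
    }
}
```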

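The implementations of BlockingQueue offer different features. ArrayBlockingQueue in the example above is backed by an array, so the queue has a fixed upper bound. LinkedBlockingQueue is backed by linked nodes and is effectively unbounded unless a capacity is given (the default capacity is Integer.MAX_VALUE). PriorityBlockingQueue is a queue that supports priorities. The C++ standard library provides no thread-safe containers and no container of this kind for synchronizing threads. A few years ago, inspired by Java's BlockingQueue, I implemented a BlockingQueue class of my own whose mechanism and basic features are almost the same as Java's, except that I designed it with C++ templates. Part of its declaration is shown below: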

    // Assumes <list>, <set> and <cstddef> are included, with
    // using-declarations for std::list and std::multiset.
    template <typename T> struct FIFOContainer;
    template <typename T> struct DefaultDestroyer;   // definition not shown

    template <typename T,
        typename ContainerT = FIFOContainer<T>,
        typename DestroyerT = DefaultDestroyer<T> >
    class BlockingQueue
    {
    public:
        typedef T                value_type;
        typedef ContainerT       container_type;
        typedef DestroyerT       destroyer_type;

    public:
        BlockingQueue();
        ~BlockingQueue();
        bool put(T element);
        // get blocks while the queue is empty
        T get(int timeout = -1);
        T peek(int timeout = -1);
        void cancel();
        size_t size();
    };

    template <typename T>
    struct FIFOContainer : public list<T> {};

    template <typename T>
    struct PriorityLevelComp
    {
        bool operator()(T t1, T t2) const {
            return t1.getLevel() < t2.getLevel();
        }
    };

    template <typename T>
    struct PriorityContainer : public multiset<T, PriorityLevelComp<T> > {
        typedef multiset<T, PriorityLevelComp<T> > Base;
        typedef typename Base::value_type value_type;
        typedef typename Base::const_reference const_reference;
        void push_back(const value_type& v) {
            Base::insert(v);
        }
        void pop_front() {
            if (!Base::empty())
                Base::erase(Base::begin());
        }
        // multiset elements are immutable, so front() returns a const reference.
        const_reference front() const {
            return *Base::begin();
        }
    };

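Compared with Java, this design uses templates rather than interface-based polymorphism, so there is no vtable lookup and it is more efficient. In addition, by making full use of template specialization and partial specialization in C++, the underlying container can be swapped freely. The project we are currently developing uses exactly this technique. We need BlockingQueue to support priorities (which the code above already does) and to group and merge elements in the container. In the EDT (Swing's event dispatch thread), for example, if the queue holds repaint events for the same region, they are merged into a single event to improve overall efficiency. So we replace the default container in the template parameters with a container written for this feature (derived from PriorityContainer in the example above) and supply a functor class that compares groups, and that covers all the requirements.
6. Thread-safe collections:
The java.util.concurrent package provides a set of efficient concurrent implementations of Map, Set and Queue, such as ConcurrentHashMap, ConcurrentSkipListMap and ConcurrentLinkedQueue.
Without these containers from the JDK, thread safety requires locking around every operation on the container from outside. Because the concurrent containers synchronize internally, they can lock at a finer granularity and avoid locking the whole container, as external locking does. Take a hash map: it is built from many hash buckets, and two operations that touch different buckets do not actually need to be synchronized with each other, which improves concurrency. The following code measures the running time with different thread-safe maps; ConcurrentHashMap is clearly faster than the other two synchronized containers.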

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = Collections
                .synchronizedMap(new HashMap<Integer, Integer>());
        long start = System.currentTimeMillis();
        PutWorker worker = new PutWorker(map);
        worker.start();
        GetWorker getWorker = new GetWorker(map);
        getWorker.start();
        worker.join();
        getWorker.join();
        long elapse = System.currentTimeMillis() - start;
        System.out.println("Collections.synchronizedMap: " + elapse);

        map = new ConcurrentHashMap<Integer, Integer>();
        start = System.currentTimeMillis();
        worker = new PutWorker(map);
        worker.start();
        getWorker = new GetWorker(map);
        getWorker.start();
        worker.join();
        getWorker.join();
        elapse = System.currentTimeMillis() - start;
        System.out.println("concurrent.ConcurrentHashMap: " + elapse);

        map = new Hashtable<Integer, Integer>();
        start = System.currentTimeMillis();
        worker = new PutWorker(map);
        worker.start();
        getWorker = new GetWorker(map);
        getWorker.start();
        worker.join();
        getWorker.join();
        elapse = System.currentTimeMillis() - start;
        System.out.println("Hashtable: " + elapse);
    }

    class GetWorker extends Thread {
        private Map<Integer, Integer> map;

        GetWorker(Map<Integer, Integer> map) {
            this.map = map;
        }

        public void run() {
            // Removes each key that the PutWorker has already inserted.
            for (int i = 0; i < 10000000; ++i) {
                if (map.get(i) != null)
                    map.remove(i);
            }
        }
    }

    class PutWorker extends Thread {
        private Map<Integer, Integer> map;

        PutWorker(Map<Integer, Integer> map) {
            this.map = map;
        }

        public void run() {
            for (int i = 0; i < 10000000; ++i)
                map.put(i, i);
        }
    }
    /*  Output:
        Collections.synchronizedMap: 47766
        concurrent.ConcurrentHashMap: 28657
        Hashtable: 48000    */
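
Internal synchronization also lets a concurrent map offer compound operations that are atomic as a whole, which a get followed by a put or remove on the caller's side is not. The sketch below assumes Java 8 or later for the merge method; the class name AtomicMapUpdates and the key "page" are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicMapUpdates {
    // Several threads add 1 to the same key; merge performs each update atomically.
    static int countHits(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; ++i)
                    hits.merge("page", 1, Integer::sum);   // atomic read-modify-write
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers)
                w.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return hits.getOrDefault("page", 0);
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 10000));   // prints 40000
    }
}
```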

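7. Callable and Future:
Callable and Runnable are both interfaces for code executed by a thread. Runnable is purely asynchronous: the caller cannot obtain a return value from the run method through a Runnable implementation. Callable's call method does return a value, and the caller obtains it through the Future interface, as in the following code:
Note: Callable's call method plays the same role as Runnable's run method.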

    public static void main(String[] args) {
        Callable<Integer> c = new SleepCallable();
        // FutureTask<V> implements both Runnable and Future<V>;
        // it plays each of the two roles in turn below.
        FutureTask<Integer> sleepTask = new FutureTask<Integer>(c);
        // Here the FutureTask acts as a Runnable.
        Thread t = new Thread(sleepTask);
        t.start();
        // Here the FutureTask acts as a Future<V>.
        try {
            // get() blocks until the task has completed and returned normally.
            // If the task ended with an exception, it is delivered to the
            // catch blocks below.
            Integer ii = sleepTask.get();
            System.out.println("The return value is " + ii);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }

    class SleepCallable implements Callable<Integer> {
        @Override
        public Integer call() throws Exception {
            Thread.sleep(1000 * 10);
            return Integer.valueOf(100);
        }
    }
    /*  Output:
        The return value is 100 */
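
The exceptional path mentioned in the comments can be illustrated in the same style. In this sketch (the class FailingCallableDemo and the message "boom" are invented for illustration), the Callable throws, and get() reports the failure as an ExecutionException whose cause is the original exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FailingCallableDemo {
    // Runs a Callable that always fails and returns the message of the original exception.
    static String causeMessage() {
        Callable<Integer> failing = () -> {
            throw new IllegalStateException("boom");
        };
        FutureTask<Integer> task = new FutureTask<>(failing);
        new Thread(task).start();
        try {
            task.get();                        // blocks until the task has finished
            return "no exception";
        } catch (ExecutionException e) {
            return e.getCause().getMessage();  // the exception thrown inside call()
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(causeMessage());    // prints boom
    }
}
```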

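8. Executors:
An executor can be thought of as a thread pool, used mainly in highly concurrent system programs such as servers of all kinds. Suppose a thread is started for every task handed down and destroyed as soon as the task completes, or one thread is created to serve each of a large number of concurrent socket connections. Under high concurrency and frequent task creation, such a design overwhelms both the JVM and the operating system, so we recommend a thread pool to relieve the pressure. The basic idea is that m threads serve n connections or tasks, where m >= 1 and n >= 0; m and n are otherwise unrelated. The concurrent package of Java SE 5 provides the following executors, each created by a static factory method of the Executors class. Their common interface is ExecutorService, and most of them are backed by ThreadPoolExecutor:
    newCachedThreadPool       Creates new threads as needed; idle threads are kept for 60 seconds
    newFixedThreadPool        The pool holds a fixed number of threads, and idle threads are always
                              kept. If there are more tasks than threads, the extra tasks wait in
                              a queue until a thread becomes free to run them
    newSingleThreadExecutor   A newFixedThreadPool with a single thread
    newScheduledThreadPool    java.util.Timer uses one internal thread to maintain and run all of
                              its TimerTasks, whereas this executor can be seen as using several
                              threads to run such tasks, which gives better concurrency
Tasks are submitted to an ExecutorService mainly through the following 3 methods:
    Future<?> submit(Runnable task)
The returned Future<?> is useful only for calling isDone() to test whether the task has finished. Its get method cannot deliver a result of the task, because it always returns null.
    Future<T> submit(Runnable task, T result)
The value returned for this task is the result object.
    Future<T> submit(Callable<T> task)
The returned Future object delivers, through its get method, the value returned by the Callable.
The following two methods shut down an executor:
    shutdown()
After this method is called, the executor accepts no new tasks. Once all tasks already submitted have been executed, the executor terminates all its worker threads.
    shutdownNow()
The main difference from shutdown is that this method does not run the tasks that have not started yet (those still waiting in the queue) and instead tries to shut down the whole executor as soon as possible.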

    public static void main(String[] args) {
        ExecutorService e = Executors.newFixedThreadPool(10);
        try {
            for (int i = 0; i < 1000; ++i) {
                Future<Integer> f = e.submit(new IncrementCallable());
                // get() blocks until this task is done, so in this loop the
                // tasks are effectively executed one after another.
                System.out.println(f.get());
            }
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        } catch (ExecutionException e1) {
            e1.printStackTrace();
        }
        e.shutdown();
    }

    class IncrementCallable implements Callable<Integer> {
        private static AtomicInteger ai = new AtomicInteger(0);

        @Override
        public Integer call() throws Exception {
            // AtomicInteger makes the increment an atomic operation.
            return ai.incrementAndGet();
        }
    }
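
The three submit variants and the shutdown sequence described above can be tried side by side. This is a minimal sketch with made-up names (SubmitVariants, noop, answer); the 5-second wait passed to awaitTermination is an arbitrary choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class SubmitVariants {
    // Returns the three Future results followed by whether the pool terminated.
    static Object[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Runnable noop = () -> { };
        Callable<Integer> answer = () -> 6 * 7;
        try {
            Future<?> f1 = pool.submit(noop);               // get() yields null
            Future<String> f2 = pool.submit(noop, "done");  // get() yields "done"
            Future<Integer> f3 = pool.submit(answer);       // get() yields 42
            Object r1 = f1.get();
            Object r2 = f2.get();
            Object r3 = f3.get();
            pool.shutdown();                                // accept no new tasks
            boolean terminated = pool.awaitTermination(5, TimeUnit.SECONDS);
            return new Object[] { r1, r2, r3, terminated };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();                             // harmless if already shut down
        }
    }

    public static void main(String[] args) {
        Object[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
        // prints null done 42 true
    }
}
```

shutdown() returns immediately, so a caller that needs to know when the workers have really finished follows it with awaitTermination, as the sketch does.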

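In C++ development, the win32 API provides thread-pool functions, but in practice few people use the thread pool that ships with the operating system; they write their own instead. About 10 years ago I wrote a thread-pool utility class myself. As my experience grew and my understanding of object orientation deepened, I rewrote the thread pool several times as part of my own toolkit. The latest version closely resembles the Java implementation, mainly, of course, because I borrowed from Java's design.
9. Synchronizers:
1) CountDownLatch: makes a set of threads wait until the count reaches 0. A latch can be used only once; after the count has reached 0 it cannot be reused.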

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(10);
        for (int i = 0; i < 10; ++i)
            new Thread(new MyRunnable(latch)).start();
        try {
            System.out.println("Before waiting.");
            // await() suspends the main thread until the latch has been
            // counted down to 0, and then returns normally.
            latch.await();
        } catch (InterruptedException e) {
        }
        System.out.println("All threads exit.");
    }

    class MyRunnable implements Runnable {
        private CountDownLatch latch;

        public MyRunnable(CountDownLatch c) {
            latch = c;
        }

        public void run() {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
            } finally {
                // Each thread decrements the latch by one as it exits,
                // until the last one brings the count down to 0.
                latch.countDown();
            }
        }
    }

Without CountDownLatch, the example above would have to be written as follows:

    public class JoinDemo {
        public static void main(String[] args) {
            Thread[] tArray = new Thread[10];
            for (int i = 0; i < 10; ++i) {
                tArray[i] = new Thread(new MyRunnable());
                tArray[i].start();
            }
            try {
                System.out.println("Before waiting.");
                // Without a latch, the main thread joins each worker in turn.
                for (int i = 0; i < 10; ++i)
                    tArray[i].join();
            } catch (InterruptedException e) {
                // Ignored in this example.
            }
            System.out.println("All threads exit.");
        }

        static class MyRunnable implements Runnable {
            public void run() {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Ignored in this example.
                }
            }
        }
    }

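2) CyclicBarrier: unlike a CountDownLatch, a barrier object can be reused.
Note: when several threads are waiting in the barrier's await method and one of them called await with a timeout, the expiry of that timeout breaks the whole barrier. The timed-out thread receives a TimeoutException, and the other threads waiting in await receive a BrokenBarrierException.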

    import java.util.concurrent.BrokenBarrierException;
    import java.util.concurrent.CyclicBarrier;

    public class BarrierDemo {
        public static void main(String[] args) {
            CyclicBarrier cb = new CyclicBarrier(3, new BarAction());
            System.out.println("Starting");
            new MyThread(cb, "A");
            new MyThread(cb, "B");
            new MyThread(cb, "C");
        }

        static class MyThread implements Runnable {
            CyclicBarrier cbar;
            String name;

            MyThread(CyclicBarrier c, String n) {
                cbar = c;
                name = n;
                new Thread(this).start();
            }

            public void run() {
                System.out.println(name);
                try {
                    // Typically the preparatory stage of a bulk computation
                    // runs here. The threads work concurrently, each on its
                    // own region: for example, when 10 threads precompute
                    // 100000000 elements, each thread handles one tenth of
                    // them. Every precomputation must have finished before
                    // the bulk operation that follows can begin.
                    cbar.await();
                    // Getting past await() means that every thread has
                    // finished its share of the precomputation, so the
                    // follow-up computation can proceed from this point.
                } catch (BrokenBarrierException exc) {
                    // Ignored in this example.
                } catch (InterruptedException exc) {
                    // Ignored in this example.
                }
            }
        }

        static class BarAction implements Runnable {
            // The barrier action runs exactly once each time the barrier
            // trips, on the last thread to arrive.
            public void run() {
                System.out.println("Barrier Reached!");
            }
        }
    }
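The note above about a timed await breaking the barrier can be reproduced with a small sketch. It assumes a barrier for three parties of which only two ever arrive; the class name BarrierTimeoutSketch and the 200 ms timeout are made up for the example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical example class, not part of the original article.
class BarrierTimeoutSketch {
    // Returns { outcome of the timed waiter, outcome of the untimed waiter }.
    static String[] run() {
        // Three parties are required, but only two ever arrive.
        CyclicBarrier barrier = new CyclicBarrier(3);
        String[] outcome = new String[2];

        Thread untimed = new Thread(() -> {
            try {
                barrier.await(); // waits without a timeout
                outcome[1] = "passed";
            } catch (BrokenBarrierException e) {
                outcome[1] = "BrokenBarrierException";
            } catch (InterruptedException e) {
                outcome[1] = "InterruptedException";
            }
        });
        untimed.start();

        try {
            barrier.await(200, TimeUnit.MILLISECONDS); // gives up after 200 ms
            outcome[0] = "passed";
        } catch (TimeoutException e) {
            outcome[0] = "TimeoutException";
        } catch (BrokenBarrierException e) {
            outcome[0] = "BrokenBarrierException";
        } catch (InterruptedException e) {
            outcome[0] = "InterruptedException";
        }

        try {
            untimed.join(); // also makes outcome[1] visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return outcome;
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("timed waiter:   " + r[0]); // TimeoutException
        System.out.println("untimed waiter: " + r[1]); // BrokenBarrierException
    }
}
```

A broken barrier stays broken until reset() is called on it, so code that uses timed waits usually needs a recovery path as well.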

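3) Exchanger: an exchanger is useful when two threads work on two instances of the same kind of data buffer. Typically one thread fills its buffer with data while the other consumes the data in its own buffer; when both are done, they swap buffers.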

    import java.util.concurrent.Exchanger;

    public class ExchangerDemo {
        public static void main(String[] args) {
            Exchanger<String> exgr = new Exchanger<String>();
            new UseString(exgr);
            new MakeString(exgr);
        }

        // Producer: fills a string and swaps it for the consumer's empty one.
        static class MakeString implements Runnable {
            Exchanger<String> ex;
            String str;

            MakeString(Exchanger<String> c) {
                ex = c;
                str = "";
                new Thread(this).start();
            }

            public void run() {
                char ch = 'A';
                for (int i = 0; i < 3; i++) {
                    for (int j = 0; j < 5; j++)
                        str += ch++;
                    try {
                        System.out.println("[Producer] Before exchange.");
                        // Blocks until the consumer also calls exchange().
                        str = ex.exchange(str);
                        System.out.println("[Producer] After exchange.");
                    } catch (InterruptedException exc) {
                        System.out.println(exc);
                    }
                }
            }
        }

        // Consumer: hands over an empty string and receives the filled one.
        static class UseString implements Runnable {
            Exchanger<String> ex;
            String str;

            UseString(Exchanger<String> c) {
                ex = c;
                new Thread(this).start();
            }

            public void run() {
                for (int i = 0; i < 3; i++) {
                    try {
                        System.out.println("[Consumer] Before exchange.");
                        str = ex.exchange("");
                        System.out.println("[Consumer] After exchange.");
                        System.out.println("Got: " + str);
                    } catch (InterruptedException exc) {
                        System.out.println(exc);
                    }
                }
            }
        }
    }

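Before C++20, which added std::latch and std::barrier, the C++ standard offered no comparable library or framework. If you really need one, consider TBB, the open-source library developed by Intel. It provides a large set of containers, memory allocators, algorithms and synchronization tools for parallel computing, and currently supports VC on Windows and gcc on Linux/Unix platforms.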

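    10.    Atomic operations:
    As mentioned earlier, operations such as ++, -- and += look like a single Java statement but are actually carried out by several instructions; the same holds for C/C++ and other languages such as C#. The simplest way to make such statements thread-safe is to guard them with a ReentrantLock or the synchronized keyword, but a lock is an expensive tool for so small an operation. This is why Windows has long offered a more efficient C API, the Interlocked functions, which are the building blocks of spin locks and lock-free mechanisms. The java.util.concurrent.atomic package introduced in Java SE 5 provides an equivalent set of classes, one per type, such as AtomicInteger, AtomicLong and AtomicBoolean. They are very simple to use, although building a spin lock or a lock-free queue on top of them does take some background and experience. The rest of this section covers only their basic use, which is both the more common case and the easier one to master. AtomicInteger serves as the example; the other types follow the same pattern.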
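    1)    Atomic objects for single values
    void set(int newValue)    // Sets the new value. In Java, reading or writing an int is always atomic in itself; set additionally makes the new value visible to other threads, like a write to a volatile field.
    int getAndSet(int newValue) // Atomically sets the new value (newValue) and returns the previous value
    boolean compareAndSet(int expect, int update)
    // Behaves like the following code executed atomically (this.value is the value currently held by the object):
    //if (this.value == expect) {
    //    this.value = update;
    //    return true;
    //}
    //return false;
    int getAndIncrement() // Adds one and returns the previous value, like an atomic i++
    int getAndDecrement() // Subtracts one and returns the previous value, like an atomic i--
    int getAndAdd(int delta) // Adds delta and returns the previous value
    int incrementAndGet() // Adds one and returns the updated value, like an atomic ++i
    int decrementAndGet() // Subtracts one and returns the updated value, like an atomic --i
    int addAndGet(int delta) // Atomically adds delta and returns the updated value, like an atomic += delta
    Each of the operations described in the comments above is atomic: however many machine instructions it takes, other threads see it either as not yet started or as fully completed, never half done. These classes do not achieve this by blocking. They rely on the processor's atomic compare-and-swap style instructions, so when two threads update the same object at the same moment, one update takes effect first and the other is applied (or retried) on top of it; neither thread is suspended and no update is lost. Note: the guarantee holds only when every thread goes through the atomic methods. A thread that modifies shared data with ordinary, non-atomic operations gets no such protection and can overwrite another thread's intermediate result.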
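The return-value conventions listed above, a typical compareAndSet retry loop, and a concurrent counter can be sketched as follows. The class AtomicSketch and its helper methods are hypothetical names for this example; only the AtomicInteger calls themselves come from the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of the original article.
class AtomicSketch {
    // Shows which methods return the previous value and which the updated one.
    static int[] returnValues() {
        AtomicInteger a = new AtomicInteger(5);
        int r1 = a.getAndIncrement();  // returns 5, value becomes 6
        int r2 = a.incrementAndGet();  // value becomes 7, returns 7
        int r3 = a.getAndAdd(10);      // returns 7, value becomes 17
        int r4 = a.addAndGet(10);      // value becomes 27, returns 27
        int r5 = a.getAndSet(1);       // returns 27, value becomes 1
        boolean swapped = a.compareAndSet(2, 9); // fails: the value is 1, not 2
        return new int[] { r1, r2, r3, r4, r5, swapped ? 1 : 0, a.get() };
    }

    // A compare-and-set retry loop that atomically doubles the value.
    static int doubleIt(AtomicInteger a) {
        while (true) {
            int current = a.get();
            int next = current * 2;
            if (a.compareAndSet(current, next))
                return next;
            // Another thread changed the value in between: read it again and retry.
        }
    }

    // Several threads increment one shared counter; no increment is lost.
    static int concurrentCount(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; ++i)
                    counter.incrementAndGet();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers)
                w.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(returnValues())); // [5, 7, 7, 27, 27, 0, 1]
        System.out.println(doubleIt(new AtomicInteger(21)));           // 42
        System.out.println(concurrentCount(4, 10000));                 // 40000
    }
}
```

With a plain int field and ++ in place of incrementAndGet, the last call could print less than 40000, because concurrent increments can overwrite each other.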
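    2)    Atomic arrays: the package also provides atomic array classes such as AtomicIntegerArray. Their concept and principle are the same as for the single-value classes, and the interface is almost identical, except that every method takes an additional int parameter for the array index, as the following declarations show: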
    void set(int i, int newValue)
    int getAndSet(int i, int newValue)
    boolean compareAndSet(int i, int expect, int update)
    int getAndIncrement(int i)
    int getAndDecrement(int i)
    int getAndAdd(int i, int delta)
    int incrementAndGet(int i)
    int decrementAndGet(int i)
    int addAndGet(int i, int delta)
    One point to keep in mind: if i is outside the bounds of the array, an IndexOutOfBoundsException is still thrown.
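A minimal sketch of the indexed variants, using AtomicIntegerArray; the class name AtomicArraySketch and the array length of 3 are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical example class, not part of the original article.
class AtomicArraySketch {
    // Returns { old value of element 0, new value of element 1,
    //           1 if the CAS on element 2 succeeded, 1 if index 3 was rejected }.
    static int[] run() {
        AtomicIntegerArray arr = new AtomicIntegerArray(3); // all elements start at 0
        arr.set(0, 5);
        int old = arr.getAndAdd(0, 2);                // returns 5, element 0 becomes 7
        int updated = arr.incrementAndGet(1);         // element 1 becomes 1, returns 1
        boolean swapped = arr.compareAndSet(2, 0, 9); // element 2 goes from 0 to 9
        boolean rejected = false;
        try {
            arr.get(3); // the valid indexes are 0..2
        } catch (IndexOutOfBoundsException e) {
            rejected = true;
        }
        return new int[] { old, updated, swapped ? 1 : 0, rejected ? 1 : 0 };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run())); // [5, 1, 1, 1]
    }
}
```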

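    11.    Threads and Swing:
    Operations on Swing UI components are not thread-safe. Two basic rules help when using worker threads for long-running operations in Swing:
        1)    If an action takes a long time, do it in a separate worker thread, not in the event dispatch thread;
        2)    Do not touch Swing components from any thread other than the event dispatch thread.
    A worker thread calls EventQueue's invokeLater or invokeAndWait method to hand any operation that modifies a Swing component over to the EDT (Swing's event dispatch thread). invokeLater returns immediately and the submitted task is executed asynchronously by the EDT, whereas invokeAndWait waits synchronously until the submitted task has been processed. This closely mirrors the difference between PostMessage and SendMessage in the Windows API. See the following code: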

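    // label is assumed to be a Swing component (for example a JLabel field)
    // of the enclosing class; EventQueue is java.awt.EventQueue.
    public void test() {
        EventQueue.invokeLater(new Runnable() {
            // With this approach, every operation on a Swing component
            // must be wrapped in a Runnable for the EDT to execute.
            public void run() {
                label.setText("....");
            }
        });
    }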
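The difference between invokeLater and invokeAndWait, as seen from a worker thread, can be sketched like this. To keep the example free of any real window, a synchronized list stands in for the Swing component; in a real program each Runnable would instead call something like setText on a label. The class name EdtSketch and the log messages are assumptions made for the example.

```java
import java.awt.EventQueue;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class, not part of the original article.
class EdtSketch {
    // Runs a worker thread that hands two "UI updates" to the EDT and
    // returns the order in which things happened.
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread worker = new Thread(() -> {
            // invokeLater only queues the task and returns immediately.
            EventQueue.invokeLater(
                () -> log.add("invokeLater task, on EDT: " + EventQueue.isDispatchThread()));
            try {
                // invokeAndWait blocks the worker until the EDT has run the task.
                // Events are processed in order, so the task queued above runs first.
                EventQueue.invokeAndWait(
                    () -> log.add("invokeAndWait task, on EDT: " + EventQueue.isDispatchThread()));
            } catch (InterruptedException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
            log.add("worker resumes, on EDT: " + EventQueue.isDispatchThread());
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : run())
            System.out.println(line);
    }
}
```

Never call invokeAndWait from the EDT itself: EventQueue rejects that call with an Error, because the thread would otherwise be waiting for itself.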

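In C++, take Windows MFC as an example: if MFC components are accessed from several threads, the UI may freeze or otherwise misbehave. To avoid this, a worker thread usually posts a custom message to the main UI thread (PostMessage), so that the main window's message handler performs the UI operation on the main thread.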