Multithreading Journey, Part 4: A Brief Look at the Memory Model and User-Mode Synchronization

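User mode offers two kinds of synchronization constructs:

volatile construct: performs an atomic read or an atomic write on a simple data type

interlocked construct: performs an atomic read and write on a simple data type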

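(One more reminder: only the operating system can stop a thread from running, whether through an I/O interrupt, by blocking the thread, or by some other means.)

To be atomic, both constructs require the memory address to be properly aligned. Put simply, the variable must live at an address that is a multiple of 1, 2, or 4 bytes respectively, matching its size. Under normal circumstances the CLR aligns fields correctly, so I won't go into it here.

Let's start with the very important Interlocked class.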

System.Threading.Interlocked

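Every method in the Interlocked class performs one atomic read and write. public static Int32 Increment(ref Int32 location) is the one used most often, and I will use it later when building a custom hybrid lock.

Below are the signatures of the class's methods; the comment above each one shows the equivalent non-atomic code.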

public static class Interlocked {  
   // return (++location)  
   public static Int32 Increment(ref Int32 location);  
  
   // return (--location)  
   public static Int32 Decrement(ref Int32 location);  
  
   // return (location1 += value)  
   public static Int32 Add(ref Int32 location1, Int32 value);  
  
   // Int32 old = location1; location1 = value; return old;  
   public static Int32 Exchange(ref Int32 location1, Int32 value);  
  
   // Int32 old = location1;  
   // if (location1 == comparand) location1 = value; 
   // return old;  
   public static Int32 CompareExchange(ref Int32 location1,   
      Int32 value, Int32 comparand);  
   ...  
}
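
To make the CompareExchange signature a bit more concrete, here is a minimal sketch of the usual retry loop built on it. The class name InterlockedSketch and the methods Maximum and Run are names I made up for illustration; only Interlocked.CompareExchange and Parallel.For are real .NET APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class InterlockedSketch
{
    // Atomically sets target = max(target, value) and returns the resulting maximum.
    public static int Maximum(ref int target, int value)
    {
        int current = target;
        while (true)
        {
            if (value <= current) return current;   // nothing to write

            // Store 'value' only if target still holds 'current'.
            int seen = Interlocked.CompareExchange(ref target, value, current);
            if (seen == current) return value;      // our write won

            current = seen;                          // another thread got in first; retry
        }
    }

    // Calls Maximum from many threads and returns the final value.
    public static int Run()
    {
        int max = 0;
        Parallel.For(0, 1000, i => Maximum(ref max, i));
        return max;
    }
}
```

The loop never blocks: a thread that loses the race simply re-reads the value it was handed back and tries again.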

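Let's implement a simple spin lock ourselves. It never blocks a thread, yet only one thread at a time can enter the critical region. So what are the other threads doing? They are certainly not blocked, because no kernel object is involved; to keep them out of our way, all we can do is let them spin in place.

Suppose several threads call Enter at the same time. Only one of them sees Exchange return 0, so its while condition is false and it leaves the loop holding the lock. For every other thread the condition stays true, so they keep looping and re-testing it.

Exchange guarantees that the first thread to call it changes m_ResourceInUse from 0 to 1 and gets back the original value 0. Every other thread changes m_ResourceInUse from 1 to 1, gets back 1, and therefore keeps spinning.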

class SimpleSpinLock { 
   private Int32 m_ResourceInUse; // 0=false (default), 1=true 
 
   public void Enter() { 
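      // Set the resource to in-use. If this thread changed it from 0 (free),
      // the lock is ours and the loop ends; otherwise keep spinning.
      while (Interlocked.Exchange(ref m_ResourceInUse, 1) != 0) { 
         // Busy-wait: another thread holds the lock
      } 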
   } 
 
   public void Leave() { 
      Thread.VolatileWrite(ref m_ResourceInUse, 0); 
   } 
}

How is this class used? It's simple:

public sealed class SomeResource { 
   private SimpleSpinLock m_sl = new SimpleSpinLock(); 
 
   public void AccessResource() { 
      m_sl.Enter(); 
      // Only one thread at a time can get in here to access the resource... 
      m_sl.Leave(); 
   } 
}

Exchange is a common way to test and set a true/false flag atomically.
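
As a small illustration of that test-and-set idea outside of a lock, here is a sketch of a flag that only one caller can claim. OneShot, TryClaim and CountWinners are hypothetical names chosen for this example, not part of any library.

```csharp
using System.Threading;
using System.Threading.Tasks;

sealed class OneShot
{
    private int m_claimed; // 0 = not yet claimed, 1 = claimed

    // Returns true for exactly one caller, no matter how many threads race here.
    public bool TryClaim()
    {
        // Always writes 1; only the caller that saw the old value 0 wins.
        return Interlocked.Exchange(ref m_claimed, 1) == 0;
    }

    // Lets 'callers' parallel iterations race on one flag and counts the winners.
    public static int CountWinners(int callers)
    {
        var flag = new OneShot();
        int winners = 0;
        Parallel.For(0, callers, _ =>
        {
            if (flag.TryClaim()) Interlocked.Increment(ref winners);
        });
        return winners;
    }
}
```

However many callers race, CountWinners should report a single winner, which is the same property that lets only one thread leave the spin loop in SimpleSpinLock.Enter.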

Atomicity

Some readers may still be unsure what atomicity means, so here is an example: the classic and very common ++ operator.

int a = 0;
a++;

When this C# statement is eventually compiled down to assembly code, it turns into several instructions, for example:

MOV EAX, [a]
INC EAX
MOV [a], EAX

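The first instruction takes the address of the variable a and copies the 4 bytes starting at that address into the EAX register. The second instruction increments the value in EAX. The last instruction copies the incremented value from EAX back to the address of a.

Unfortunately, the source code gives no hint of the steps hidden inside the ++ operator. They become easier to see with an extra variable. In effect, the code behaves as if it were written like this: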

int  a  = 0; 
int  tmp  =   a;
tmp++;
a  = tmp;

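Loading a register and storing a register are each atomic on their own, but the combination of load, increment, and store is not.

Any operation that takes more than one assembly instruction is non-atomic, so operators such as ++ and -- are non-atomic. That means we need extra steps to make them safe under concurrency. Let's look at the details: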

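Suppose three threads t1, t2, and t3 each run the three-instruction sequence shown earlier at the same time.

Note: in the diagrams, time runs vertically, and #n shows the value of a at that moment.

What we intended is an execution like this:

[Figure: intended execution, the three threads increment a one after another]

But on a preemptive operating system the progress of each thread is unpredictable, so the actual execution may look like this:

[Figure: actual execution, t3 is preempted after loading a and resumes after t1 and t2 have finished]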

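In the flow above, t1 first updates a to 1 and then t2 updates it to 2. At this point, from the viewpoint of any other thread in the system, everything looks normal.

Then t3 wakes up and continues. Because it loaded a while the value was still 0, it overwrites the results of t1 and t2 and sets a back to 1.

This is a classic data race. It is called a race because the correctness of the code depends entirely on which thread wins: each thread tries to finish first, and the outcome differs depending on which one does. The same source code can produce different results.

Interlocked.Increment solves this problem by guaranteeing an atomic increment.

Next we use Interlocked.Increment to build a simple hybrid lock: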
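
The interleaving described above can be replayed by hand. The sketch below is only an illustration under my own naming (LostUpdateSketch, ReplayBadInterleaving, CountWithInterlocked): the first method walks through the load, increment, and store steps of t1, t2, and t3 on a single thread in the unlucky order, and the second shows the same counting done with Interlocked.Increment from several threads.

```csharp
using System.Threading;
using System.Threading.Tasks;

static class LostUpdateSketch
{
    // Replays the unlucky schedule step by step; each local stands in for
    // one thread's private copy of the EAX register.
    public static int ReplayBadInterleaving()
    {
        int a = 0;

        int eax3 = a;   // t3: MOV EAX, [a]  (reads 0, then gets preempted)

        int eax1 = a;   // t1: MOV EAX, [a]
        eax1++;         // t1: INC EAX
        a = eax1;       // t1: MOV [a], EAX  -> a is 1

        int eax2 = a;   // t2: MOV EAX, [a]
        eax2++;         // t2: INC EAX
        a = eax2;       // t2: MOV [a], EAX  -> a is 2

        eax3++;         // t3 resumes with its stale 0
        a = eax3;       // t3: MOV [a], EAX  -> a is back to 1

        return a;
    }

    // The same kind of counting with an atomic increment: no update is lost.
    public static int CountWithInterlocked(int workers, int perWorker)
    {
        int a = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++) Interlocked.Increment(ref a);
        });
        return a;
    }
}
```

ReplayBadInterleaving ends with a equal to 1 even though three increments ran, while CountWithInterlocked always ends at workers * perWorker.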

class SimpleHybridLock : IDisposable { 
   private Int32 m_waiters = 0; 

   // The AutoResetEvent is the primitive kernel-mode construct 
   private AutoResetEvent m_waiterLock = new AutoResetEvent(false); 
 
   public void Enter() { 

      if (Interlocked.Increment(ref m_waiters) == 1) // What would happen if we used m_waiters++ here?
         return; // Lock was free, no contention: we are now in the critical region 

      // Another thread holds the lock. There is contention, block this thread 
      m_waiterLock.WaitOne();  // Bad performance hit here 
      // When WaitOne returns, this thread now has the lock 
   } 
 
   public void Leave() { 
      // This thread is releasing the lock 
      if (Interlocked.Decrement(ref m_waiters) == 0) 
         return; // No other threads are blocked, just return 
 
      // Other threads are blocked, wake 1 of them 
      m_waiterLock.Set();  // Bad performance hit here 
   } 
 
   public void Dispose() { m_waiterLock.Dispose(); } 
}

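A private int field does the counting, which ensures that when only one thread calls the method we never touch the kernel object and its heavy performance cost. Only when several threads contend for the lock do we fall back to the kernel object and block.

We can add more features to this lock, which means storing more information in more fields: for example, which thread owns the lock and how many times it has acquired it. Under contention we can also put off the transition to the kernel object by spinning for a while first, in the style of a spin lock.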

internal sealed class AnotherHybridLock : IDisposable { 
   // The Int32 is used by the primitive user-mode constructs (Interlocked methods) 
   private Int32 m_waiters = 0; 
 
   // The AutoResetEvent is the primitive kernel-mode construct 
   private AutoResetEvent m_waiterLock = new AutoResetEvent(false); 
 
   // This field controls spinning in an effort to improve performance 
   private Int32 m_spincount = 4000;   // Arbitrarily chosen count 
 
   // These fields indicate which thread owns the lock and how many times it owns it 
   private Int32 m_owningThreadId = 0, m_recursion = 0; 
 
   public void Enter() { 
      // If calling thread already owns the lock, increment recursion count and return 
      Int32 threadId = Thread.CurrentThread.ManagedThreadId; 
      if (threadId == m_owningThreadId) { m_recursion++; return; } 
 
      // The calling thread doesn't own the lock, try to get it 
      SpinWait spinwait = new SpinWait(); 
      for (Int32 spinCount = 0; spinCount < m_spincount; spinCount++) { 
         // If the lock was free, this thread got it; set some state and return 
         if (Interlocked.CompareExchange(ref m_waiters, 1, 0) == 0) goto GotLock; 
 
         // Black magic: give other threads a chance to run  
         // in hopes that the lock will be released 
         spinwait.SpinOnce(); 
      } 
 
      // Spinning is over and the lock was still not obtained, try one more time 
      if (Interlocked.Increment(ref m_waiters) > 1) { 
         // Other threads are blocked and this thread must block too 
         m_waiterLock.WaitOne(); // Wait for the lock; performance hit 
         // When this thread wakes, it owns the lock; set some state and return 
      } 
 
   GotLock: 
      // When a thread gets the lock, we record its ID and  
      // indicate that the thread owns the lock once 
      m_owningThreadId = threadId; m_recursion = 1; 
   } 
 
   public void Leave() { 
      // If the calling thread doesn't own the lock, there is a bug 
      Int32 threadId = Thread.CurrentThread.ManagedThreadId; 
      if (threadId != m_owningThreadId) 
         throw new SynchronizationLockException("Lock not owned by calling thread"); 
 
      // Decrement the recursion count. If this thread still owns the lock, just return 
      if (--m_recursion > 0) return; 
 
      m_owningThreadId = 0;   // No thread owns the lock now 
 
      // If no other threads are blocked, just return 
      if (Interlocked.Decrement(ref m_waiters) == 0)  
         return; 
 
      // Other threads are blocked, wake 1 of them 
      m_waiterLock.Set();     // Bad performance hit here 
   } 
 
   public void Dispose() { m_waiterLock.Dispose(); } 
}

Of course, a more complex lock is also a somewhat slower one. Every gain has its cost.

Sync block

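Every object on the heap can be associated with a data structure called a sync block. A sync block contains fields that serve much the same purpose as the fields in the locks we implemented above. Specifically, it provides storage for a kernel object, the owning thread's ID, a recursion count, and a count of waiting threads.

Type objects live on the managed heap just like ordinary objects, and they too have a pointer to a sync block. Locking an ordinary object and locking a Type object are no different, since all that is used is the sync block. Different sync blocks create different critical regions, and different critical regions do not exclude one another. So lock(typeof(object)) really just says: "Hey buddy, I'm going to use the sync block you point to for the data my synchronization needs."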

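As usual, here is a diagram, because the text alone is not easy to follow:

[Figure: objects on the managed heap and the sync blocks they point to]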

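So what is a sync block for? For storing data, that's all.

Just like the hybrid locks we built above, monitor, mutex, and event essentially store a 0 or 1 plus a little extra data, such as a thread ID where recursion is allowed; a semaphore stores a count such as 1, 2, 3, 4, 5, and so on.

Of course, an object does not get a sync block from the start; the diagram above hides some detail. The pointer to the sync block actually sits in two pointer-sized words of memory that also hold the hash code and a few other things. Only when that memory cannot hold all the information is the object assigned a sync block from a shared pool. This is known as object header inflation.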

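With the similarities clear, we can see why locking on a Type is dangerous: a Type object is reachable from many places and can even be shared across AppDomains, so you may well end up using the same sync block as a lock in another AppDomain without realizing it. The same goes for other AppDomain-independent reflection types, such as MemberInfo.

To illustrate mutual exclusion between critical regions, I wrote some code that creates two different critical regions.

Here [MethodImplAttribute(MethodImplOptions.Synchronized)] on an instance method is equivalent to wrapping the method body in lock(this).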

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Program
    {
        static void Main(string[] args)
        {
            var syncTest = new SyncTest();
            Thread t1 = new Thread(syncTest.LongSyncMethod); // critical region 1
            t1.Start();

            Thread t2 = new Thread(syncTest.NoSyncMethod);
            t2.Start();

            Thread t3 = new Thread(syncTest.LongSyncMethod);// critical region 1 
            t3.Start();

            Thread t4 = new Thread(syncTest.NoSyncMethod);
            t4.Start();

            Thread t5 = new Thread(syncTest.NoSyncMethod);
            t5.Start();

            Thread t6 = new Thread(syncTest.SyncMethodUsingPrivateObject);// critical region 2
            t6.Start();

            Thread t7 = new Thread(syncTest.SyncMethodUsingPrivateObject);// critical region 2
            t7.Start();
        }
    }




    class SyncTest
    {
        private object _lock = new object();

        [MethodImplAttribute(MethodImplOptions.Synchronized)]
        public void LongSyncMethod()
        {
            Console.WriteLine("being asleep");
            Thread.Sleep(10000);
        }


        public void NoSyncMethod()
        {
            Console.WriteLine("do sth");

        }
        
        public void SyncMethodUsingPrivateObject()
        {
            lock (_lock)
            {
                Console.WriteLine("another critical section");
                Thread.Sleep(5000);
            }

        }
    }

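Many people with a shaky grasp of the concepts think that lock(this) locks the whole object so that none of its methods can be used. Those with a slightly better grasp think only its synchronized methods become unusable. Once you understand the mechanism, you see there is nothing special about lock(this): it simply creates a critical region through the this object. We can lock other objects to create other critical regions, and different critical regions do not exclude one another.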
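
The point that separate lock objects mean separate critical regions can also be checked without any sleeping. The following is a rough sketch; CriticalRegionSketch and CanOtherThreadEnter are hypothetical names, and the probe uses the real Monitor.TryEnter(object, int) overload with a zero timeout so it never waits.

```csharp
using System.Threading;

static class CriticalRegionSketch
{
    // While the calling thread holds 'held', asks a second thread whether it
    // could take 'wanted' right now, without waiting.
    public static bool CanOtherThreadEnter(object held, object wanted)
    {
        bool entered = false;
        lock (held)
        {
            var probe = new Thread(() =>
            {
                // A timeout of 0 means: try once and give up immediately.
                if (Monitor.TryEnter(wanted, 0))
                {
                    entered = true;
                    Monitor.Exit(wanted);
                }
            });
            probe.Start();
            probe.Join();   // wait for the probe before releasing 'held'
        }
        return entered;
    }
}
```

Passing the same object twice should give false, because the probe thread runs into the lock we already hold. Passing two different objects should give true, because the second object's critical region is unrelated to the first.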

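Implementing a blocking queue with Monitor:

Monitor is also a hybrid lock that combines spinning with a kernel object. We usually use it through the lock keyword, which guarantees that the Monitor class is used in the correct pattern:

1. A temporary variable guarantees that the object entered and the object released are the same one, even if you reassign the lock variable inside the lock block.

2. The lock is always released once it has been acquired.

Below is the code that the lock syntactic sugar is equivalent to, from .NET 4 onward: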

bool acquired = false;
object tmp = listLock;
try
{
   Monitor.Enter(tmp, ref acquired);
   list.Add("item");
} 
finally {
   if (acquired)
   {
      Monitor.Exit(tmp);
   } 
}
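
The first guarantee is easy to observe. In this small sketch (LockTempSketch and s_gate are names invented for the example) the field used as the lock is reassigned inside the lock block, and Monitor.IsEntered, available since .NET 4.5, is used to check which object is actually held.

```csharp
using System.Threading;

static class LockTempSketch
{
    private static object s_gate = new object();

    // Returns true if the object locked on entry is the one released on exit,
    // even though the field is reassigned inside the lock block.
    public static bool ReleasesOriginalObject()
    {
        object original = s_gate;
        bool heldInside;
        lock (s_gate)
        {
            s_gate = new object();                      // swap the field mid-lock
            heldInside = Monitor.IsEntered(original);   // still holding the original
        }
        bool heldAfter = Monitor.IsEntered(original);   // released through the hidden temp
        return heldInside && !heldAfter;
    }
}
```

Reassigning a lock field like this is of course a bad idea in real code, since other threads would start locking a different object; the sketch only shows that the compiler-generated temporary keeps Enter and Exit paired.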

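In Multithreading Journey, Part 3 we built a bounded blocking queue from two kernel objects; the main cost was the switching between those two kernel objects on every enqueue. Now let's build the corresponding data structure on the hybrid lock Monitor and enjoy the benefits a hybrid lock brings.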

    // Requires: using System.Collections.Generic; using System.Threading;
    public class BlockingQueue<T>
    {
        private Queue<T> m_queue = new Queue<T>();
        private int m_waitingConsumers = 0;
        public int Count
        {
            get
            {
                lock (m_queue)
                    return m_queue.Count;
            }
        }
        public void Clear()
        {
            lock (m_queue)
                m_queue.Clear();
        }

        public bool Contains(T item)
        {
            lock (m_queue)
                return m_queue.Contains(item);
        }
        public void Enqueue(T item)
        {
            lock (m_queue)
            {
                m_queue.Enqueue(item);
                // Wake one consumer that is waiting for a new element.
                if (m_waitingConsumers > 0)
                    Monitor.Pulse(m_queue);
            }
        }

        public T Dequeue()
        {
            lock (m_queue)
            {
                while (m_queue.Count == 0)
                {
                    // Queue is empty; wait until an element arrives.
                    m_waitingConsumers++;
                    try
                    {
                        Monitor.Wait(m_queue);
                    }
                    finally
                    {
                        m_waitingConsumers--;
                    }
                }
                return m_queue.Dequeue();

            }
        }

        public T Peek()
        {
            lock (m_queue)
                return m_queue.Peek();
        }
    }
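
To see the Wait and Pulse handshake in isolation, here is a compact sketch with one producer and one consumer sharing a plain Queue<int>. WaitPulseSketch and ProduceAndConsume are hypothetical names, and the sketch pulses on every enqueue instead of tracking waiting consumers as the class above does.

```csharp
using System.Collections.Generic;
using System.Threading;

static class WaitPulseSketch
{
    // The consumer thread blocks in Monitor.Wait whenever the queue is empty;
    // the calling thread produces 1..count and pulses after each enqueue.
    // Returns the sum of everything the consumer dequeued.
    public static int ProduceAndConsume(int count)
    {
        var queue = new Queue<int>();
        int sum = 0;

        var consumer = new Thread(() =>
        {
            for (int taken = 0; taken < count; taken++)
            {
                lock (queue)
                {
                    // Re-check the condition after every wake-up.
                    while (queue.Count == 0)
                        Monitor.Wait(queue);   // releases the lock while waiting
                    sum += queue.Dequeue();
                }
            }
        });
        consumer.Start();

        for (int i = 1; i <= count; i++)
        {
            lock (queue)
            {
                queue.Enqueue(i);
                Monitor.Pulse(queue);          // wake a waiting consumer, if any
            }
        }

        consumer.Join();
        return sum;
    }
}
```

Monitor.Wait gives up the lock while the thread sleeps and takes it again before returning, which is why the consumer can wait inside the lock block without deadlocking the producer.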

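1. Multithreading Journey: Starting from the Concepts

2. Multithreading Journey, Part 2: Threads

3. Multithreading Journey, Part 3: Windows Kernel Object Synchronization

4. Multithreading Journey, Part 4: A Brief Look at the Memory Model and User-Mode Synchronization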
