A Brief Look at the Thread Local Storage (TLS) Mechanism in Windows

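Multithreading is one of the more error-prone areas of programming. The root cause is that multithreaded programs tend to break the promise that a high-level language hides the low-level details of the system: the programmer needs a solid understanding of how the operating system calls into and schedules code. Going from writing algorithms in a high-level language to writing multithreaded programs can be a hard jump. Of course, even without knowing the operating-system details, someone who has studied general operating-system principles can probably still get a multithreaded program to work, but their control over and understanding of the program will not be very solid. And if the multithreaded program also consists of multiple modules (dynamically loaded DLLs), then writing it without understanding the internal mechanisms is likely to end in disaster.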

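To deal with several modules calling into a DLL, Windows provides TLS (Thread Local Storage). An application that uses no DLLs can still use TLS, but the operating-system designers do not recommend heavy use of it, and ordinary applications should avoid TLS where they can. For a DLL, however, TLS is a replacement for static and global variables. Most of what follows is quoted or summarized from Windows via C/C++ (published in Chinese as 《Windows核心编程》); further implementation details come from Matt Pietrek's Windows 95 System Programming Secrets, with the occasional comment of my own. For a more thorough treatment of multithreaded programming on Windows, see books on operating-system principles and on Windows internals.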

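Thread local storage is a mechanism for binding data to a specific thread. With it, global variables that used to be shared by all threads can be made private to each thread. (A thread has no address space of its own; it shares the address space of its process with the other threads of that process, so a global variable is not private to any thread.) This gives programs that were written long ago, without concurrency in mind and with too many global variables, a way to support multiple threads. TLS is not limited to that situation, of course.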
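The sharing described in this paragraph is easy to observe. Here is a minimal sketch, assuming standard C++ threads (std::thread) rather than the Windows API; the names g_requestCount and BumpFromWorker are made up for illustration. A worker thread writes a global, and the calling thread sees the change.

```cpp
#include <thread>

// Hypothetical global for illustration: one copy for the whole process.
int g_requestCount = 0;

// Starts a worker that increments the global, waits for it to finish,
// and returns the value the calling thread then observes.
int BumpFromWorker() {
    std::thread worker([] { ++g_requestCount; });
    worker.join();  // join() orders the worker's write before our read
    return g_requestCount;
}
```

Every thread that touches g_requestCount works on the same memory location, which is exactly what makes legacy code full of globals hard to run on several threads at once.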

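For example, Microsoft's early C run-time library was written for single-threaded programs, and its implementation used many global and static variables. When it was later reworked to support multithreading, TLS was used heavily.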

On the use of global variables, the author of Windows via C/C++ writes:

"In my own software projects, I avoid global variables as much as possible. If your application uses global and static variables, I strongly suggest that you examine each variable and investigate the possibilities for changing it to a stack-based variable. This effort can save you an enormous amount of time if you decide to add threads to your application, and even single-threaded applications can benefit."

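In short: avoid global variables as far as possible, and where you do use them, try to turn them into stack-based variables. That effort can save you a lot of time when you later add threads, and even single-threaded programs benefit from it.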

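TLS comes in two kinds: static and dynamic. Both can be used in an ordinary application or in a DLL, but they matter more for a DLL, because a DLL knows nothing about the internal structure of the program that calls it. In an ordinary application, a thread should use local variables wherever possible.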

Dynamic TLS:

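The figure above shows how dynamic TLS is laid out in memory. The allocation state of each TLS slot corresponds to one bit in a per-process array of flags, whose value is FREE or INUSE (presumably 0 and 1). The bit at a given index records whether the storage slot with that index has been allocated. TLS_MINIMUM_AVAILABLE is the number of slots the system guarantees to be available in every process; it is defined as 64 (later versions of Windows can provide more than this minimum). Besides the bit-flag array that records which slots are allocated, each thread has an array of PVOID values that actually holds the slots (PVOID is a typedef for void*, an untyped pointer). It has the same number of elements as the bit array, and the elements correspond one to one. Windows via C/C++ says little about how the bit-flag array and the slot array are implemented, so I consulted Matt Pietrek's Windows 95 System Programming Secrets; the relevant passages are quoted below: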

“

THE WINDOWS 95 PROCESS DATABASE (PDB)
In Windows 95, each process database is a block of memory allocated from
the KERNEL32 shared memory heap. KERNEL32 often uses the acronym
PDB instead of the longer term "process database." Unfortunately, in Win16,
PDB is a synonym for the DOS PSP that all programs have. Is this confusing?
Yes! For the purposes of this chapter, I'll use PDB in the KERNEL32 sense of
the term. Each PDB is considered to be a KERNEL32 object as evidenced by
the value 5 (K32OBJ_PROCESS) in the first DWORD of the structure. The
PROCDB.H file from the WIN32WLK program gives a C-style view of the
PDB structure.

....
88h DWORD tlsInUseBits1
These 32 bits represent the status of the lowest 32 TLS (Thread Local Storage)
indexes. If a bit is set, the TLS index is in use. Each successive TLS index is
represented by successively greater bit values; for example:
TLSindex:0 = 0x00000001
TLSindex:1 = 0x00000002
TLSindex:2 = 0x00000004
Thread local storage is discussed in detail in the "Thread Local Storage"
section later in this chapter.
8Ch DWORD tlsInUseBits2
This DWORD represents the status of TLS indices 32 through 63. See the
previous field description (88h) for more information.

...

THE THREAD DATABASE
The thread database is a KERNEL32 object (type K32OBJ_THREAD) that's
allocated from the KERNEL32 shared heap. Like process databases, the
thread databases aren't directly linked together in a linked-list fashion. The
THREADB.H file from the WIN32WLK sources has a C-style structure
definition for a thread database.

...
3Ch PDWORD pTLSArray
This pointer points to the thread's TLS array. The entries in this array are
used by the TlsSetValue family of functions. TLS is described later in this
chapter. The actual memory for the TLS array comes a bit later in the
thread database.
...
98h DWORD TLSArray[64]
The TLSArray field is an array of 64 DWORDs. Each DWORD holds the
value that TLSGetValue returns for a given TLS ID. For instance, the first
DWORD in the array is returned by TLSGetValue(0). The second DWORD
is returned by TLSGetValue(1), and so on. TLS is described in a subsequent
section of this chapter.
...

”

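The original is somewhat dense because it is full of implementation details, such as how the Windows kernel is built and where things live in memory. The gist is that the first 32 bits and the last 32 bits of the bit-flag array are each stored in one DWORD, and these two DWORDs live in the process database (PDB). The base address of the PVOID array and the array's actual contents, on the other hand, are stored in the thread database. For more on the thread database, the process database, and other details of Windows, read Matt Pietrek's book; I won't try to teach the master's material here.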
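To make the two bit-flag DWORDs concrete, here is a small model in portable C++. It is only a sketch of the bookkeeping, not kernel code: the struct and function names are invented, and the "take the lowest free index" search order is an assumption, since the excerpt describes the fields but not the allocation algorithm.

```cpp
#include <cstdint>

// Simplified model of the two PDB fields quoted above; not real kernel code.
struct TlsInUseBits {
    std::uint32_t bits[2] = {0, 0};  // [0]: indexes 0..31, [1]: indexes 32..63
};

const int kNoIndex = -1;  // hypothetical "allocation failed" value

// Finds the lowest clear bit, marks it in use, and returns its index.
int AllocIndex(TlsInUseBits& pdb) {
    for (int index = 0; index < 64; ++index) {
        std::uint32_t mask = std::uint32_t{1} << (index % 32);
        if ((pdb.bits[index / 32] & mask) == 0) {
            pdb.bits[index / 32] |= mask;
            return index;
        }
    }
    return kNoIndex;  // all 64 indexes are taken
}

// Clears the bit so the index can be handed out again.
void FreeIndex(TlsInUseBits& pdb, int index) {
    pdb.bits[index / 32] &= ~(std::uint32_t{1} << (index % 32));
}
```

With this layout, index 0 maps to 0x00000001 in the first DWORD and index 32 maps to 0x00000001 in the second, matching the excerpt.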

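TLS reaches the actual data through the elements of the slot array, each of which is DWORD-sized on 32-bit Windows. An element normally holds the address of the thread's private data, which is why the API declares it as PVOID, that is, void*.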

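That covers the mechanism behind dynamic TLS. What remains is the interface the operating system provides for it. There are four main functions: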

DWORD TlsAlloc();

BOOL TlsSetValue( DWORD dwTlsIndex, PVOID pvTlsValue);

PVOID TlsGetValue(DWORD dwTlsIndex);

BOOL TlsFree(DWORD dwTlsIndex);

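In order, they obtain a TLS index, store a PVOID value in the calling thread's slot array, read that value back, and release the index. The interface is easy to understand. According to the book, TlsAlloc sets the slot at the returned index to 0 in every thread of the process, so that no thread sees stale data left behind by an earlier user that has since freed the index.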

On where to store the TLS index, Windows via C/C++ says:

"A DLL (or an application) usually saves the index [that is, the TLS index] in a global variable. This is one of those times when a global variable is actually the better choice because the value is used on a per-process basis rather than a per-thread basis."

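That is clear enough: the author recommends storing the TLS index in the process's global data, which is also why TLS can be described as a way of making global variables thread-aware.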

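Dynamic TLS can be understood like this: the operating system gives every thread an identically laid out block of storage. The layout (the TLS indexes) is the same in every thread, and the data at a given index has the same meaning (or purpose) in every thread, but the actual values differ. Because the index is the same for all threads, it is stored in a global variable.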
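Putting the four functions and the global index together, a typical pattern looks roughly like the sketch below. The helper names (InitCounterSlot, GetThreadCounter, ReleaseThreadCounter, FreeCounterSlot) and the per-thread counter are hypothetical. On Windows the code uses the real API from windows.h; the #else branch contains stand-ins I added only so the sketch compiles elsewhere, and they model just one slot per thread.

```cpp
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch also compiles off Windows. They are NOT the real
// API: they model a single slot per thread, which is all this example uses.
typedef unsigned long DWORD;
typedef void* PVOID;
typedef int BOOL;
const DWORD TLS_OUT_OF_INDEXES = 0xFFFFFFFFu;
thread_local PVOID t_onlySlot = nullptr;
DWORD TlsAlloc() { return 0; }
BOOL TlsSetValue(DWORD, PVOID value) { t_onlySlot = value; return 1; }
PVOID TlsGetValue(DWORD) { return t_onlySlot; }
BOOL TlsFree(DWORD) { return 1; }
#endif

// The index is per-process, so a plain global is the right home for it.
DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;

// Call once, for example when the DLL is loaded.
bool InitCounterSlot() {
    g_tlsIndex = TlsAlloc();
    return g_tlsIndex != TLS_OUT_OF_INDEXES;
}

// Returns the calling thread's private counter, creating it on first use.
int* GetThreadCounter() {
    int* counter = static_cast<int*>(TlsGetValue(g_tlsIndex));
    if (counter == nullptr) {  // nothing stored in this thread's slot yet
        counter = static_cast<int*>(std::calloc(1, sizeof(int)));
        if (counter != nullptr && !TlsSetValue(g_tlsIndex, counter)) {
            std::free(counter);
            counter = nullptr;
        }
    }
    return counter;
}

// Call on each thread before it exits; TlsFree does not free this memory.
void ReleaseThreadCounter() {
    std::free(TlsGetValue(g_tlsIndex));
    TlsSetValue(g_tlsIndex, nullptr);
}

// Call once when the module is done with the slot.
void FreeCounterSlot() {
    TlsFree(g_tlsIndex);
    g_tlsIndex = TLS_OUT_OF_INDEXES;
}
```

The slot itself only holds a pointer, so the memory it points to has to be allocated and released per thread by the code that owns the index.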

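Static TLS:

Static TLS is simpler to use: just put __declspec(thread) in front of the declaration of a global or static variable.

For example: __declspec(thread) DWORD gt_dwStartTime = 0;

__declspec(thread) cannot be applied to an ordinary local variable that lives on the stack, and it would be pointless there anyway, because every thread already has its own stack.

For each variable declared with __declspec(thread), every thread gets its own copy, and the compiler generates special code for accesses to such variables.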
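The per-thread copies can be checked with a short experiment. This is a sketch under a few assumptions: it uses std::thread instead of CreateThread, std::uint32_t instead of DWORD so that windows.h is not needed, and the macro TLS_VAR (my own name) falls back to the standard C++11 thread_local keyword on compilers that do not understand __declspec(thread).

```cpp
#include <cstdint>
#include <thread>

#if defined(_MSC_VER)
#define TLS_VAR __declspec(thread)  // the MSVC spelling used in the text
#else
#define TLS_VAR thread_local        // standard C++11 equivalent (assumption)
#endif

// One copy of this variable exists per thread.
TLS_VAR std::uint32_t gt_dwStartTime = 0;

// Each worker stores its own value, then reports what it reads back.
void Worker(std::uint32_t start, std::uint32_t* seen) {
    gt_dwStartTime = start;
    *seen = gt_dwStartTime;
}

// Returns true if two workers and the caller each kept their own value.
bool CopiesAreIndependent() {
    std::uint32_t a = 0, b = 0;
    gt_dwStartTime = 7;  // the calling thread's copy
    std::thread t1(Worker, 100u, &a);
    std::thread t2(Worker, 200u, &b);
    t1.join();
    t2.join();
    return a == 100 && b == 200 && gt_dwStartTime == 7;
}
```

No index and no TlsGetValue call appear in the source; locating the current thread's copy is left entirely to the compiler and loader.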

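That was a brief introduction to thread local storage in Windows, based mainly on a few classic books. For more detail and depth, or if you want to use these features in your own programs, please consult the references listed here.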

References:

Matt Pietrek, Windows 95 System Programming Secrets

Jeffrey Richter, Christophe Nasarre, Windows via C/C++, Fifth Edition

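Addendum:

One application of TLS is the management of per-thread module state in MFC. The following post gives a brief introduction to TLS in MFC:

Original post: http://www.cnblogs.com/moonz-wu/archive/2008/05/08/1189021.html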

Thread Local Storage (TLS)

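    Windows provides a process/thread programming model. A process is the unit of resource allocation and owns the resources of the program, while a thread represents the execution of the program and is the unit the operating system schedules. Note that inside the operating system both are KERNEL32 objects, represented by a Process Database and a Thread Database respectively. For details, see Matt Pietrek's Windows 95 System Programming Secrets.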

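    Thread Local Storage is a mechanism for giving a thread its own global data. The data is visible only within that thread because it is stored in the thread's Thread Database: every Thread Database contains a 64-element DWORD array that holds it. The operating system also provides functions for working with this data: TlsAlloc, TlsFree, TlsSetValue, and TlsGetValue.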

    MFC provides TLS support as well, with a set of classes and routines designed for the purpose. The code is in afxtls.cpp and afxtls_.h.
The main classes involved are:

   class CTypedSimpleList : public CSimpleList
   struct CThreadData : public CNoTrackObject
   struct CSlotData
   class CThreadSlotData
   class CThreadLocal : public CThreadLocalObject

    CThreadSlotData is the most important of the classes that wrap TLS. CTypedSimpleList, CSlotData, and CThreadData are helper classes designed only to support that wrapper. CThreadLocal is a higher-level wrapper.

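    Let us first look at how the data is organized. The definitions of the important classes, with commentary, are shown below. (For simplicity, only data members are listed; member functions are omitted.)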

Definition:

   class CThreadSlotData
   {
       public:
       DWORD m_tlsIndex;
       int m_nAlloc;
       int m_nRover;
       int m_nMax;
       CSlotData* m_pSlotData;
       CTypedSimpleList<CThreadData*> m_list;
       CRITICAL_SECTION m_sect;
   };

Analysis:

    afxtls.cpp defines a global CThreadSlotData instance, reached through _afxThreadData. The member functions of CThreadLocal use this global extensively to access the TLS functionality.

   DWORD m_tlsIndex

    Holds the index of the TLS data, that is, the index into the 64-element array in the Thread Database. It is initialized in the constructor of CThreadSlotData.

   int m_nAlloc
   int m_nRover
   int m_nMax

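    These three variables are used to allocate slots and track the related state; m_nAlloc, for example, holds the number of slots currently allocated. One slot is allocated for each piece of TLS data.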

   CSlotData* m_pSlotData;

    Records the state of each allocated slot: in use or not yet in use.

   CTypedSimpleList<CThreadData*> m_list;

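    CThreadSlotData creates exactly one CThreadData object for each thread and manages these objects with the linked-list object m_list. What is actually stored in the Thread Database is a pointer to this CThreadData object; the TLS data the programmer wants to keep is stored in the dynamic array that the object's pData member points to. The CThreadData objects of all threads are chained together through their pNext members, and the chain is managed by CTypedSimpleList<CThreadData*> m_list.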

   CRITICAL_SECTION m_sect;

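    Because every thread's TLS operations go through _afxThreadData, thread synchronization becomes an issue. m_sect is the variable used for that synchronization; it ensures that only one thread at a time accesses the member variables of _afxThreadData.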

Definition:

   struct CThreadData : public CNoTrackObject
   {
       CThreadData* pNext; // required to be member of CSimpleList
       int nCount;         // current size of pData
       LPVOID* pData;      // actual thread local data (indexed by nSlot)
   };

Analysis:

    CThreadData helps CThreadSlotData implement TLS. The TLS data of each thread is managed and stored by one CThreadData object.

   CThreadData* pNext

    In CThreadSlotData the CThreadData objects are managed as a linked list; pNext links the CThreadData objects of the individual threads into that list.

   int nCount

    The length of the dynamic array that holds the pointers to the TLS data.

   LPVOID* pData

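    What CThreadData actually stores is a pointer to each piece of TLS data. An array of pointers is defined for this: nCount gives the length of the array and pData gives its base address.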

Definition:

   struct CSlotData
   {
       DWORD dwFlags;      // slot flags (allocated/not allocated)
       HINSTANCE hInst;    // module which owns this slot
   };

Analysis:

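    CSlotData helps CThreadSlotData implement TLS. The TLS data of each thread is held by one CThreadData object; concretely, the pointers to the TLS data are stored in that object's dynamic pointer array (whose base address is pData). Whether each element of that array is in use is recorded in a CSlotData array of the same length, specifically in DWORD dwFlags.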

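    From the analysis above, the way MFC wraps TLS is this: the pointers to a thread's TLS data are all stored in a dynamic pointer array whose base address is given by pData in that thread's CThreadData object. What is stored in the Thread Database is the pointer to this CThreadData object, not the pointers to the TLS data, and the index is the same in every thread: the m_tlsIndex member of CThreadSlotData. In addition, CThreadSlotData keeps a linked list of the CThreadData objects of all threads, so the CThreadSlotData class can reach the TLS data of every thread. See the figure tls.bmp. (For convenience I put the figure in my signature, just below.)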
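The two-level layout summarized here can be modeled in a few lines of portable C++. This is a deliberately simplified sketch and not the MFC source: the class names are invented, a thread_local object stands in for the single operating-system TLS slot that MFC reserves, std::mutex replaces CRITICAL_SECTION, and the linked list of all threads' data (m_list) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Models CThreadData: the per-thread array that pData/nCount describe.
struct ThreadDataModel {
    std::vector<void*> data;
};

// Models CThreadSlotData. A thread_local object stands in for the single
// OS-level TLS slot (m_tlsIndex) that holds each thread's CThreadData*.
class ThreadSlotModel {
public:
    // Marks the first unused slot as allocated and returns its number.
    int AllocSlot() {
        std::lock_guard<std::mutex> lock(m_sect);  // plays the role of m_sect
        for (std::size_t i = 0; i < m_slotInUse.size(); ++i) {
            if (!m_slotInUse[i]) {
                m_slotInUse[i] = true;
                return static_cast<int>(i);
            }
        }
        m_slotInUse.push_back(true);
        return static_cast<int>(m_slotInUse.size() - 1);
    }

    // Stores a pointer in the calling thread's array, growing it on demand.
    void SetValue(int slot, void* value) {
        assert(slot >= 0);
        std::vector<void*>& data = CurrentThreadData().data;
        if (data.size() <= static_cast<std::size_t>(slot)) {
            data.resize(static_cast<std::size_t>(slot) + 1, nullptr);
        }
        data[static_cast<std::size_t>(slot)] = value;
    }

    // Reads the calling thread's pointer; nullptr if nothing was stored.
    void* GetValue(int slot) {
        assert(slot >= 0);
        const std::vector<void*>& data = CurrentThreadData().data;
        std::size_t i = static_cast<std::size_t>(slot);
        return i < data.size() ? data[i] : nullptr;
    }

private:
    // Created on first use in each thread; shared by all model instances,
    // which matches MFC having a single global slot manager.
    static ThreadDataModel& CurrentThreadData() {
        thread_local ThreadDataModel perThread;
        return perThread;
    }

    std::mutex m_sect;
    std::vector<bool> m_slotInUse;  // models the CSlotData flag table
};
```

The point of the design is that the whole library consumes only one of the scarce operating-system TLS indexes, while the number of MFC-level slots can grow as needed.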

    Next, how the TLS support is used.

    To make TLS easier to use, MFC provides the CThreadLocal class. It is a class template, defined as follows:

   template<class TYPE>
   class CThreadLocal : public CThreadLocalObject
   {
   // Attributes
   public:
       AFX_INLINE TYPE* GetData()
       {
           TYPE* pData = (TYPE*)CThreadLocalObject::GetData(&CreateObject);
           ASSERT(pData != NULL);
           return pData;
       }
       AFX_INLINE TYPE* GetDataNA()
       {
           TYPE* pData = (TYPE*)CThreadLocalObject::GetDataNA();
           return pData;
       }
       AFX_INLINE operator TYPE*()
       { return GetData(); }
       AFX_INLINE TYPE* operator->()
       { return GetData(); }

   // Implementation
   public:
       static CNoTrackObject* AFXAPI CreateObject()
       { return new TYPE; }
   };

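    To use CThreadLocal, declare CThreadLocal<ClassType> name; to set up TLS data of type ClassType. Note that ClassType must derive from CNoTrackObject. Strictly speaking, the declaration defines a CThreadLocal object called name, but through that object the TLS data of type ClassType is created and accessed.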
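The lazy-creation behavior that the template implies (GetData passes a factory, GetDataNA does not) can be imitated without MFC. The following is a portable sketch only: ThreadLocalModel, ParserState, and g_parserState are hypothetical names, and unlike MFC the storage here is keyed by TYPE, so two objects with the same TYPE would share their per-thread data.

```cpp
#include <memory>

// Portable model of the lazy-creation idea behind CThreadLocal<TYPE>.
// Simplification: storage is keyed by TYPE rather than by object.
template <class TYPE>
class ThreadLocalModel {
public:
    // Like GetData(): creates the calling thread's object on first access.
    TYPE* GetData() {
        std::unique_ptr<TYPE>& slot = Slot();
        if (!slot) {
            slot.reset(new TYPE);
        }
        return slot.get();
    }

    // Like GetDataNA(): never allocates; nullptr if nothing exists yet.
    TYPE* GetDataNA() { return Slot().get(); }

    TYPE* operator->() { return GetData(); }

private:
    // One unique_ptr per thread; destroyed automatically at thread exit.
    static std::unique_ptr<TYPE>& Slot() {
        thread_local std::unique_ptr<TYPE> perThread;
        return perThread;
    }
};

// Hypothetical per-thread state, for illustration only.
struct ParserState {
    int depth = 0;
};

// Declared once as a global; each thread sees its own ParserState.
ThreadLocalModel<ParserState> g_parserState;
```

Code then writes g_parserState->depth as if it were an ordinary global, and each thread transparently gets its own ParserState the first time it touches the object.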

For MFC's module state management, see chapter 9 (on MFC state) of 《MFC深入浅出》 by Li Jiujin (李久进): http://www.vczx.com/tutorial/mfc/mfc9.php.

For a deeper understanding, read the MFC source code.