Chapter 16: Thread Stacks

//1.
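(A): The system sometimes reserves regions in the user-mode partition on its own, for example when it allocates the process environment block (PEB) or a thread environment block (TEB). Another case is allocating a thread's stack.
(B): When the system creates a thread, it reserves a region of address space for the thread's stack (each thread has its own stack) and commits some physical storage to that region.
(C): By default, the system reserves 1 MB of address space and commits two pages of physical storage.
(D): VS2010: Project -> Properties -> Linker -> System -> Stack Reserve Size sets the default reserved size of thread stacks, in bytes.
(E): The second parameter of CreateThread (dwStackSize):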
To change the initially committed stack space, use the dwStackSize parameter of the CreateThread, CreateRemoteThread, or CreateFiber function.
This value is rounded up to the nearest page. Generally, the reserve size is the default reserve size specified in the executable header.
However, if the initially committed size specified by dwStackSize is larger than or equal to the default reserve size,
the reserve size is this new commit size rounded up to the nearest multiple of 1 MB.
To change the reserved stack size, set the dwCreationFlags parameter of CreateThread or CreateRemoteThread to STACK_SIZE_PARAM_IS_A_RESERVATION and use the dwStackSize parameter. 
In this case, the initially committed size is the default size specified in the executable header. For fibers, use the dwStackReserveSize parameter of CreateFiberEx. 
The committed size is specified in the dwStackCommitSize parameter.
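
The rules quoted above can be sketched as a small calculation. This is only a model under stated assumptions: EstimateStackSizes and StackSizes are hypothetical names, the page size is assumed to be 4 KB, and the default commit and reserve sizes (which really come from the executable header) are passed in by the caller. Any extra rounding the system applies to the reserve size in the STACK_SIZE_PARAM_IS_A_RESERVATION case is not modeled.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPage = 4096;          // assumed page size
constexpr std::size_t kMB = 1024 * 1024;

struct StackSizes {
	std::size_t commit;   // initially committed bytes
	std::size_t reserve;  // reserved address space in bytes
};

// Round value up to the next multiple of unit.
constexpr std::size_t RoundUp(std::size_t value, std::size_t unit) {
	return (value + unit - 1) / unit * unit;
}

// Hypothetical helper that follows the quoted documentation text.
StackSizes EstimateStackSizes(std::size_t dwStackSize, bool paramIsReservation,
                              std::size_t defaultCommit, std::size_t defaultReserve) {
	if (dwStackSize == 0) {
		return {defaultCommit, defaultReserve};   // both defaults from the header
	}
	if (paramIsReservation) {
		return {defaultCommit, dwStackSize};      // the parameter is the reserve size
	}
	// Otherwise the parameter is the initial commit size, rounded up to a page.
	const std::size_t commit = RoundUp(dwStackSize, kPage);
	std::size_t reserve = defaultReserve;
	if (commit >= defaultReserve) {
		reserve = RoundUp(commit, kMB);           // grow the reserve in 1 MB steps
	}
	return {commit, reserve};
}
```

For instance, under these assumptions EstimateStackSizes(10 * kMB, true, 2 * kPage, kMB) yields a 10 MB reserve with the default commit, which is the situation the example below sets up.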

//Example: how the second parameter of CreateThread and the STACK_SIZE_PARAM_IS_A_RESERVATION flag affect the thread stack size
#include <Windows.h>

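DWORD __stdcall FunThread(void* pVoid)
{
	char buff[8 * 1024 * 1024] = {};	//8 MB of locals: more than the default 1 MB reserve
	for (size_t i = 0; i < sizeof buff; ++i)
	{
		buff[i] = static_cast<char>(i);
	}
	return 0;
}

int main()
{
	//Because of the flag, dwStackSize (10 MB) is used as the reserve size
	HANDLE hThread = CreateThread(nullptr, 10 * 1024 * 1024, FunThread, nullptr, STACK_SIZE_PARAM_IS_A_RESERVATION, nullptr);
	if (hThread == nullptr)
	{
		return 1;
	}
	WaitForSingleObject(hThread, INFINITE);
	CloseHandle(hThread);
}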

//2.
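The discussion below covers the default case (the stack size is not changed, and 0 is passed as the second parameter of CreateThread or _beginthreadex).
(A): Take a thread's stack region (base address: 0x08000000) on a machine with a 4 KB page size. The region's address space and all physical storage committed to it have the PAGE_READWRITE protection attribute.
After reserving the address space, the system commits physical storage to the top two pages of the region (the ones at the highest addresses). Before the thread starts executing, the system sets the thread's stack pointer to the end of the topmost page (very close to 0x08100000).
That topmost page is where the thread starts using its stack. The second page from the top is called the guard page. As the thread's call tree grows deeper, the thread needs more and more stack space.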

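(B): When the thread tries to access memory in the guard page, the system is notified. The system first commits physical storage to the page just below the guard page and gives it the PAGE_GUARD attribute, then removes the PAGE_GUARD attribute from the current guard page.
This technique lets the system grow the stack's storage only when the thread needs it.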

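(C): As the call tree keeps growing, the thread eventually needs storage committed to the page at 0x08001000, and here the system behaves differently than it does for the other pages.
As before, the system removes the PAGE_GUARD attribute from the page at 0x08002000 and commits physical storage to the page at 0x08001000, but this time it does not give the page at 0x08001000 the PAGE_GUARD attribute.
At this point the stack's region holds all the physical storage it can contain.
The system never commits physical storage to the bottom page of the region: if the stack grew past the reserved region, it would overwrite other data in the process's address space, which is unacceptable.
When the system commits storage to the page at 0x08001000, it also raises an EXCEPTION_STACK_OVERFLOW exception. If the exception is not handled, a stack overflow error box appears and the process is terminated. If the thread keeps using the stack and touches the uncommitted page at 0x08000000, the system raises an access violation.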

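(D): When a thread executes code such as: char buff[4096]; the system does not commit physical storage to that area unless the program actually accesses the data in it.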
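
The guard-page growth described in (A) to (C) can be modeled with a small state machine. This is a simplified sketch, not how the kernel is implemented: StackModel, PageState, and Touch are hypothetical names, the region is the 1 MB / 4 KB example from the notes, and the model assumes that the stack overflow is reported when the last usable page (0x08001000) is committed and that touching any uncommitted page is an access violation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kBase = 0x08000000;   // region base from the notes
constexpr std::uint32_t kPageSize = 4096;     // 4 KB pages
constexpr std::size_t kPages = 256;           // 1 MB / 4 KB

enum class PageState { Reserved, Committed, Guard };
enum class Touch { Ok, Grew, StackOverflow, AccessViolation };

class StackModel {
public:
	StackModel() : pages_(kPages, PageState::Reserved) {
		pages_[kPages - 1] = PageState::Committed;  // topmost page, in use
		pages_[kPages - 2] = PageState::Guard;      // committed guard page
	}

	static std::uint32_t PageAddress(std::size_t index) {
		return kBase + static_cast<std::uint32_t>(index) * kPageSize;
	}

	PageState State(std::size_t index) const { return pages_[index]; }

	// Simulate one memory access inside (or outside) the stack region.
	Touch Access(std::uint32_t address) {
		if (address < kBase) return Touch::AccessViolation;
		const std::size_t i = (address - kBase) / kPageSize;
		if (i >= kPages) return Touch::AccessViolation;
		if (pages_[i] == PageState::Committed) return Touch::Ok;
		if (pages_[i] == PageState::Reserved) return Touch::AccessViolation;

		// Guard page hit: it becomes an ordinary committed page.
		pages_[i] = PageState::Committed;
		if (i - 1 > 1) {
			pages_[i - 1] = PageState::Guard;       // guard moves down one page
			return Touch::Grew;
		}
		pages_[i - 1] = PageState::Committed;       // page 1: committed, no guard
		return Touch::StackOverflow;                // page 0 is never committed
	}

private:
	std::vector<PageState> pages_;
};
```

In this model, touching a reserved page far below the guard page fails right away, which is the situation the stack-checking function in section 4 exists to avoid.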

//3.
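(A): To start a thread, the system must set aside a region in the process's user-mode partition for the thread's stack, so when little of the user-mode partition is left, the number of threads that can still be created is limited.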
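
A rough upper bound follows from simple division. The sketch below assumes that thread stacks are the only consumers of the user-mode partition, which is never true in practice (modules, heaps, and per-thread structures also take space), so real counts are lower. MaxThreadsByStackReserve is a hypothetical helper name, and the 2 GB partition size is an assumption for a typical 32-bit process.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMB = 1024ull * 1024ull;

// Upper bound on the thread count if stacks were the only thing
// occupying the user-mode partition (an idealized assumption).
constexpr std::uint64_t MaxThreadsByStackReserve(std::uint64_t userPartitionBytes,
                                                 std::uint64_t reservePerThread) {
	return reservePerThread == 0 ? 0 : userPartitionBytes / reservePerThread;
}
```

With a 2 GB partition and the default 1 MB reserve, MaxThreadsByStackReserve(2048 * kMB, kMB) gives 2048, which is in line with the roughly 2000 threads measured at the end of these notes. Shrinking the reserve to 256 KB raises the bound to 8192.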

//4.
(A):
//With the stack reserve size set to 100 MB: performance of stack allocation versus VirtualAlloc
#include <Windows.h>
#include <stdio.h>

DWORD __stdcall FunThread(void* pVoid)
{
	LARGE_INTEGER aLarge[2] = {};
	long long aSum[2] = {};

	for (int i = 0; i < 100; ++i)
	{
		QueryPerformanceCounter(aLarge);
		char buff[1024 * 1024 * 8] = {};
		buff[0] = 11;
		QueryPerformanceCounter(aLarge + 1);
		aSum[0] += aLarge[1].QuadPart - aLarge[0].QuadPart;
	}

	for (int i = 0; i < 100; ++i)
	{
		QueryPerformanceCounter(aLarge);
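		char* pMemory = reinterpret_cast<char*>(VirtualAlloc(nullptr, 1024 * 1024 * 8, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
		pMemory[0] = 11;
		QueryPerformanceCounter(aLarge + 1);
		aSum[1] += aLarge[1].QuadPart - aLarge[0].QuadPart;
		VirtualFree(pMemory, 0, MEM_RELEASE);	//release outside the timed section
	}

	//aSum[0] = 306551	(stack array; "= {}" likely zero-fills all 8 MB on every iteration)
	//aSum[1] = 720		(VirtualAlloc; only one byte is touched)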
	return 0;
}

int main()
{
	HANDLE hThread = CreateThread(nullptr, 1024 * 1024, FunThread, nullptr, 0, nullptr);
	WaitForSingleObject(hThread, INFINITE);
	CloseHandle(hThread);

	char buff[1024 * 1024 * 80] = {};			//no error: fits in the 100 MB reserve
	for (int i = 0; i < 1024 * 1024 * 80; ++i)
	{
		buff[i] = i;
	}
	printf("%d", buff[1024 * 1024 * 80 - 10]);	//prints -10 (char is signed by default in MSVC)
}

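(B): The C/C++ run-time library has a stack-checking function. When compiling source code, the compiler generates calls to this function where necessary (typically when a function's local variables need more than one page), to make sure that physical storage has been committed to the thread's stack pages.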
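
The idea behind the stack-checking function can be sketched as follows. This is a portable model, not the run-time library's actual code: ProbeAddresses is a hypothetical name, the page size is assumed to be 4 KB, and instead of writing to memory the function only records the addresses that a probe would touch, one per page, from the highest address downward, so that the guard page is always hit in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;   // assumed page size

// Returns the addresses a stack probe would touch before a function
// that needs bytesNeeded bytes of locals starts running.
std::vector<std::uintptr_t> ProbeAddresses(std::uintptr_t stackPointer,
                                           std::size_t bytesNeeded) {
	std::vector<std::uintptr_t> touched;
	while (bytesNeeded >= kPageSize) {
		stackPointer -= kPageSize;        // step down exactly one page
		touched.push_back(stackPointer);  // a real probe would access this address
		bytesNeeded -= kPageSize;
	}
	stackPointer -= bytesNeeded;          // remaining part of the frame
	touched.push_back(stackPointer);
	return touched;
}
```

Because no step is larger than one page, each access lands either on committed storage or on the current guard page, never on a reserved page further down.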
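(C): The Microsoft C/C++ compiler offers switches that help detect run-time stack corruption. When you create a C++ project, the /RTCsu switch is on by default in the DEBUG configuration. If a local array variable is overrun by a write at run time, the code inserted by the compiler detects this and notifies the application when the function returns. The /RTC switches can be used only in DEBUG configurations.
For RELEASE configurations, use the /GS switch. It tells the compiler to insert code that saves the stack's current state as a cookie before a function runs and checks the stack's integrity when the function returns. When a buffer overrun on the stack occurs,
the check against the saved cookie detects it and terminates the application.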
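
The cookie check can be illustrated with a simulated frame. This is a sketch of the idea only, not the code the compiler generates: FakeFrame, kCookie, and CopyAndCheck are hypothetical names, the cookie value is a fixed constant here (the real one is chosen at process startup), and the copy is clamped to the size of the simulated frame so that the demonstration itself stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simulated stack frame: a local buffer followed by the cookie.
struct FakeFrame {
	char buffer[16];
	std::uintptr_t cookie;
};

constexpr std::uintptr_t kCookie = 0x5A5AC3C3u;   // hypothetical fixed value

// Copies len bytes into the frame without checking against buffer's size,
// then reports whether the cookie survived (true means no overrun detected).
bool CopyAndCheck(const char* src, std::size_t len) {
	FakeFrame frame{};
	frame.cookie = kCookie;                        // "function entry"
	if (len > sizeof(frame)) len = sizeof(frame);  // keep the demo inside the struct
	std::memcpy(reinterpret_cast<unsigned char*>(&frame), src, len);
	return frame.cookie == kCookie;                // "function exit" check
}
```

A copy of 16 bytes or fewer leaves the cookie intact. A longer copy spills into the cookie, the comparison fails, and a real /GS build would terminate the application at that point.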
(D):
#include <Windows.h>

DWORD __stdcall FunThread(void* pVoid)
{
	return 0;
}

int main()
{
	int nCount = 0;

	while(true)
	{
		if (CreateThread(nullptr, 0, FunThread, nullptr, CREATE_SUSPENDED, nullptr))
		{
			++nCount;
		}
		else
		{
			break;
		}
	}
	//nCount = 1456
}
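When a 32-bit program runs on a 32-bit system, nCount in the code above is about 2000, a limit imposed by the size of a 32-bit program's virtual address space.
When a 32-bit program runs on a 64-bit system, nCount in the code above is about 1450, for the following reason: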
WOW64 enables 32-bit applications to take advantage of the 64-bit kernel. Therefore, 32-bit applications can use a larger number of kernel handles and window handles. 
However, 32-bit applications may not be able to create as many threads under WOW64 as they can when running natively on x86-based systems because WOW64 allocates an 
additional 64-bit stack (usually 512 KB) for each thread. In addition, some amount of address space is reserved for WOW64 itself and the data structures it uses. 
The amount reserved depends on the processor; more is reserved on the Intel Itanium than on the x64 or ARM64 processors.
https://msdn.microsoft.com/en-gb/library/windows/desktop/aa384219(v=vs.85).aspx
