Mixed DLL Loading analysis

这个问题还是关于ASP.NET. 问题背景:

 

客户的ASP.NET发生了hang, CPU0,客户端得不到任何反应.问题发生的时候抓取了hang dump

 

打开dump,一如既往地用!threads命令检查发现一共有20个活动的CLR thread,其中没有标记为GC的线程. 接下来用~* e !clrstack命令查看对应的CLR Callstack, 发现:

 

0:017> ~* e !clrstack

Thread 0

Not a managed thread.

Thread 1

Not a managed thread.

Thread 2

Not a managed thread.

Thread 3

Not a managed thread.

Thread 4

ESP         EIP      

Thread 5

ESP         EIP      

Thread 6

ESP         EIP      

Thread 7

Not a managed thread.

Thread 8

Not a managed thread.

Thread 9

Not a managed thread.

Thread 10

ESP         EIP      

0x01a9ec48  0x7c82ed54 [FRAME: GCFrame]

0x01a9f7e4  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.nLoad(Class System.Reflection.AssemblyName,String,Boolean,Class System.Security.Policy.Evidence,Boolean,Class System.Reflection.Assembly,ByRef ValueClass System.Threading.StackCrawlMark)

0x01a9f808  0x799afb84 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.InternalLoad(Class System.Reflection.AssemblyName,Boolean,Class System.Security.Policy.Evidence,ByRef ValueClass System.Threading.StackCrawlMark)

0x01a9f830  0x799ea867 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.Load(Class System.Reflection.AssemblyName)

0x01a9f838  0x0cdd2bef [DEFAULT] Class System.Collections.Hashtable System.Web.UI.Util.GetReferencedAssembliesHashtable(Class System.Reflection.Assembly)

0x01a9f84c  0x0c119870 [DEFAULT] [hasThis] Boolean System.Web.Compilation.PreservedAssemblyEntry.LoadDataFromFileInternal(Boolean)

0x01a9f89c  0x0c1194a2 [DEFAULT] [hasThis] Boolean System.Web.Compilation.PreservedAssemblyEntry.LoadDataFromFile(Boolean)

0x01a9f8c4  0x0c118a20 [DEFAULT] Class System.Web.Compilation.PreservedAssemblyEntry System.Web.Compilation.PreservedAssemblyEntry.GetPreservedAssemblyEntry(Class System.Web.HttpContext,String,Boolean)

0x01a9f8dc  0x0c118764 [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemFromPreservedCompilation()

0x01a9f8f4  0x0c11740b [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemInternal(Boolean)

0x01a9f91c  0x0c1186ae [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemWithNewConfigPath()

0x01a9f94c  0x0c1172dc [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItem()

0x01a9f97c  0x0c115d84 [DEFAULT] Class System.Type System.Web.UI.ApplicationFileParser.GetCompiledApplicationType(String,Class System.Web.HttpContext,ByRef Class System.Web.UI.ApplicationFileParser)

0x01a9f994  0x0c115c4c [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.CompileApplication(Class System.Web.HttpContext)

0x01a9f9a4  0x0c115b4f [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.Init(Class System.Web.HttpContext)

0x01a9f9e4  0x0c115953 [DEFAULT] Class System.Web.IHttpHandler System.Web.HttpApplicationFactory.GetApplicationInstance(Class System.Web.HttpContext)

0x01a9fa10  0x0be76a97 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)

0x01a9fa4c  0x0be76690 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)

0x01a9fa58  0x0be7320d [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)

0x01a9fb20  0x79217188 [FRAME: ContextTransitionFrame]

0x01a9fc00  0x79217188 [FRAME: ComMethodFrame]

Thread 11

Not a managed thread.

Thread 12

Not a managed thread.

Thread 13

Not a managed thread.

Thread 14

ESP         EIP      

Thread 15

Not a managed thread.

Thread 16

ESP         EIP      

Thread 17

ESP         EIP      

0x0d1aeaf4  0x7c82ed54 [FRAME: GCFrame]

0x0d1af690  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.nLoad(Class System.Reflection.AssemblyName,String,Boolean,Class System.Security.Policy.Evidence,Boolean,Class System.Reflection.Assembly,ByRef ValueClass System.Threading.StackCrawlMark)

0x0d1af6b4  0x799afb84 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.InternalLoad(Class System.Reflection.AssemblyName,Boolean,Class System.Security.Policy.Evidence,ByRef ValueClass System.Threading.StackCrawlMark)

0x0d1af6dc  0x799afa15 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.InternalLoad(String,Class System.Security.Policy.Evidence,ByRef ValueClass System.Threading.StackCrawlMark)

0x0d1af6ec  0x799c2014 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.Load(String)

0x0d1af6f4  0x0c11ab17 [DEFAULT] Class System.Collections.Hashtable System.Web.UI.CompilationConfiguration.LoadAssemblies(Class System.Collections.Hashtable)

0x0d1af758  0x0c11a8df [DEFAULT] [hasThis] Void System.Web.UI.TemplateParser.AppendConfigAssemblies()

0x0d1af784  0x0c119d01 [DEFAULT] [hasThis] Void System.Web.UI.TemplateParser.PrepareParse()

0x0d1af794  0x0c119b35 [DEFAULT] [hasThis] Void System.Web.UI.TemplateParser.Parse()

0x0d1af7c4  0x0c119a03 [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemThroughCompilation()

0x0d1af7f4  0x0c117434 [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemInternal(Boolean)

0x0d1af81c  0x0c1186ae [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemWithNewConfigPath()

0x0d1af84c  0x0c1172dc [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItem()

0x0d1af87c  0x0c115d84 [DEFAULT] Class System.Type System.Web.UI.ApplicationFileParser.GetCompiledApplicationType(String,Class System.Web.HttpContext,ByRef Class System.Web.UI.ApplicationFileParser)

0x0d1af894  0x0c115c4c [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.CompileApplication(Class System.Web.HttpContext)

0x0d1af8a4  0x0c115b4f [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.Init(Class System.Web.HttpContext)

0x0d1af8e4  0x0c115953 [DEFAULT] Class System.Web.IHttpHandler System.Web.HttpApplicationFactory.GetApplicationInstance(Class System.Web.HttpContext)

0x0d1af910  0x0be76a97 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)

0x0d1af94c  0x0be76690 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)

0x0d1af958  0x0be7320d [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)

0x0d1afa20  0x79217188 [FRAME: ContextTransitionFrame]

0x0d1afb00  0x79217188 [FRAME: ComMethodFrame]

Thread 18

ESP         EIP      

0x0ec5ce80  0x7c82ed54 [FRAME: GCFrame]

0x0ec5d378  0x7c82ed54 [FRAME: GCFrame]

0x0ec5eac8  0x7c82ed54 [FRAME: GCFrame]

0x0ec5f664  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.nLoad(Class System.Reflection.AssemblyName,String,Boolean,Class System.Security.Policy.Evidence,Boolean,Class System.Reflection.Assembly,ByRef ValueClass System.Threading.StackCrawlMark)

0x0ec5f688  0x799afb84 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.InternalLoad(Class System.Reflection.AssemblyName,Boolean,Class System.Security.Policy.Evidence,ByRef ValueClass System.Threading.StackCrawlMark)

0x0ec5f6b0  0x799ea867 [DEFAULT] Class System.Reflection.Assembly System.Reflection.Assembly.Load(Class System.Reflection.AssemblyName)

0x0ec5f6b8  0x0cdd2bef [DEFAULT] Class System.Collections.Hashtable System.Web.UI.Util.GetReferencedAssembliesHashtable(Class System.Reflection.Assembly)

0x0ec5f6cc  0x0c119870 [DEFAULT] [hasThis] Boolean System.Web.Compilation.PreservedAssemblyEntry.LoadDataFromFileInternal(Boolean)

0x0ec5f71c  0x0c1194a2 [DEFAULT] [hasThis] Boolean System.Web.Compilation.PreservedAssemblyEntry.LoadDataFromFile(Boolean)

0x0ec5f744  0x0c118a20 [DEFAULT] Class System.Web.Compilation.PreservedAssemblyEntry System.Web.Compilation.PreservedAssemblyEntry.GetPreservedAssemblyEntry(Class System.Web.HttpContext,String,Boolean)

0x0ec5f75c  0x0c118764 [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemFromPreservedCompilation()

0x0ec5f774  0x0c11740b [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemInternal(Boolean)

0x0ec5f79c  0x0c1186ae [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItemWithNewConfigPath()

0x0ec5f7cc  0x0c1172dc [DEFAULT] [hasThis] Class System.Web.UI.ParserCacheItem System.Web.UI.TemplateParser.GetParserCacheItem()

0x0ec5f7fc  0x0c115d84 [DEFAULT] Class System.Type System.Web.UI.ApplicationFileParser.GetCompiledApplicationType(String,Class System.Web.HttpContext,ByRef Class System.Web.UI.ApplicationFileParser)

0x0ec5f814  0x0c115c4c [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.CompileApplication(Class System.Web.HttpContext)

0x0ec5f824  0x0c115b4f [DEFAULT] [hasThis] Void System.Web.HttpApplicationFactory.Init(Class System.Web.HttpContext)

0x0ec5f864  0x0c115953 [DEFAULT] Class System.Web.IHttpHandler System.Web.HttpApplicationFactory.GetApplicationInstance(Class System.Web.HttpContext)

0x0ec5f890  0x0be76a97 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)

0x0ec5f8cc  0x0be76690 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)

0x0ec5f8d8  0x0be7320d [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)

0x0ec5f9a0  0x79217188 [FRAME: ContextTransitionFrame]

0x0ec5fa80  0x79217188 [FRAME: ComMethodFrame]

Thread 19

ESP         EIP      

0x0ecaee18  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] String System.Security.Util.Config._GetMachineDirectory()

0x0ecaee28  0x799a0a9f [DEFAULT] Void System.Security.Util.Config.GetFileLocales()

0x0ecaee34  0x799a07b7 [DEFAULT] [hasThis] Void System.Security.PolicyManager.InitData()

0x0ecaee50  0x799a072b [DEFAULT] Boolean System.Security.SecurityManager.InitPolicy()

0x0ecaee78  0x799a046b [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,Boolean)

0x0ecaeeb8  0x799a02b1 [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,ByRef I4,Boolean)

0x0ecaf180  0x791b3208 [FRAME: GCFrame]

0x0ecaf694  0x791b3208 [FRAME: ECallMethodFrame] [DEFAULT] [hasThis] Void System.AppDomain.SetupDomainSecurity(String,Class System.Security.Policy.Evidence,Class System.Security.Policy.Evidence,I)

0x0ecaf6b0  0x799cd45b [DEFAULT] Void System.AppDomain.InternalRemotelySetupRemoteDomainHelper(String,Class System.AppDomainSetup,I,SZArray Char,SZArray Char,SZArray UI1)

0x0ecaf6d8  0x799cd2c3 [DEFAULT] Void System.AppDomain.InternalRemotelySetupRemoteDomain(I4,I4,String,Class System.AppDomainSetup,I,SZArray Char,SZArray Char,SZArray UI1)

0x0ecaf6e4  0x799ccfda [FRAME: ContextTransitionFrame]

0x0ecaf738  0x799ccfda [DEFAULT] Void System.AppDomain.RemotelySetupRemoteDomain(Class System.AppDomain,String,Class System.AppDomainSetup,Class System.Security.Policy.Evidence,Class System.Security.Policy.Evidence,I)

0x0ecaf774  0x799c5a9a [DEFAULT] Class System.AppDomain System.AppDomain.CreateDomain(String,Class System.Security.Policy.Evidence,Class System.AppDomainSetup)

0x0ecaf790  0x0be70507 [DEFAULT] [hasThis] Object System.Web.Hosting.AppDomainFactory.Create(String,String,String,String,String,I4)

0x0ecaf96c  0x79217188 [FRAME: ComMethodFrame]

Thread 20

Not a managed thread.

Thread 21

Not a managed thread.

Thread 22

Thread 30

ESP         EIP      

Thread 31

Not a managed thread.

Thread 32

ESP         EIP      

Thread 33

ESP         EIP      

Thread 34

ESP         EIP      

Thread 35

Not a managed thread.

Thread 36

Not a managed thread.

Thread 37

Thread 50

ESP         EIP      

Thread 51

Not a managed thread.

Thread 64

Not a managed thread.

 

 

受到篇幅限制,中间省略了部分Not a managed thread和空callstack的结果.从上面的输出来看,虽然CLR thread20,但大多数都是idle.发生blocking,看得到callstackCLR thread只有四个,其中三个等在Assembly.Load上面.似乎在加载Assembly的时候出问题了.

 

在前面看到callstack停留在SqlCommand.Execute的时候,很容易理解对应的thread在等待SQL操作,大多数情况是在等待SQL服务器的执行结果,但是也看到等待GC的例子。

 

但这里很难理解什么直接的理由可以导致Assembly.Load发生堵塞.由于最终堵塞线程的必然是API调用,比如EnterCriticalSection, WaitForSingleObject,所以检查对应的unmanaged callstack是进一步弄清楚问题的下一步,切换到第一个活动的CLR 线程检查:

 

0:010> kb100

ChildEBP RetAddr  Args to Child             

01a9db1c 7c822124 7c83970f 00000154 00000000 ntdll!KiFastSystemCallRet

01a9db20 7c83970f 00000154 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

01a9db5c 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c

01a9db7c 7c832ad0 7c889d94 00000000 00000000 ntdll!RtlEnterCriticalSection+0xa8

01a9dbb0 7c833ea9 00000001 00000000 01a9dbec ntdll!LdrLockLoaderLock+0xe4

01a9de20 77e41c90 1c68d7b8 01a9de6c 01a9de4c ntdll!LdrLoadDll+0xc9

01a9de88 791e0610 368ec3b8 00000000 00000009 kernel32!LoadLibraryExW+0x1b2

01a9dea4 791ea056 368ec3b8 00000000 00000009 mscorsvr!WszLoadLibraryEx+0x5f

01a9decc 791e7e58 368ec50c 36922588 00129000 mscorsvr!CorMap::BaseAddress+0x8b

01a9dee8 791e7ec4 36922378 368ec50c 00000000 mscorsvr!PEFile::Setup+0x45

01a9df04 791ea014 368ec50c 01a9e018 00000000 mscorsvr!PEFile::Create+0x38

01a9df1c 791ea117 368ec50c 368ec50c 01a9e018 mscorsvr!PEFile::CreateImageFile+0x3a

01a9df90 791ea151 00000001 00000000 3691f3b8 mscorsvr!PEFile::VerifyModule+0x76

01a9dfd4 791e88f6 01a9e248 00000000 26000000 mscorsvr!PEFile::Create+0x11a

01a9e01c 791e94d8 01a9e248 00000000 26000000 mscorsvr!SystemDomain::LoadFile+0x184

01a9e46c 791e8084 1d2312d8 3685d0f0 01a9f56c mscorsvr!AssemblySpec::GetAssemblyFromFusion+0x630

01a9e6f4 791e7f4b 00000000 01a9e7ac 00000001 mscorsvr!AssemblySpec::LowLevelLoadManifestFile+0x17d

01a9e714 791e2c95 01a9f53c 01a9e7a4 01a9e7ac mscorsvr!AppDomain::BindAssemblySpec+0x50

01a9ec24 791eb38b 01a9eca0 01a9eca8 01a9f7bc mscorsvr!AssemblySpec::LoadAssembly+0x98

01a9f7a4 01bab9ac 00000000 01a9f830 00000000 mscorsvr!AssemblyNative::Load+0x407

WARNING: Frame IP not in any known module. Following frames may be wrong.

01a9f820 799ea867 01a9f830 00000000 00000001 0x1bab9ac

01a9f824 01a9f830 00000000 00000001 0cdd2bef mscorlib_79990000+0x5a867

01a9f828 00000000 00000001 0cdd2bef 01a9f894 0x1a9f830

 

callstack上看到,CLR最后调用到LoadLibrary API上,在试图获取DLL Loader Lock的时候堵塞了。

 

DLL Loader LockOS维护的一个锁,用来确保系统加载DLL时候的顺序和同步。该锁是通过CriticalSection来实现的。在一个DLLDLLMain里面不允许调用LoadLibrary加载其它的DLL,也跟该锁相关。详细的信息可以搜索网上的资料。

 

这里最后堵塞在CriticalSection上,根据第二章的介绍,!cs命令可以派上用途了。看看谁占用了这个CriticalSection没有释放:

 

0:010> !cs 7c889d94

-----------------------------------------

Critical section   = 0x7c889d94 (ntdll!LdrpLoaderLock+0x0)

DebugInfo          = 0x7c889dc0

LOCKED

LockCount          = 0x12

WaiterWoken        = No

OwningThread       = 0x00000948

RecursionCount     = 0x2

LockSemaphore      = 0x154

SpinCount          = 0x00000000

0:010> ~~[0x00000948]

  18  Id: cc4.948 Suspend: 1 Teb: 7ffa3000 Unfrozen

      Start: mscorsvr!ThreadpoolMgr::intermediateThreadProc (791fda8b)

      Priority: 0  Priority class: 32  Affinity: 3

 

于是切换到线程18检查:

 

0:010> ~18s

eax=449db1b4 ebx=00000000 ecx=791b4ef3 edx=00000001 esi=793ef47c edi=00000588

eip=7c82ed54 esp=0ec5c384 ebp=0ec5c3c0 iopl=0         nv up ei pl zr na pe nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

ntdll!KiFastSystemCallRet:

7c82ed54 c3              ret

0:018> kb100

ChildEBP RetAddr  Args to Child             

0ec5c380 7c822124 7c83970f 00000588 00000000 ntdll!KiFastSystemCallRet

0ec5c384 7c83970f 00000588 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

0ec5c3c0 7c839620 00000000 00000004 0ec5c65c ntdll!RtlpWaitOnCriticalSection+0x19c

0ec5c3e0 791b2f0c 793ef47c 0ec5c618 791e78ab ntdll!RtlEnterCriticalSection+0xa8

0ec5c3ec 791e78ab 793ef47c 791e7c89 0c067cb8 mscorsvr!EE_EnterCriticalSection+0xc

0ec5c3f4 791e7c89 0c067cb8 00000000 00000000 mscorsvr!CorMap::Enter+0x13

0ec5c618 791e7d32 0000009c 00000003 0ec5c65c mscorsvr!CorMap::OpenFile+0x73

0ec5c648 791e88f6 0ec5c8bc 00000000 26000000 mscorsvr!PEFile::Create+0xc3

0ec5c690 791e94d8 0ec5c8bc 00000000 26000000 mscorsvr!SystemDomain::LoadFile+0x184

0ec5cae0 791e8084 1d2130b8 368f3390 0ec5cf2c mscorsvr!AssemblySpec::GetAssemblyFromFusion+0x630

0ec5cd68 791e7f4b 00000000 0ec5cee0 00000001 mscorsvr!AssemblySpec::LowLevelLoadManifestFile+0x17d

0ec5cd88 79208071 0ec5cefc 0ec5cef8 0ec5cee0 mscorsvr!AppDomain::BindAssemblySpec+0x50

0ec5d19c 79207d79 00000003 00000024 00000000 mscorsvr!Assembly::ComputeBindingDependenciesClosure+0x25a

0ec5d1c4 791e31ca 369291b8 369210e0 0ec5d324 mscorsvr!BaseDomain::CreateShareableAssemblyNoLock+0x7e

0ec5d340 79258d85 36928d80 369210e0 0ec5d3cc mscorsvr!BaseDomain::LoadAssembly+0xacd

0ec5d5e0 79019ad7 00000000 00000001 00000000 mscorsvr!ExecuteDLL+0x281

0ec5d5fc 7901a789 44940000 00000001 00000000 mscoree!CorDllMainWorker+0x6c

0ec5d638 7c82257a 44940000 00000000 00000000 mscoree!_CorDllMain+0x106

0ec5d658 7c8358fb 449594e2 44940000 00000001 ntdll!LdrpCallInitRoutine+0x14

0ec5d760 7c835bcb 00000000 00000000 00000000 ntdll!LdrpRunInitializeRoutines+0x367

0ec5d9f4 7c833ee5 00000000 1c626830 0ec5dcbc ntdll!LdrpLoadDll+0x3cd

0ec5dc70 77e41c90 1c626830 0ec5dcbc 0ec5dc9c ntdll!LdrLoadDll+0x198

0ec5dcd8 791e0610 36821d50 00000000 00000008 kernel32!LoadLibraryExW+0x1b2

0ec5dcf4 791ea056 36821d50 00000000 00000008 mscorsvr!WszLoadLibraryEx+0x5f

0ec5dd1c 791e7e58 36821ed4 368f7740 368f7530 mscorsvr!CorMap::BaseAddress+0x8b

0ec5dd38 7922c8ba 368f7530 36821ed4 00000000 mscorsvr!PEFile::Setup+0x45

0ec5ddac 791ea151 00000000 36922378 369210e0 mscorsvr!PEFile::VerifyModule+0x195

0ec5ddf0 791e88f6 0ec5e0c8 00000000 26000000 mscorsvr!PEFile::Create+0x11a

0ec5de38 791e94d8 0ec5e0c8 00000000 26000000 mscorsvr!SystemDomain::LoadFile+0x184

0ec5e2ec 791e8084 1d2130b8 368cc008 0ec5f3ec mscorsvr!AssemblySpec::GetAssemblyFromFusion+0x630

0ec5e574 791e7f4b 00000000 0ec5e62c 00000001 mscorsvr!AssemblySpec::LowLevelLoadManifestFile+0x17d

0ec5e594 791e2c95 0ec5f3bc 0ec5e624 0ec5e62c mscorsvr!AppDomain::BindAssemblySpec+0x50

0ec5eaa4 791eb38b 0ec5eb20 0ec5eb28 0ec5f63c mscorsvr!AssemblySpec::LoadAssembly+0x98

0ec5f624 01bab9ac 00000000 0ec5f6b0 00000000 mscorsvr!AssemblyNative::Load+0x407

WARNING: Frame IP not in any known module. Following frames may be wrong.

0ec5f6a0 799ea867 0ec5f6b0 00000000 00000001 0x1bab9ac

0ec5f6a4 0ec5f6b0 00000000 00000001 0cdd2bef mscorlib_79990000+0x5a867

0ec5f6a8 00000000 00000001 0cdd2bef 0ec5f714 0xec5f6b0

 

这个callstack好长。理清楚逻辑,Assembly.Load调用被分发到CLR Runtime,CLR Runtime调用API LoadLibaray加载DLL,在加载结束以前,系统会用DLL_PROCESS_ATTACH作为dwReason调用当前进程里面所有的DLLEntry Point (DLLMain). 所以mscoree!_CorDllMain被调用。Mscoree!_CorDllMain再次调入CLR Runtime做一些DLL/Assembly相关的操作,操作的过程中再次等待到某一个CriticalSectoin

 

那再次检查对应的CriticalSection被谁占用:

 

0:018> !cs 793ef47c

-----------------------------------------

Critical section   = 0x793ef47c (mscorsvr!CorMap::m_pCorMapCrst+0x0)

DebugInfo          = 0x000ddfd0

LOCKED

LockCount          = 0x1

WaiterWoken        = No

OwningThread       = 0x00000c6c

RecursionCount     = 0x1

LockSemaphore      = 0x588

SpinCount          = 0x00000000

0:018> ~~[0x00000c6c]

  17  Id: cc4.c6c Suspend: 1 Teb: 7ffa4000 Unfrozen

      Start: mscorsvr!ThreadpoolMgr::intermediateThreadProc (791fda8b)

      Priority: 0  Priority class: 32  Affinity: 3

 

Thread 17?! 原来thread 10在等thread 17, thread 17thread 18发生deadlock在互相等待。下一步需要弄清的就是thread 17thread 18CriticalSection到底有什么依赖关系。

 

前面已经看到thread 17在等待loader lock. thread 18callstack上,LoadLibrary的调用还没有退出,所以loader lock没有释放,thread 17等待thread 18的原因很清楚。那thread 18为何需要等待thread 17呢?检查thread 18EnterCriticalSectioncaller, mscorsvr!CorMap::Enter。从名字上看,该CriticalSection应该是用来同步CorMap资源的,而CorMap应该是CLR的一个内部实现类。回头看thread 17, thread 17中的LoadLibrarycallermscorsvr!CorMap::BaseAddress!原来CorMap的相关函数在thread 17中的调用也没有结束,所以thread 18应该是在等待thread 17没有释放的CorMap

 

原因看明白了,解决方法呢?整理一下思路,这里无论是managed callstack还是unmanaged callstack,所有的参与函数都是CLR RuntimeOS API,中间没有涉及到任何客户自己的函数。说明问题在很大程度上跟客户的具体应用没有关系,很有可能是CLR或者OSbug

 

关于CLRloader lock的冲突,最臭名昭著的就是前面介绍过的mixed DLL loading:

 

Mixed DLL Loading Problem

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconMixedDLLLoadingProblem.asp

 

为了看看是不是足够幸运地遇上这个问题,接下来尝试去找到引发问题的DLL的名字,看看是不是用managed C++写的。检查LoadLibaray 的第一个参数可以看到:

 

0:018> du 36821d50

36821d50  "c:\windows\microsoft.net\framewo"

36821d90  "rk\v1.1.4322\temporary asp.net f"

36821dd0  "iles\root\c01b9164\6d04ba4\assem"

36821e10  "bly\dl2\ce86c938\007079bd_9d34c6"

36821e50  "01\Company.product.usage"

36821e90  "name.Class1.dll"

0:018> du 368ec3b8

368ec3b8  "c:\windows\microsoft.net\framewo"

368ec3f8  "rk\v1.1.4322\temporary asp.net f"

368ec438  "iles\root\6882f6e1\6ac01cc1\asse"

368ec478  "mbly\dl2\a3ad3432\5b25229b_ffa0c"

368ec4b8  "601\Company.Class2.dll"

 

上面的输出提供了两个信息:

 

1.      DLL的具体名字

2.      DLL的路径可以看到,该DLL是从ASP.NET临时文件架加载的。所以该DLL应该是被部署到了ASP.NETbin目录下

 

dump中用lmf命令找出这两个DLL,用reflector检查后发现下面的信息:

 

1.      这两个AssemblyStrongName

2.      这两个Assembly是用managed C++开发的,因为里面包含了下面的类型信息,这些类型信息是MC++编译器生成的

 

[StructLayout(LayoutKind.Explicit, Size=520, Pack=1), DebugInfoInPDB]
internal struct $ArrayType$0x47914e9e
{

}

 

所以,这里不幸遭遇了mixed dll loading的问题。在CLR 1上面,该问题是没有完美解决方案的。所以最显而易见的方法是升级到CLR 2.0. 但是,为了挑战一下自己,看看能不能在现有的基础上尽量做点什么。

 

首先,从callstack上看到,问题根源在于两个thread互相等待。其实,只要DLL Loader早一点释放,或者CorMap lock早一点释放,问题都不会发生。也就是说,问题并非想重现就可以找到恰当的时机重现的。客户的程序肯定是执行了相同的code path很多次,终于有一次遇上了死锁的条件。换句话说,要避免这个问题,只需要尽可能地减少code path执行的频率就可以了。导致问题的code path用来加载DLL,为何需要频繁地加载Assembly呢?由于Assembly不能单独卸载,必须跟appDomain一起卸载,所以接下来可以用!dumpdomain看看appDomainAssenbly的情况:

 

0:018> !dumpdomain -stat

    Domain      Num Assemblies      Size Assemblies     Name

0x793f16a0                  1         2,138,112      System Domain

0x793f2b70                328       288,662,016      Shared Domain

0x000cf478                  2         2,490,368      DefaultDomain

0x001334b0                 77        20,086,784      /LM/W3SVC/262227220/Root-1-127973823709142110

0x1d2312d8                 27        16,484,864      /LM/W3SVC/1064537246/Root-36-127973891453478142

0x368594b8                 11         7,123,456      /LM/W3SVC/182767242/Root-37-127973891487384175

0x36915188                  0                 0 Domain39

Total 41 Domains

 

哇,这么多appDomain。再去掉-stat参数看看每个appDomain记载的信息,发现上面两个Assembly在超过30appDomain中都有加载。由此看来,这两个MC++ Assembly中包含的应该是一些公用的组件,由于在多个应用中都会用到,所以分别部署到了各个Virtual Directorybin目录下。有超过30Virtual Directory被分配到了同一个app pool, 每一个Virtual Directory对应一个appDomain, 同一个w3wp.exe进程中频繁的appDomian加载和卸载触发了这个问题。所以可以采取下面的方法来尽可能地避免问题的发生:

 

1.      检查是否有防毒程度频繁扫描web.config文件导致appDomain的重启。尽量避免对Virtual Directory的修改来避免appDomain的重启

2.      把多个Virtual Directory合并,以便共同引用同一套MC++ Assembly,减少MC++ Assembly的加载

3.      拆分Virtual Directory到多个不同的app pool.由于Loader lockCorMap lock使用CriticalSection,是进程级别的锁,所以分割到多个进程可以减少触发问题的频率

4.      既然这两个AssemblyStrong Named的,应该放到GAC中,而不是bin目录下,详细参考:
ASP.NET: Strong named assemblies should not be stored in the bin directory
http://blogs.msdn.com/tess/archive/2006/04/13/575361.aspx

原文地址:https://www.cnblogs.com/lixiong/p/510198.html