Analyzing a high-memory problem caused by the .NET ThreadPool

    I recently wrote a WinForms program. It listens on a TCP port, receives and processes messages, and then uses the thread pool to send each processed message out through a WebService (one in, one out).

    After finishing the program, I ran a stress test, submitting 10,000 requests with Fiddler.

                ThreadPool.QueueUserWorkItem((o) =>
                {
                    try
                    {
                        APPService.AddLog(o as MDMPortalWCF.LogInfo); // sending this log issues the WebService request
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.ToString());
                    }
                }, log);

    Watching the process in procexp.exe, the Gen 2 GC heap kept growing as TCP requests accumulated, an obvious clue of a memory leak.

    Analyzing with WinDbg: Gen 2 was clearly large and the threads were busy. Many String objects (properties of LogInfo) were being referenced by System.Threading.ThreadPoolWorkQueue.QueueSegment. This was the cause of the high memory.

    Watching with Fiddler revealed unexpected behavior: after all 10,000 TCP requests had been received, the logs had not all been sent. They kept trickling out for a long time afterwards, finishing only several minutes later.

    This behavior shows that the ThreadPool prefers to queue tasks on a small number of threads rather than spin up many threads to finish the work quickly.

    While a log-sending task is waiting in the ThreadPool's queue, its LogInfo cannot be collected by the GC: it is held by the QueueSegment. Only after the ThreadPool has executed the task does the QueueSegment drop its reference to the task's state object, and the memory is then reclaimed.
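This retention is easy to observe directly. The sketch below is my own illustration, not from the original analysis, and it assumes .NET Core 3.0+ for the ThreadPool.PendingWorkItemCount property: it floods the pool with items that all block on a gate, so the few worker threads stall and the remaining items (together with their state objects) sit rooted in the pool's queue.

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        var gate = new ManualResetEventSlim(false);
        var done = new CountdownEvent(1000);

        // Each item blocks on the gate, so the pool's initial worker
        // threads stall and the rest of the items (and their state
        // strings) stay queued, rooted by the work queue.
        for (int i = 0; i < 1000; i++)
            ThreadPool.QueueUserWorkItem(o =>
            {
                gate.Wait();
                done.Signal();
            }, new string('x', 1024));

        Thread.Sleep(200); // let the first batch of workers pick up items

        // Requires .NET Core 3.0+: counts items still waiting in the queue.
        Console.WriteLine("queued items: " + ThreadPool.PendingWorkItemCount);

        gate.Set();  // release everything
        done.Wait(); // wait for all items to finish before exiting
    }
}
```

The printed count is most of the 1,000 items, because only about one thread per core is running when the snapshot is taken; each queued item keeps its 1 KB string alive until it executes.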

    Thinking about it later: it is entirely reasonable that the state object passed to ThreadPool.QueueUserWorkItem(WaitCallback callBack, object state) stays referenced until the callback runs. But because of the ThreadPool's queuing behavior, memory is also released slowly, which is often not what people expect. I am recording it here for future reference.
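One way to keep this slow release from piling up memory is to bound how many items may be queued at once. The sketch below is my own suggestion, not from the post: a SemaphoreSlim applies back-pressure to the producer, so at most 64 state objects are rooted at any time. SendLog is a hypothetical stand-in for the APPService.AddLog WebService call.

```csharp
using System;
using System.Threading;

class Program
{
    // At most 64 sends queued or in flight at once; the producer
    // blocks until a slot frees up, so at most 64 log objects are
    // rooted by the thread pool's queue at any time.
    static readonly SemaphoreSlim pending = new SemaphoreSlim(64);
    static int sent;

    static void SendLog(object log) // hypothetical stand-in for APPService.AddLog
    {
        Interlocked.Increment(ref sent);
    }

    static void Main()
    {
        for (int i = 0; i < 10000; i++)
        {
            pending.Wait(); // back-pressure: wait for a free slot
            ThreadPool.QueueUserWorkItem(o =>
            {
                try { SendLog(o); }
                finally { pending.Release(); }
            }, "log " + i);
        }

        // Drain: reacquire every slot, proving all queued sends finished.
        for (int i = 0; i < 64; i++) pending.Wait();
        Console.WriteLine("sent: " + sent);
    }
}
```

The trade-off is that the TCP receive loop now blocks when the sender falls behind, which may or may not be acceptable; the point is that the queue depth, and therefore the rooted memory, has a fixed cap.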

2018-03-23  

    I later read an article online (http://www.albahari.com/threading/#_Optimizing_the_Thread_Pool) that covers this in great detail and better explains the behavior above. SetMinThreads is an advanced thread-pool optimization technique. The pool's strategy is economical: it avoids creating new threads where it can, to prevent many short-lived tasks from causing a sudden memory spike. Only once the queue has been stalled for more than half a second does the pool start adding one new thread every half second. This causes problems of its own: for example, with several threads that each access the Internet, we want them all to run at the same time. In that case call ThreadPool.SetMinThreads(100, 100), which tells the pool manager not to apply the half-second delay for the first 100 threads, but to create a thread immediately when a task arrives. The original text follows:

How Does the Minimum Thread Count Work?

Increasing the thread pool’s minimum thread count to x doesn’t actually force x threads to be created right away — threads are created only on demand. Rather, it instructs the pool manager to create up to x threads the instant they are required. The question, then, is why would the thread pool otherwise delay in creating a thread when it’s needed?

The answer is to prevent a brief burst of short-lived activity from causing a full allocation of threads, suddenly swelling an application’s memory footprint. To illustrate, consider a quad-core computer running a client application that enqueues 40 tasks at once. If each task performs a 10 ms calculation, the whole thing will be over in 100 ms, assuming the work is divided among the four cores. Ideally, we’d want the 40 tasks to run on exactly four threads:

  • Any less and we’d not be making maximum use of all four cores.
  • Any more and we’d be wasting memory and CPU time creating unnecessary threads.

And this is exactly how the thread pool works. Matching the thread count to the core count allows a program to retain a small memory footprint without hurting performance — as long as the threads are efficiently used (which in this case they are).

But now suppose that instead of working for 10 ms, each task queries the Internet, waiting half a second for a response while the local CPU is idle. The pool manager’s thread-economy strategy breaks down; it would now do better to create more threads, so all the Internet queries could happen simultaneously.

Fortunately, the pool manager has a backup plan. If its queue remains stationary for more than half a second, it responds by creating more threads — one every half-second — up to the capacity of the thread pool.

The half-second delay is a two-edged sword. On the one hand, it means that a one-off burst of brief activity doesn’t make a program suddenly consume an extra unnecessary 40 MB (or more) of memory. On the other hand, it can needlessly delay things when a pooled thread blocks, such as when querying a database or calling WebClient.DownloadFile. For this reason, you can tell the pool manager not to delay in the allocation of the first x threads, by calling SetMinThreads, for instance:

ThreadPool.SetMinThreads (50, 50);

(The second value indicates how many threads to assign to I/O completion ports, which are used by the APM, described in Chapter 23 of C# 4.0 in a Nutshell.)

The default value is one thread per core.

Original post: https://www.cnblogs.com/wigis/p/7940518.html