21 Why is it recommended to use httpclient with a connection pool for server-to-server requests?

Besides the performance gained by not opening and closing a connection for every request, as short-lived HTTP connections do, there are 2 points that I think deserve attention:

1 Short-lived HTTP connections leave large numbers of TCP connections in TIME_WAIT, each holding a local port

As is well known, an HTTP client establishes a TCP connection with the server to carry requests and responses, and for its lifetime the connection also holds a local port on the client side. When an HTTP client closes a connection after receiving the response, a TCP FIN packet is sent to the server, and the close then proceeds as follows: the server acknowledges the FIN and later sends its own FIN, which the client acknowledges before entering TIME_WAIT.

As shown in the pic, the connection stays in TIME_WAIT until 2MSL has elapsed (taken here as 1 minute per MSL on Linux; the exact duration depends on the kernel), and it keeps holding its local port even though the process is done with it. As a result, under high request rates a huge number of TCP connections sit in TIME_WAIT on the client side, and their local ports are unavailable for 2MSL. You can see them with the command 'netstat -anp|grep [server port]'.
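The netstat check above can also be scripted. Below is a minimal Python sketch that counts connection states per server port. The sample text is made up to resemble Linux `netstat -an` output, and the function name `count_states` is hypothetical; real output may have extra columns or a different layout, so treat the column positions as an assumption.

```python
from collections import Counter

# Made-up sample in the shape of Linux `netstat -an` output (illustrative only).
SAMPLE = """\
tcp        0      0 10.0.0.5:51234          10.0.0.9:8080           TIME_WAIT
tcp        0      0 10.0.0.5:51235          10.0.0.9:8080           TIME_WAIT
tcp        0      0 10.0.0.5:51236          10.0.0.9:8080           ESTABLISHED
tcp        0      0 10.0.0.5:40000          10.0.0.7:3306           TIME_WAIT
"""


def count_states(netstat_text, server_port):
    """Count TCP states of connections whose remote port is server_port."""
    counts = Counter()
    suffix = ":" + str(server_port)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local addr, remote addr, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[4].endswith(suffix):
            counts[fields[5]] += 1
    return counts


states = count_states(SAMPLE, 8080)
print(states["TIME_WAIT"], "connections in TIME_WAIT to port 8080")
```

On a real machine the same function could be fed the captured output of the netstat command instead of the sample string.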

For instance, if 10,000 connections are established in one minute, about 10,000 local ports will be unavailable for most of the following 2 minutes. Unfortunately, this affects not only this process but also other processes on the machine: new HTTP requests in this period can fail for lack of local ports, of which Linux has at most 65535 (2^16-1).

To sum up, it is necessary to use a connection pool with long-lived HTTP connections to keep the consumption of local ports and of TCP resources such as socket buffers under control.

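Taking about 50,000 usable local ports, 50000 / (2 * 60) ≈ 417 qps, which means that once the qps exceeds this number, the local ports run out.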
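The estimate above is just the number of usable ports divided by the time each port stays in TIME_WAIT. A small Python sketch of that arithmetic follows. The function name is hypothetical, and both inputs are this post's figures rather than measured values; the real number of usable ports depends on the system's configured local port range.

```python
def max_short_connection_rate(usable_ports, time_wait_seconds):
    """Highest steady rate of new short connections before ports run out.

    Each closed connection keeps its local port for time_wait_seconds,
    so at most usable_ports connections can be opened per such window.
    """
    return usable_ports / time_wait_seconds


# Figures from this post: about 50,000 ports, 2 minutes in TIME_WAIT.
rate = max_short_connection_rate(50_000, 2 * 60)
print(round(rate), "connections per second")
```

Halving the TIME_WAIT duration, or doubling the usable port range, doubles the sustainable rate, but with a pool the number of ports in use stays fixed at the pool size regardless of the request rate.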

2 Each HTTP connection holds a file descriptor

By default, a process can hold only 1024 open files at the same time, as shown by the command 'ulimit -n' on Linux. This limit can easily be reached under high-frequency HTTP requests. What makes matters worse is that, once an explosion of short-lived HTTP connections has hit the limit, other work in the same process may fail because it can no longer open any local file.
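To see both facts from inside a process, here is a minimal Python sketch (Unix only, since it uses the standard `resource` module). It reads the open-file limit that 'ulimit -n' reports and shows that merely creating a socket takes a file descriptor. The variable names are chosen for illustration.

```python
import resource
import socket

# Soft and hard limits on open file descriptors for this process
# (the soft limit is the value `ulimit -n` reports).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft_limit, "hard limit:", hard_limit)

# Creating a socket consumes a descriptor even before it connects anywhere.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd_while_open = sock.fileno()
sock.close()
fd_after_close = sock.fileno()  # -1 once the socket object is closed

print("descriptor while open:", fd_while_open)
print("descriptor after close:", fd_after_close)
```

Every connection that is open at the same moment counts against the soft limit, together with the regular files the process has open.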

(btw, the good news is that in practice a TCP connection no longer seems to occupy a file descriptor on the client side after FIN_WAIT2, but it still occupies a port until the 2MSL TIME_WAIT period ends)

This situation can be avoided with a connection pool that has a fixed number of connections and therefore a fixed number of open files, which largely keeps the file descriptors used by HTTP connections under control.
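The idea of a pool with a fixed number of connections can be sketched in a few lines of Python. This is not how httpclient implements its pool; `FixedSizePool` and `FakeConnection` are hypothetical names, and the fake connection stands in for a real TCP or HTTP connection so the sketch runs without a network.

```python
import queue
import threading
from contextlib import contextmanager


class FixedSizePool:
    """A minimal bounded pool: at most max_size connections ever exist."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._slots = queue.LifoQueue(maxsize=max_size)
        # Fill with empty slots; a connection is created on first use.
        for _ in range(max_size):
            self._slots.put(None)
        self._lock = threading.Lock()
        self.created = 0

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to timeout) while all max_size slots are borrowed.
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._factory()
                with self._lock:
                    self.created += 1
            yield conn
        finally:
            # Return the slot so the same connection is reused next time.
            self._slots.put(conn)


class FakeConnection:
    """Stand-in for a real TCP/HTTP connection (hypothetical)."""

    def request(self, path):
        return "response for " + path


pool = FixedSizePool(FakeConnection, max_size=2)
for i in range(1000):
    with pool.connection() as conn:
        conn.request("/item/%d" % i)
print("connections created:", pool.created)
```

However many requests go through the pool, it never holds more than `max_size` connections, so the ports and file descriptors it uses are bounded by the same number. A real pool would also validate or replace broken connections before handing them out, which this sketch leaves out.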

Those are the points I consider most important on this issue.

These rules apply not only to httpclient but also to other TCP clients such as Java sockets, JDBC, Redis clients and so on.

If I have missed something, I would appreciate hearing about it; please feel free to leave it in the comments.

reference:

https://cwiki.apache.org/confluence/display/HttpComponents/FrequentlyAskedConnectionManagementQuestions

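Sub-posts:

1 21-a httpclient and TIME_WAIT

2 21-b Number of concurrent TCP connections and file descriptors; 22 File descriptor exhaustion (1) mac [local]; 22 File descriptor exhaustion (2) linux [local]

3 21-c How large should an HTTP connection pool be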

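ps: For everything built on TCP, such as redis, mysql and http, using a pool is recommended. Besides saving the performance cost of establishing connections, the other reason is the one described here.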

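Addendum:

"23 File descriptor exhaustion (2) linux [local]" shows that the FIN_WAIT2 state does not occupy a file descriptor, so a connection closed without a pool still occupies a port but does not occupy an fd.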