I/O exception (java.net.SocketException) caught when processing request: Connect

Exception

【A topic prompted by a failure】

Recently the SMS module in our project logged a failure, and I was asked to help investigate:

2010-05-07 09:22:07,221 [?:?] INFO httpclient.HttpMethodDirector - Retrying request

:org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)

2010-05-07 09:22:07,223 [?:?] INFO httpclient.HttpMethodDirector - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server sms failed to respond

:org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)

The official HttpClient exception-handling guide (http://hc.apache.org/httpclient-3.x/exception-handling.html) contains the following passage:

In some circumstances, usually when under heavy load, the web server may be able to receive requests but unable to process them. A lack of sufficient resources like worker threads is a good example. This may cause the server to drop the connection to the client without giving any response. HttpClient throws NoHttpResponseException when it encounters such a condition. In most cases it is safe to retry a method that failed with NoHttpResponseException.

In other words, this exception occurs because the server is overloaded and has stopped responding: it accepted the connection but dropped it without sending any reply.
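The guide's remark that a request failing with NoHttpResponseException can usually be retried can be sketched with plain JDK classes. This is a minimal sketch, not HttpClient's own retry handler: the class RetryOnIoFailure, the method callWithRetry, and the linear backoff are hypothetical names and choices, and the sketch assumes the request is safe to send more than once (which may not hold for an SMS submission).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;

final class RetryOnIoFailure {

    /**
     * Runs the call up to maxAttempts times, retrying only when it fails with
     * an IOException. If every attempt fails, the last IOException is rethrown
     * wrapped in an UncheckedIOException.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // I/O failure: worth another attempt
                if (attempt < maxAttempts) {
                    sleepQuietly(backoffMillis * attempt); // wait a little longer each time
                }
            } catch (Exception e) {
                // Anything else is treated as non-retryable in this sketch.
                throw new IllegalStateException("non-retryable failure", e);
            }
        }
        throw new UncheckedIOException("gave up after " + maxAttempts + " attempts", last);
    }

    private static void sleepQuietly(long millis) {
        if (millis <= 0) {
            return;
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

Retrying only helps if the server recovers between attempts; against a server that stays overloaded, each retry adds load, so the attempt limit should stay small.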

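Summary: implementing an HTTP interface is not hard. Making that interface stay stable and perform well under high pressure (a large volume of data in a short time) is another matter: the HTTP server needs careful design and tuning, and the HTTP client must also pay close attention to certain code details. Otherwise, performance hazards in the code or configuration (on one side or both) can combine with the hardware and software environment to produce seemingly "supernatural" failures.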

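【HTTP protocol background】

To make the rest of this article easier to follow, here is a brief summary of some HTTP basics that bear directly on performance but that engineers often do not examine closely.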

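I. What is HTTP KeepAlive

HTTP KeepAlive is what is usually called a persistent (long-lived) connection. With KeepAlive, the server keeps the connection to a client open for a while instead of closing it immediately, so that subsequent requests from the same client can reuse the connection until it times out.

Both HTTP/1.0 and HTTP/1.1 support KeepAlive. HTTP/1.0 requires the request header "Connection: Keep-Alive" to enable it, whereas HTTP/1.1 enables it by default.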

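More on this feature:

1. The next request is sent only after the client has received the response to the previous one. It can therefore be triggered only after the server has finished writing the previous response to the client.

2. HTTP runs over TCP, so either the server or the client may close the connection. KeepAlive only expresses an optimization strategy on the server side; the client is free to close the connection itself and not take advantage of it.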
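The version-dependent default described above (HTTP/1.0 needs an explicit keep-alive option, HTTP/1.1 is persistent unless told otherwise) can be written down as a small decision function. This is only a sketch: KeepAliveRules and isPersistent are hypothetical names, and treating unknown protocol versions as non-persistent is an assumption made here for safety.

```java
import java.util.Locale;

final class KeepAliveRules {

    /**
     * Decides whether a connection stays open after the current exchange,
     * given the protocol version and the value of the Connection header
     * (null when the header is absent).
     */
    static boolean isPersistent(String httpVersion, String connectionHeader) {
        String options = connectionHeader == null
                ? ""
                : connectionHeader.toLowerCase(Locale.ROOT);
        if (hasToken(options, "close")) {
            return false; // an explicit "close" always wins
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true; // persistent by default
        }
        if ("HTTP/1.0".equals(httpVersion)) {
            return hasToken(options, "keep-alive"); // opt-in only
        }
        return false; // unknown version: assume not persistent
    }

    /** The Connection header is a comma-separated list of option tokens. */
    private static boolean hasToken(String headerValue, String token) {
        for (String part : headerValue.split(",")) {
            if (part.trim().equals(token)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, isPersistent("HTTP/1.0", "Keep-Alive") is true, while isPersistent("HTTP/1.1", "close") is false.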

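II. Pros and cons of KeepAlive

The benefit of KeepAlive is that it reduces HTTP connection overhead and improves performance. For example, when a page embeds many images, JS files, and CSS files, the browser can use this feature to download them faster over a small number of connections (typically 2 in IE), so the page renders sooner.

The drawback of KeepAlive is:

If a large number of different clients hit the server at the same time (or in a burst), and each client holds its connection for a long time (for example, it never closes the connection and its ConnectionTimeOut is set too long), or the server does not expire connections quickly (the KeepAliveTimeout parameter is set too high), the server's connection resources can be exhausted quickly, causing further requests to be queued or rejected, or bringing the server down.

Summary: a browser, as an HTTP client, makes full and effective use of HTTP KeepAlive and speeds up our browsing. When our own HTTP client program needs to access an HTTP interface at high volume with KeepAlive enabled on the server, each request should either close the connection promptly to free resources (strongly recommended) or send a moderate number of requests over the same connection (not recommended).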

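There is an English-language article that describes the performance hazard in this kind of code very well: "HttpClient's easy-to-overlook detail: closing the connection"

1. English original: http://www.codeweblog.com/httpclient-s-easy-to-overlook-the-details-the-connection-is-closed/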


【English original】"HttpClient's easy-to-overlook detail: closing the connection"
{URL:http://www.codeweblog.com/httpclient-s-easy-to-overlook-the-details-the-connection-is-closed/}

HttpClient client = new HttpClient();
HttpMethod method = new GetMethod("http://www.apache.org");
try {
    client.executeMethod(method);
    byte[] responseBody = null;
    responseBody = method.getResponseBody();
} catch (HttpException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} finally {
    method.releaseConnection();
}

Most people who use HttpClient write code similar to the example above, and the official Apache examples do the same. I recently found that when HttpClient sends a large number of requests to an Apache server in a loop, the server's connections fill up and subsequent requests are queued.
My server-side Apache configuration:

Timeout 30
# KeepAlive On: the server will not actively close the connection after a response
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 180

With this configuration, each connection is released only after at least 180 s, so under a large number of requests the connections inevitably fill up and requests have to wait.
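A rough calculation shows how quickly this happens. The sketch below assumes a hypothetical limit of 256 server connection slots (the configuration above does not state one) and that every client opens a fresh connection and then leaves it idle for the full KeepAliveTimeout of 180 s; KeepAliveCapacity and its methods are illustrative names.

```java
final class KeepAliveCapacity {

    /**
     * Upper bound on new connections per second the server can sustain
     * when every connection occupies a slot for holdSeconds.
     */
    static double maxNewConnectionsPerSecond(int connectionSlots, double holdSeconds) {
        if (connectionSlots <= 0 || holdSeconds <= 0) {
            throw new IllegalArgumentException("slots and hold time must be positive");
        }
        return connectionSlots / holdSeconds;
    }

    /**
     * Seconds until every slot is taken at the given arrival rate,
     * valid while the result is shorter than the hold time
     * (no slot has been released yet).
     */
    static double secondsUntilFull(int connectionSlots, double arrivalsPerSecond) {
        if (connectionSlots <= 0 || arrivalsPerSecond <= 0) {
            throw new IllegalArgumentException("slots and arrival rate must be positive");
        }
        return connectionSlots / arrivalsPerSecond;
    }
}
```

Under these assumptions, maxNewConnectionsPerSecond(256, 180) is about 1.4, and at 50 new connections per second secondsUntilFull(256, 50) is 5.12: the slots are gone within about five seconds and stay occupied for the rest of the three minutes.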
Debugging showed that HttpClient does not close the connection after method.releaseConnection(); this method only returns the connection to the connection manager. If you instantiate the client with HttpClient client = new HttpClient(), the default connection manager implementation is SimpleHttpConnectionManager. SimpleHttpConnectionManager has the following constructor:

/**
 * The connection manager created with this constructor will try to keep the
 * connection open (alive) between consecutive requests if the alwaysClose
 * parameter is set to <tt>false</tt>. Otherwise the connection manager will
 * always close connections upon release.
 *
 * @param alwaysClose if set <tt>true</tt>, the connection manager will always
 *    close connections upon release.
 */
public SimpleHttpConnectionManager(boolean alwaysClose) {
    super();
    this.alwaysClose = alwaysClose;
}

The method comment shows that if alwaysClose is set to true, the connection manager closes the connection after it is released. When we instantiate a client with HttpClient client = new HttpClient(), the connection manager is instantiated like this:

this.httpConnectionManager = new SimpleHttpConnectionManager();

So alwaysClose defaults to false and the connection is not actively closed. This gives us a first way to close the connection from the client side.
Method 1:
Change the first line of the example code, the instantiation, to the following; the connection manager will then close the connection after method.releaseConnection();.

HttpClient client = new HttpClient(new HttpClientParams(),new SimpleHttpConnectionManager(true) );

Method 2:
Instantiate with: HttpClient client = new HttpClient();
After method.releaseConnection(); add:

((SimpleHttpConnectionManager)client.getHttpConnectionManager()).shutdown();

The source code of shutdown is very simple and clear at a glance:

public void shutdown() {
    httpConnection.close();
}

Method 3:
Instantiate with: HttpClient client = new HttpClient();
After method.releaseConnection(); add
client.getHttpConnectionManager().closeIdleConnections(0); the source code of this method is as follows:

public void closeIdleConnections(long idleTimeout) {
    long maxIdleTime = System.currentTimeMillis() - idleTimeout;
    if (idleStartTime <= maxIdleTime) {
        httpConnection.close();
    }
}

Setting idleTimeout to 0 ensures that the connection is closed.
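Why a timeout of 0 always closes the connection follows from the comparison in the quoted source. The sketch below is a stand-alone model of that comparison, not the library class: IdleCheckModel is a hypothetical name, and the clock value is passed in as a parameter (an assumption made so the behaviour can be checked without waiting).

```java
final class IdleCheckModel {

    private final long idleStartTime; // when the connection was released, in ms
    private boolean open = true;

    IdleCheckModel(long releasedAtMillis) {
        this.idleStartTime = releasedAtMillis;
    }

    /** Same comparison as the quoted closeIdleConnections, with an explicit clock. */
    void closeIdleConnections(long idleTimeout, long nowMillis) {
        long maxIdleTime = nowMillis - idleTimeout;
        if (idleStartTime <= maxIdleTime) {
            open = false; // idle for at least idleTimeout: close it
        }
    }

    boolean isOpen() {
        return open;
    }
}
```

With idleTimeout set to 0 the test becomes idleStartTime <= now, which holds for any connection released at or before the current time, so the connection is closed immediately. With a timeout of 180000 the same connection survives until it has been idle for the full three minutes.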
The three methods above all have the client actively close the TCP connection. The following method has the server close the connection instead.
Method 4:
The implementation is very simple: all the code is the same as the example at the top. After HttpMethod method = new GetMethod("http://www.apache.org"); just add one line that sets an HTTP header:

method.setRequestHeader("Connection", "close");

Here is how the HTTP protocol defines this option:
HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,
Connection: close
Now for the difference between the client closing the connection and the server closing it. If the client closes the connection, running netstat -an on the client machine shows many TCP connections in TIME_WAIT. If the server actively closes the connection, the same thing appears on the server side instead.
See the explanation on the wiki: http://wiki.apache.org/HttpComponents/FrequentlyAskedConnectionManagementQuestions
The TIME_WAIT state is a protection mechanism in TCP. The side that closes a socket connection orderly will keep the connection in state TIME_WAIT for some time, typically between 1 and 4 minutes.
The TIME_WAIT state appears on whichever end actively closes the connection. In TCP, TIME_WAIT mainly exists to ensure that data is transferred completely. For details, see this document:
http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#ss2.7
One more point: the methods above are for cases where the application knows for certain that it does not need to reuse the connection, so it can actively close it to free resources. If your application does need to reuse connections, there is no need to do this; reusing the existing connection also improves performance.
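The cost of TIME_WAIT on the closing side can be estimated with simple arithmetic. The figures used below are assumptions, not values from the article: a TIME_WAIT of 60 s (inside the 1 to 4 minute range quoted above) and 28,232 usable local ports on the client, which is the size of a common Linux default range. TimeWaitBudget and its methods are hypothetical names.

```java
final class TimeWaitBudget {

    /** Sockets sitting in TIME_WAIT at steady state for a given close rate. */
    static long socketsInTimeWait(double closesPerSecond, double timeWaitSeconds) {
        if (closesPerSecond < 0 || timeWaitSeconds < 0) {
            throw new IllegalArgumentException("rates and durations must not be negative");
        }
        return Math.round(closesPerSecond * timeWaitSeconds);
    }

    /**
     * Highest close rate towards one server address and port before the
     * closing side runs out of local ports.
     */
    static double maxClosesPerSecond(int localPorts, double timeWaitSeconds) {
        if (localPorts <= 0 || timeWaitSeconds <= 0) {
            throw new IllegalArgumentException("ports and duration must be positive");
        }
        return localPorts / timeWaitSeconds;
    }
}
```

With these assumed figures, a client that opens and closes 100 connections per second keeps about 6,000 sockets in TIME_WAIT, and it would start running out of local ports at roughly 470 closes per second. This is one reason to let whichever side has more headroom do the closing.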

【Chinese translation】"HttpClient's easy-to-overlook detail: closing the connection"
{URL:http://www.iteye.com/topic/234759}


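【Strategies for high-performance HTTP applications】

So, when we need a high-performance, interface-style HTTP application:

1. Server: disable KeepAlive.

2. Server: preferably support plain HTTP directly (note: use POST, not GET) rather than any wrapped protocol such as hessian or soap.

3. Server: preferably design each request to support batches of multiple commands, to save connections.

4. Server: process each request as fast as possible (for example, within 150 ms).

5. Client: in code, a client instance should actively close its connection once all of its requests have finished (there is no need to set the client's ConnectionTimeOut parameter beforehand).

6. Client: if the server has not disabled KeepAlive, a client instance may send a moderate number of requests (the total time should be slightly less than the server's KeepAliveTimeout). This approach requires precise control and is not recommended.

Finally, in interface design, asynchronous operations should avoid one-sided polling where possible (which cuts a large number of pointless requests) and should instead have the callee call back with the asynchronous result.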
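The batching idea in item 3 can be sketched on the client side as grouping pending commands so that each HTTP request carries several of them. This is a minimal sketch under assumptions: RequestBatcher and partition are hypothetical names, and how a batch is serialized into a request body is left open because the article does not specify it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RequestBatcher {

    /**
     * Splits the pending items into consecutive batches of at most batchSize,
     * preserving order. Each batch would be sent as one HTTP request.
     */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so later changes to items do not affect the batch.
            batches.add(Collections.unmodifiableList(new ArrayList<>(items.subList(start, end))));
        }
        return batches;
    }
}
```

With a batch size of 100, 250 pending messages become 3 requests instead of 250, which cuts the number of connections the server has to accept by the same factor.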

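【Some optimization details】
On the server side, we generally use Apache plus Tomcat or JBoss. For JBoss configuration and tuning, see the JBoss website.

In client-side Java code, the toolkit we use most often is HttpClient.
Some details to watch:
1. After each HttpClient instance has finished sending requests, it should close the connection promptly (if it will not be used again).
The simplest way is to send (Connection: close) in the HTTP request header, which instructs the server to close the current connection.
The code is:
method.setRequestHeader("Connection", "close");
2. The client can be a singleton: there is no need to create an HttpClient instance each time, and one instance can send many requests (for the request header setting, see item 1).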
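What "Connection: close" looks like on the wire can be shown without any library. The sketch below uses plain JDK sockets instead of HttpClient so that it has no dependencies; ConnectionCloseDemo, its methods, and the one-shot stand-in server on the loopback interface are all hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ConnectionCloseDemo {

    /** Builds a minimal HTTP/1.1 GET request carrying "Connection: close". */
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Sends one request and reads until the peer closes the connection (EOF). */
    static String fetchOnce(InetAddress address, int port, String path) throws IOException {
        try (Socket socket = new Socket(address, port)) {
            socket.setSoTimeout(5000); // do not hang forever if the peer never answers
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(address.getHostAddress(), path)
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                response.write(chunk, 0, n);
            }
            return new String(response.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Stand-in server: answers exactly one request, then closes the socket. */
    private static void serveOnce(ServerSocket server) {
        try (Socket client = server.accept()) {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                // skip the request line and headers up to the blank line
            }
            String response =
                    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok";
            OutputStream out = client.getOutputStream();
            out.write(response.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            // the client side will see this as a missing or truncated response
        }
    }

    /** Runs the client against the stand-in server on a free loopback port. */
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.setDaemon(true);
            serverThread.start();
            return fetchOnce(server.getInetAddress(), server.getLocalPort(), "/");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the stand-in server closes the socket first, the client simply reads to end of stream. In a real deployment the TIME_WAIT entries for such connections would accumulate on the server, as the article quoted earlier explains.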
