Network problems that TIME_WAIT can cause

How to count them?

shell> netstat -nt | awk '/^tcp/ {++state[$NF]} END {for(key in state) print key,"\t",state[key]}'
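
For readers who would rather post-process the output in a script, here is a minimal Python sketch of the same counting logic. The function name count_tcp_states and the SAMPLE text are made up for illustration; in practice you would feed it the real output of netstat -nt.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count connections per TCP state in `netstat -nt` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Same filter as the awk pattern /^tcp/; the state is the last field ($NF).
        if line.startswith("tcp") and fields:
            states[fields[-1]] += 1
    return states


# Hypothetical sample output, invented for this example.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.9:51235          TIME_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:40022          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).items():
        print(state, "\t", count)
```

Like the awk one-liner, it simply takes the last column of every line that starts with tcp, so tcp6 lines are counted too.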

How to reduce them?

There are already plenty of write-ups on this online, and most of them recommend:

shell> sysctl net.ipv4.tcp_tw_reuse=1

shell> sysctl net.ipv4.tcp_tw_recycle=1

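Both options cut the number of TIME_WAIT connections almost immediately. But if you think the problem is now solved for good, you are mistaken: in practice they can introduce a network failure that is much harder to diagnose.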

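For a detailed description of the kernel parameters, see the official documentation. Here we only look briefly at tcp_tw_recycle. It enables fast recycling of TIME_WAIT connections, but it causes problems in NAT environments.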

RFC 1323 contains the following passage:

An additional mechanism could be added to the TCP, a per-host cache of the last timestamp received from any connection. This value could then be used in the PAWS mechanism to reject old duplicate segments from earlier incarnations of the connection, if the timestamp clock can be guaranteed to have ticked at least once since the old connection was open. This would require that the TIME-WAIT delay plus the RTT together must be at least one tick of the sender’s timestamp clock. Such an extension is not part of the proposal of this RFC.

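Roughly, this says that TCP can keep a per-host cache of the latest timestamp received from each remote host. If a later request from that host carries a timestamp smaller than the cached one, it is treated as invalid and the packet is dropped.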

Whether Linux enables this behavior depends on tcp_timestamps and tcp_tw_recycle. Since tcp_timestamps is on by default, turning on tcp_tw_recycle is what actually activates it.

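Many companies now use LVS for load balancing, typically one LVS in front of several backend servers, set up in NAT mode. When a request reaches LVS, it rewrites the address fields and forwards the packet to a backend server, but it does not touch the timestamp. When the backend sees the LVS address as the source of every request (that is, when the source address is rewritten as well) and ports get reused, requests from different clients can look as if they come from the same host, or even the same connection. The clocks behind those clients' timestamps are not synchronized, so the timestamps appear to jump backwards and the later packets are dropped. The typical symptom is that a client has clearly sent a SYN, yet the server never replies with a SYN-ACK. You can also confirm that packets are being dropped with the following command: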

shell> netstat -s | grep timestamp

... packets rejects in established connections because of timestamp
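
The NAT failure described above can be reproduced in a toy model. This is only a sketch of the idea from the RFC passage, not the kernel's actual code: the class PerHostTimestampCache, the address 192.0.2.1 and the timestamp values are all invented for illustration, and it ignores timestamp wraparound and cache expiry.

```python
class PerHostTimestampCache:
    """Toy model of a per-host cache of the last TCP timestamp seen."""

    def __init__(self):
        # source IP -> last timestamp value accepted from that IP
        self.last_seen = {}

    def accept_syn(self, src_ip: str, tsval: int) -> bool:
        """Return True if the SYN is accepted, False if it is dropped."""
        last = self.last_seen.get(src_ip)
        if last is not None and tsval < last:
            # Older than the cached value: treated as a stale duplicate.
            return False
        self.last_seen[src_ip] = tsval
        return True


NAT_IP = "192.0.2.1"  # hypothetical address shared by every client behind the NAT

cache = PerHostTimestampCache()
results = [
    cache.accept_syn(NAT_IP, 500_000),  # client A, timestamp clock far ahead
    cache.accept_syn(NAT_IP, 120_000),  # client B, timestamp clock behind A's
    cache.accept_syn(NAT_IP, 500_010),  # client A again
]

if __name__ == "__main__":
    print(results)  # [True, False, True]
```

Client B did nothing wrong; its SYN is dropped only because client A, sharing the same source address, happened to have a larger timestamp. With distinct source addresses both would be accepted.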

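If a server sits behind NAT, the safe choice is to leave tcp_tw_recycle disabled. At this point you might think of another workaround: set tcp_timestamps to 0 and tcp_tw_recycle to 1, and have it both ways. It is not that simple: once tcp_timestamps is off, tcp_tw_recycle has no effect even when it is enabled.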
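
To check which combination a given machine is actually running, something along these lines may help. It is a rough sketch: the helper names read_flag and recycle_active are made up, it assumes the settings are exposed under /proc/sys/net/ipv4, and on newer kernels (tcp_tw_recycle was removed in Linux 4.12) the tcp_tw_recycle file will simply be absent.

```python
from pathlib import Path


def read_flag(name: str, root: Path = Path("/proc/sys/net/ipv4")):
    """Return the integer value of a sysctl file, or None if it is unavailable."""
    try:
        return int((root / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file (e.g. newer kernel), unreadable file, or unexpected content.
        return None


def recycle_active(timestamps, tw_recycle) -> bool:
    """Per the text: fast recycling only takes effect when both options are on."""
    return bool(timestamps) and bool(tw_recycle)


if __name__ == "__main__":
    ts = read_flag("tcp_timestamps")
    rc = read_flag("tcp_tw_recycle")
    print("tcp_timestamps =", ts, "tcp_tw_recycle =", rc)
    print("per-host timestamp check active:", recycle_active(ts, rc))
```

A value of None is treated the same as 0 here, which matches the point above: without timestamps, or without the recycle option, the per-host check is not in play.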

Fortunately there is another kernel parameter we can use, tcp_max_tw_buckets (the default is usually 180000):

shell> sysctl net.ipv4.tcp_max_tw_buckets=10000

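With this set, the system discards any TIME_WAIT sockets beyond the limit, and the system log may show "TCP: time wait bucket table overflow". In most cases these messages can be ignored. Disabling TIME_WAIT altogether is not advisable; it is, after all, part of TCP.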

Original article

http://huoding.com/2012/01/19/142