20120831 Hadoop losing DataNodes (open-file limits, network connection limits)

We recently moved data centers, and since then the cluster keeps losing DataNodes.
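
A quick way to confirm the symptom is to count live vs. dead DataNodes from the NameNode and grep the DataNode logs for file-descriptor errors. The commands below assume the Hadoop 1.x-era CLI (which matches the date of this post) and the default log directory; adjust the paths to your install.

hadoop dfsadmin -report | grep Datanodes    # shows "Datanodes available: N (M total, K dead)"
grep "Too many open files" $HADOOP_HOME/logs/*datanode*.log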

Tracking down the causes:

(1) Per-user limit on open files

http://www.54chen.com/java-ee/hive-hadoop-blockalreadyexistsexception.html

Add the following to /etc/security/limits.conf:

hadoop - nofile 65535

hadoop - nproc  65535

Then, as root, apply it to the current shell as well: ulimit -SHn 65535 (the -S and -H flags set both the soft and hard limits).
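
limits.conf is only read at login, so log in again (or su) as the hadoop user before trusting the numbers. A minimal check, assuming the DataNode runs as the hadoop user; depending on the distro's PAM setup, /etc/pam.d/su may need the pam_limits line for su to pick this up:

su - hadoop -c 'ulimit -n'     # soft open-file limit, should print 65535
su - hadoop -c 'ulimit -Hn'    # hard open-file limit
su - hadoop -c 'ulimit -u'     # max user processes (nproc)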

(2) Raising the network connection limits

http://zbszone.iteye.com/blog/826199

To raise the network connection limits, add the following to /etc/sysctl.conf:

#net.ipv4.tcp_fin_timeout = 30 
#net.ipv4.tcp_keepalive_time = 120 
net.ipv4.ip_local_port_range = 1024  65535 
net.ipv4.ip_conntrack_max = 655360 
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 180 

Make the settings take effect immediately:
/sbin/sysctl -p 
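
To confirm the kernel actually picked the values up, read them back. The conntrack entries only exist once the module is loaded, and the ip_conntrack_* file names assume a kernel old enough to use the pre-nf_conntrack naming, as the keys above do:

/sbin/sysctl net.ipv4.ip_local_port_range
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count   # connections currently tracked
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max     # should print 655360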

If sysctl -p reports:

error: "net.ipv4.ip_conntrack_max" is an unknown key
error: "net.ipv4.netfilter.ip_conntrack_tcp_timeout_established" is an unknown key

the ip_conntrack kernel module is not loaded. Fix:

modprobe ip_conntrack
echo "modprobe ip_conntrack" >> /etc/rc.local

The second line reloads the module on every boot, so the sysctl keys stay valid.
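
Then confirm the module is loaded and re-apply the settings; the previously unknown keys should now be accepted:

lsmod | grep ip_conntrack
/sbin/sysctl -p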
Original post: https://www.cnblogs.com/tangtianfly/p/2665156.html