An Analysis of the String Hash Function in Java

Reposted from: http://blog.csdn.net/hengyunabc/article/details/7198533

The JDK 6 source code:


    /**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

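Take the string "123" as an example:

The ASCII code of the character '1' is 49 ('2' is 50 and '3' is 51).

hashCode = (49*31 + 50)*31 + 51 = 48690

Or, written with the characters themselves:

hashCode = ('1' * 31 + '2') * 31 + '3'

So the function can be viewed as a weighted sum in which characters nearer the front of the string carry larger weights.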
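The worked example can be checked with a few lines of Java. This is a minimal sketch: the class name HornerHashSketch and the method hornerHash are made up for illustration and are not part of the JDK. The method simply repeats the loop from the source above on an arbitrary string.

```java
// Illustrative sketch; HornerHashSketch and hornerHash are hypothetical names.
class HornerHashSketch {

    // Repeats the loop from String.hashCode(): h = 31*h + s[i],
    // using ordinary (wrapping) int arithmetic.
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        int byHand = (49 * 31 + 50) * 31 + 51;
        System.out.println(byHand);             // 48690
        System.out.println(hornerHash("123"));  // 48690
        System.out.println("123".hashCode());   // 48690
    }
}
```

All three lines print the same value, which confirms that the loop is just the weighted sum evaluated from left to right.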

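One visible consequence is that strings of the same length that share a prefix get hash values that fall in a nearby range.

This brings two benefits:

1. It can save memory: because the hash values are adjacent, the hash array can be relatively small, for example when a HashMap uses String keys. (This applies most directly to a table indexed by the hash value itself; HashMap sizes its table from its capacity and load factor.)

2. When the hash values are adjacent and the strings are stored in a container such as HashSet or HashMap, the slots they occupy tend to be adjacent as well, so access can be more efficient (the principle of locality).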
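The adjacency of hash values for same-prefix strings is easy to observe. The sketch below uses a hypothetical class PrefixHashSketch and hypothetical keys "user1" to "user3"; it prints only the difference between hash values, which follows directly from the formula: changing the last character by 1 changes the hash by 1, while changing the second-to-last character by 1 changes it by 31.

```java
// Illustrative sketch; PrefixHashSketch, offset and the sample keys are hypothetical.
class PrefixHashSketch {

    // Distance between the hash codes of two strings (wrapping int arithmetic).
    static int offset(String a, String b) {
        return b.hashCode() - a.hashCode();
    }

    public static void main(String[] args) {
        // Same prefix "user", only the last character differs.
        System.out.println(offset("user1", "user2")); // 1
        System.out.println(offset("user1", "user3")); // 2

        // The first of two characters differs by one: weight 31.
        System.out.println(offset("ab", "bb"));       // 31
    }
}
```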

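31 is used as the multiplier because its binary representation consists entirely of ones (11111), which disperses the data effectively.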
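Because 31 is 11111 in binary, that is, 32 - 1, multiplying by 31 equals a left shift by 5 followed by one subtraction. The sketch below checks this identity on a few values, including ones that overflow. The class name Multiplier31Sketch and the method times31 are invented for illustration.

```java
// Illustrative sketch; Multiplier31Sketch and times31 are hypothetical names.
class Multiplier31Sketch {

    // 31 * h rewritten as 32 * h - h, using a shift instead of a multiplication.
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(31)); // 11111

        int[] samples = {0, 1, 7, -5, 48690, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            // Both sides wrap around in the same way, so this prints true every time.
            System.out.println(h + ": " + (31 * h == times31(h)));
        }
    }
}
```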

Finally, let's look at how code generated by Eclipse computes the hash value of a class with two string fields:


    public class Name {
        String firstName;
        String lastName;

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result
                    + ((firstName == null) ? 0 : firstName.hashCode());
            result = prime * result
                    + ((lastName == null) ? 0 : lastName.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            Name other = (Name) obj;
            if (firstName == null) {
                if (other.firstName != null)
                    return false;
            } else if (!firstName.equals(other.firstName))
                return false;
            if (lastName == null) {
                if (other.lastName != null)
                    return false;
            } else if (!lastName.equals(other.lastName))
                return false;
            return true;
        }
    }

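As you can see, 31 is again the multiplier. Because result starts at 1, hashCode = (31 + firstName.hashCode()) * 31 + lastName.hashCode(), that is, firstName.hashCode() * 31 + lastName.hashCode() plus the constant 961 (when neither field is null).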
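To make the arithmetic concrete, the sketch below repeats the steps of the generated hashCode() in a standalone method and compares the result with the expanded form. The class CombineHashSketch, the method combine, and the sample names are hypothetical. The last check uses java.util.Objects.hash, which is only available from Java 7 onward (so not in JDK 6); it is documented to combine its arguments like Arrays.hashCode, which follows the same start-at-1, multiply-by-31 scheme.

```java
import java.util.Objects;

// Illustrative sketch; CombineHashSketch, combine and the sample names are hypothetical.
class CombineHashSketch {

    // Same steps as the generated hashCode() above, for two nullable strings.
    static int combine(String firstName, String lastName) {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }

    public static void main(String[] args) {
        String first = "Ada";
        String last = "Lovelace";

        // Expanded form: 31*31 + 31*first.hashCode() + last.hashCode().
        int expanded = 961 + 31 * first.hashCode() + last.hashCode();
        System.out.println(combine(first, last) == expanded);                  // true

        // Java 7+ only: Objects.hash applies the same scheme to its arguments.
        System.out.println(combine(first, last) == Objects.hash(first, last)); // true
    }
}
```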

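BTW: Java caches a string's hash. It is actually computed only on the first call; later calls return the cached value. (Since 0 marks "not yet computed" in the code above, a string whose hash is 0, such as the empty string, is recomputed on every call.)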
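The caching pattern can be reproduced in a small immutable class. This is only a sketch modelled on the JDK 6 code quoted earlier, not the JDK implementation itself: the class CachedKeySketch and its computations counter are invented so that the number of real computations can be observed.

```java
import java.util.Arrays;

// Illustrative sketch; CachedKeySketch and its computations counter are hypothetical.
final class CachedKeySketch {
    private final char[] value;
    private int hash;   // 0 means "not computed yet", as in the JDK 6 code
    int computations;   // counts real computations, for demonstration only

    CachedKeySketch(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {            // cache miss (or the real hash happens to be 0)
            computations++;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CachedKeySketch
                && Arrays.equals(value, ((CachedKeySketch) obj).value);
    }

    public static void main(String[] args) {
        CachedKeySketch key = new CachedKeySketch("123");
        key.hashCode();
        key.hashCode();
        System.out.println(key.computations);   // 1: the second call used the cache

        CachedKeySketch empty = new CachedKeySketch("");
        empty.hashCode();
        empty.hashCode();
        System.out.println(empty.computations); // 2: a hash of 0 is never cached
    }
}
```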

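The equals method generated by Eclipse is also of high quality: it handles the same-reference case, a null argument, a class mismatch, and null fields.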

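Summary: a string hash function should not only reduce collisions, but also make sure that strings with the same prefix produce adjacent hash values.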