Java源码阅读HashMap

1类签名与注释

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

HashMap是基于哈希表实现的Map接口。此实现提供了所有可选的地图操作，并允许null的值和null键。（ HashMap类大致相当于Hashtable ，除了它是不同步的，并允许null）。该类不能保证map的顺序，特别是，它不能保证顺序在一段时间内保持不变（因为hash数组扩容时会重新散列）。

假设哈希函数在这些存储桶之间合理分散元素，这个实现为基本操作（ get和put ）提供了恒定的时间性能。收集视图的迭代与HashMap实例（桶数）加上其大小（键值映射数）的容量成正比。因此，想要好的迭代性能，不要将初始容量设置得太高（或负载因子太低）。

HashMap的一个实例有两个影响其性能的参数：初始容量（capacity）和负载因子（load factor）。容量是哈希表中的桶数，初始容量只是创建哈希表时的容量。负载因子是在容量自动增加之前允许哈希表得到满足的度量。当在散列表中的条目的数量超过了负载因数和当前容量的乘积，哈希表被重新散列（即，内部数据结构被重建），使得哈希表具有桶的大约两倍。

作为一般规则，默认负载因子（0.75）提供了时间和空间成本之间的良好折中。更高的值会降低空间开销，但会增加查找成本（反映在HashMap类的大部分操作中，包括get和put ）。在设置其初始容量时，应考虑map中预期的条目数及其负载因子，以便最小化rehash操作的数量。如果初始容量大于最大条目数除以负载因子，则不会发生重新排列操作。

如果有许多映射要存储在HashMap实例中，那么创建足够大的容量存储要比它自己自动扩容再rehash的效率高。注意，多个keys使用同样的hashCode会降低hash表的性能。为了改善影响，当按键是Comparable时，这个类可以使用keys之间的比较顺序来帮助打破关系。

请注意，此实现不同步。 如果多个线程同时访问哈希映射，并且至少有一个线程在结构上修改了映射，那么它必须在外部进行同步。（结构修改是添加或删除一个或多个映射的任何操作;仅改变与实例已经包含的密钥相关联的值不是结构修改。）这可以通过同步对象来实现，如果没有同步对象，map应该使用Collections.synchronizedMap方法“包装”。这最好在创建时完成，以防止意外的不同步访问map：

Map m = Collections.synchronizedMap(new HashMap(...));

被该类的所有集合视图方法返回的迭代器都是fail-fast的：如果映射在迭代器创建之后的任何时间被结构地修改，除了通过迭代器自己的remove方法之外，迭代器将抛出一个ConcurrentModificationException 。因此，面对并发修改，迭代器将快速而干净地失败，而不是在未来未确定的时间冒着任意的非确定性行为。

请注意，迭代器的故障快速行为无法保证，迭代器的故障快速行为应仅用于检测错误，而不是依赖它来保证程序的正确性。

该类是Java Collections Framework的成员。

2成员变量

// 序列号
private static final long serialVersionUID = 362498820763181265L;
// 默认初始化容量16，要求必须是2的n次方。
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// 最大容量，因为必须是2的n次方，int中最大只能到2³⁰
static final int MAXIMUM_CAPACITY = 1 << 30;
// 默认负载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;

//当桶中的键值对数量到达8个，且桶数量大于等于64，则将底层实现从链表转为红黑
// 如果桶中的键值对到达该阀值，则检测桶数量  
static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;
//当桶数（底层数组长度）到达64个的时候，则将链表转为红黑树
static final int MIN_TREEIFY_CAPACITY = 64;

-------------------------------------------------------------------------------

transient Node<K,V>[] table;

//Holds cached entrySet(). Note that AbstractMap fields are used for keySet() and values().
transient Set<Map.Entry<K,V>> entrySet;

//The number of key-value mappings contained in this map. 
transient int size;

transient int modCount;

//当集合没有分配空间时，表示初始的数组容量。当分配了，则表示（capacity * load factor）
//注意理解这个属性，否则后面的代码不好理解
int threshold;

//The load factor for the hash table.
final float loadFactor;

存储桶table是用Node型的数组表示的，Node是一个静态内部类，实现了Map.Entry接口。这里将每个Entity封装成一个链表的节点，每个节点指向下一个节点（如果有的话）。

所以hashMap是用数组和链表实现的。（当当达到一定条件后，由“数组/链表”变为“数组/红黑树”实现）

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

3构造函数

 //（1）指定初始容量initialCapacity与负载因子loadFactor
 public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY) //初始容量最大值
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
 //该方法可以将参数cap值改为2ⁿ，2^n-1<cap≤2ⁿ
 static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

 //（2）指定初始容量，负载因子默认0.75
 public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

 //（3）默认初始容量16（在put时会检查扩容），负载因子默认0.75
 public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

 public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }
 final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {
            if (table == null) { // pre-size
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
    }

tableSizeFor(int cap)方法使用无符号右移和或运算，实现了将cap参数转为2ⁿ的整数。思路如下：以00001000（8）为例，从左往右找到第一个1，先>>>1得（00000100）然后或操作原数(00001000)得（00001100），将原数第一个1的下一位变成1。下一次>>>2或之后得15（00001111），也就是把最高位1开始之后的所有位都变成1，最后加1就变成了2ⁿ。因为int是32位得所以最多移16位即可。为什么要先加1，最后再减1？这样可以保证2^n-1<cap≤2ⁿ，否则当cap=8时，返回得结果为16都是2的n次方，没有什么意义。

也可以通过1个map类型的参数构造HashMap，构造函数通过调用putMapEntries实现。putMapEntries先检查一下容量，然后通过putVal插入元素。关于putVal方法，在后面小节详细说明。

4查找

 1 public V get(Object key) {
 2         Node<K,V> e;
 3         return (e = getNode(hash(key), key)) == null ? null : e.value;
 4     }
 5 
 6     final Node<K,V> getNode(int hash, Object key) {
 7         Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
 8         if ((tab = table) != null && (n = tab.length) > 0 &&
 9             (first = tab[(n - 1) & hash]) != null) {
10             if (first.hash == hash && // always check first node
11                 ((k = first.key) == key || (key != null && key.equals(k))))
12                 return first;
13             if ((e = first.next) != null) {
14                 if (first instanceof TreeNode)
15                     return ((TreeNode<K,V>)first).getTreeNode(hash, key);
16                 do {
17                     if (e.hash == hash &&
18                         ((k = e.key) == key || (key != null && key.equals(k))))
19                         return e;
20                 } while ((e = e.next) != null);
21             }
22         }
23         return null;
24     }
25 
26     public boolean containsKey(Object key) {
27         return getNode(hash(key), key) != null;
28     }

get方法和containsKey都是通过getNode实现。

（1）如何通过key的hash值找到key在数组中的位置？

注意代码line9，first = tab[(n - 1) & hash]，前面我们知道HashMap的容量是2的指数倍的，所以n-1可以保证低位全部都是1，例如n=16，n-1=1（00001111）。而(n - 1) & hash可以将hash值得高位置0，相当于hash%n，但计算速度比后者要快。

（2）找到该位置的对应节点

first表示该位置的第一个节点，当找到该位置时，总是要先检查一下first节点是不是查找的元素（line10-12）。若不是则往后遍历链表。（为什么要判断TreeNode？）

（3）hash(key)实现的原理

 static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

若非空，则高16位不变，低16位变成高16位和低16位的异或。为什么要这么做？前面我们知道定位是通过(length - 1) & hash，当length不够大时（也就是hashMap容量不够大），一直是hash值的低位起作用，这样容易造成碰撞（不同的hash值定位的结果相同），所以要提高高位的影响。然后就有了该操作。

注意key为null的情况，hashMap是允许key为null的，key为null的entry（键值对）存储在存储数组的0号位。

也可以查找value是否在集合中

 public boolean containsValue(Object value) {
        Node<K,V>[] tab; V v;
        if ((tab = table) != null && size > 0) {
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    if ((v = e.value) == value ||
                        (value != null && value.equals(v)))
                        return true;
                }
            }
        }
        return false;
    }

这里总结两个源码实现的小技巧：

（1）在遍历数组前先检查数组是否为空。((tab = table) != null && size > 0)

（2）判断对象是否相等的方式。((v = e.value) == value || (value != null && value.equals(v)))

5扩容

resize方法

 1 final Node<K,V>[] resize() {
 2         Node<K,V>[] oldTab = table;
 3         int oldCap = (oldTab == null) ? 0 : oldTab.length;
 4         int oldThr = threshold;
 5         int newCap, newThr = 0;
 6         if (oldCap > 0) { // 容量不为空（已分配内存空间）
 7             if (oldCap >= MAXIMUM_CAPACITY) {
 8                 threshold = Integer.MAX_VALUE;
 9                 return oldTab; //已到最大值，没法继续扩容
10             }
11             else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
12                      oldCap >= DEFAULT_INITIAL_CAPACITY) //容量增为2倍，并检查
13                 newThr = oldThr << 1; // 将阈值也扩为2倍
14         }
15         else if (oldThr > 0) // 若容量为0，old阈值大于0。容量用阈值表示（见构造函数1）
16             newCap = oldThr;
17         else {               // 若容量为0，old阈值也为0。使用默认值初始化（见构造函数3）
18             newCap = DEFAULT_INITIAL_CAPACITY;
19             newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
20         }
21         if (newThr == 0) { //计算阈值的合理值
22             float ft = (float)newCap * loadFactor;
23             newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
24                       (int)ft : Integer.MAX_VALUE);
25         }
26         threshold = newThr;
27         @SuppressWarnings({"rawtypes","unchecked"})
28             Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
29         table = newTab;
30         if (oldTab != null) { //将旧的值复制到新的存储桶里面
31             for (int j = 0; j < oldCap; ++j) {
32                 Node<K,V> e;
33                 if ((e = oldTab[j]) != null) {
34                     oldTab[j] = null;
35                     if (e.next == null) //如果该位置只有1个元素，直接插如到新的位置（从新计算位置）
36                         newTab[e.hash & (newCap - 1)] = e;
37                     else if (e instanceof TreeNode)
38                         ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
39                     else { // preserve order
40                         Node<K,V> loHead = null, loTail = null;
41                         Node<K,V> hiHead = null, hiTail = null;
42                         Node<K,V> next;
43                         do {
44                             next = e.next;
45                             if ((e.hash & oldCap) == 0) {
46                                 if (loTail == null)
47                                     loHead = e;  // 头节点指向第一个元素
48                                 else
49                                     loTail.next = e; // 当前个元素的next指向下一个元素
50                                 loTail = e; //尾节点移向下一个个元素
51                             }
52                             else {
53                                 if (hiTail == null)
54                                     hiHead = e;
55                                 else
56                                     hiTail.next = e;
57                                 hiTail = e;
58                             }
59                         } while ((e = next) != null);
60                         if (loTail != null) {
61                             loTail.next = null;
62                             newTab[j] = loHead;
63                         }
64                         if (hiTail != null) {
65                             hiTail.next = null;
66                             newTab[j + oldCap] = hiHead;
67                         }
68                     }
69                 }
70             }
71         }
72         return newTab;
73     }

下面重点分析line40-66，这里的任务是处理oldTab[j]位置的值不是单个元素，而是由多个元素组成链表的情况。

原理是通过头节点Head定位第一个元素，通过尾节点Tail的不断后移组装链表。但是为什么这里要使用两组头尾节点呢？（loHead+loTail、hiHead+hiTail）

这里是定位用的。举个例子原集合容量为16（0001 0000），(e.hash & oldCap) == 0表示hash值的第5位（从右往左）为0，这样扩容后定位为e.hash & （newCap-1）= e.hash &31（0001 1111），计算后第5位也为0，与旧集合的位置一样。所以line62直接将链表存在和同样的位置上。否则hash值的第5位为1，定位计算后第5位也为0，与原来相比大了2^（5-1）=16，正好是大了旧集合的容量，所以line66定位用j+oldCap。可以把容量为32的新集合简单理解为高16位和低16位，结合取模计算就很好理解了。

line37-38在后面讨论红黑树的时候解释。

6添加元素

通过调用put方法向hashMap中添加1个元素

 1 public V put(K key, V value) {
 2         return putVal(hash(key), key, value, false, true);
 3     }
 4 
 5     /**
 6      * @param hash hash for key
 7      * @param key the key
 8      * @param value the value to put
 9      * @param onlyIfAbsent if true, don't change existing value
10      * @param evict if false, the table is in creation mode.
11      * @return previous value, or null if none
12      */
13     final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
14                    boolean evict) {
15         Node<K,V>[] tab; Node<K,V> p; int n, i;
16         if ((tab = table) == null || (n = tab.length) == 0)
17             n = (tab = resize()).length;
18         if ((p = tab[i = (n - 1) & hash]) == null)
19             tab[i] = newNode(hash, key, value, null);
20         else {
21             Node<K,V> e; K k;
22             if (p.hash == hash &&
23                 ((k = p.key) == key || (key != null && key.equals(k))))
24                 e = p;
25             else if (p instanceof TreeNode)
26                 e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
27             else {
28                 for (int binCount = 0; ; ++binCount) {
29                     if ((e = p.next) == null) {
30                         p.next = newNode(hash, key, value, null);
31                         if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
32                             treeifyBin(tab, hash);
33                         break;
34                     }
35                     if (e.hash == hash &&
36                         ((k = e.key) == key || (key != null && key.equals(k))))
37                         break;
38                     p = e;
39                 }
40             }
41             if (e != null) { // existing mapping for key
42                 V oldValue = e.value;
43                 if (!onlyIfAbsent || oldValue == null)
44                     e.value = value;
45                 afterNodeAccess(e);
46                 return oldValue;
47             }
48         }
49         ++modCount;
50         if (++size > threshold)
51             resize();
52         afterNodeInsertion(evict);
53         return null;
54     }

line16-17检查集合是否为空，若是则分配初始容量。

line18-19根据hash值进行定位，检查存储桶在该位置i是否为空。若是则直接放入新的值。（newNode方法新建1个Node类型的对象）。

若存储桶在i位置有元素

（1）首先判断key值是否和链表的第一个元素相等（line22）

（2）若相等则如果参数onlyIfAbsent为false或者该位置的值为null，将value赋值为新值，并返回旧值（line41-47）。

（3）若1步不相等，则检查是否是TreeNode类型（红黑树实现）

（4）在链表的尾部插入新的元素（Node类型）

line25-26中的treeNode后面再研究。

最后判断当前容量是否超过阈值，若超过就要进行扩容。line50-51

afterNodeInsertion是一个post-actions，为了方便LinkedHashMap插入节点后的行为。但是在HashMap里面是该方法没有任何行为。

注意：line31检查桶中的节点数量（链表的长度），大于阀值则调用treeifyBin方法，treeifyBin中检查桶数量，大于64则需要将底层实现由链表改为红黑树，如果桶数量不到64则重构一下链表。（jdk8中做的优化）

//当桶的数量太多的时候，底层则改为红黑树实现  
final void treeifyBin(Node<K,V>[] tab, int hash) {  
    int n, index; Node<K,V> e;  
    //当桶的数量有 MIN_TREEIFY_CAPACITY （64）个时才将链表改为红黑树  
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)  
        //节点太少则resize()  
        resize();  
    else if ((e = tab[index = (n - 1) & hash]) != null) {  
        //将链接的结点转为二叉树结构  
        TreeNode<K,V> hd = null, tl = null;  
        do {  
            TreeNode<K,V> p = replacementTreeNode(e, null);  
            if (tl == null)  
                hd = p;  
            else {  
                p.prev = tl;  
                tl.next = p;  
            }  
            tl = p;  
        } while ((e = e.next) != null);  
        if ((tab[index] = hd) != null)  
            hd.treeify(tab);  
    }  
}

7删除元素

public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

    /**
     * @param hash hash for key
     * @param key the key
     * @param value the value to match if matchValue, else ignored
     * @param matchValue if true only remove if value is equal
     * @param movable if false do not move other nodes while removing
     * @return the node, or null if none
     */
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                    tab[index] = node.next;
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

删除所有元素

 public void clear() {
        Node<K,V>[] tab;
        modCount++;
        if ((tab = table) != null && size > 0) {
            size = 0;
            for (int i = 0; i < tab.length; ++i)
                tab[i] = null;
        }
    }

8迭代器

hashMap内部实现了四个迭代器，分别是HashIterator，KeyIterator，ValueIterator，EntryIterator。其中HashIterator是其他迭代器的父类。

HashIterator实现如下：

 abstract class HashIterator {
        Node<K,V> next;        // next entry to return
        Node<K,V> current;     // current entry
        int expectedModCount;  // for fast-fail
        int index;             // current slot

        HashIterator() {
            expectedModCount = modCount;
            Node<K,V>[] t = table;
            current = next = null;
            index = 0;
            if (t != null && size > 0) { // 找到数组第一个不为null的位置
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Node<K,V> nextNode() {
            Node<K,V>[] t;
            Node<K,V> e = next;
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (e == null)
                throw new NoSuchElementException();
            if ((next = (current = e).next) == null && (t = table) != null) { // 这里不仅判断，而且将next节点后移。
                do {} while (index < t.length && (next = t[index++]) == null); // 找到数组下一个不为null的位置
            }
            return e;
        }

        public final void remove() {
            Node<K,V> p = current;
            if (p == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            current = null;  //要求remove之前的操作是nextNode
            K key = p.key;
            removeNode(hash(key), key, null, false, false);
            expectedModCount = modCount;
        }
    }

KeyIterator是keySet的迭代器实现

ValueIterator是valuse的迭代器实现

EntryIterator是Entry的迭代器实现

final class KeyIterator extends HashIterator
        implements Iterator<K> {
        public final K next() { return nextNode().key; }
    }

    final class ValueIterator extends HashIterator
        implements Iterator<V> {
        public final V next() { return nextNode().value; }
    }

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }

本文只分析了hashMap的一些基本操作的实现，其他的方法（包括函数编程内容，红黑树的实现等）后续会深入学习。