集合类学习之Hashmap机制研究

1、遍历的两种实现方法

//新建

Map map=new HashMap();

//存储值

map.put(）

//遍历方式一  
   
Iterator iterator=map.entrySet().Iterator();  
   
while(iterator.hasNext())  
   
{     
   
Map.Entry entry=(Map.Entry )iterator.next();  
   
Objcet key=entry.getKey();  
   
Objcet value=entry.getValue();  
   
}

//遍历方式二

1

2

3

4

5

Iterator iter = map.keySet().iterator();   

while (iter.hasNext()) {   

    Object key = iter.next();   

    Object val = map.get(key);   

}

2、原理研究

1、The HashMap class is roughly equivalent to <tt>Hashtable</tt>, except that it is unsynchronized and permits nulls.

2、An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed that is, internal data structures are rebuilt，so that the hash table has approximately twice the number of buckets.加载因子默认为0.75，较好的平衡了时间和空间开销

3、hashmap是非同步和允许null值的，对于多线程情况下，最好初始化时，如下定义：

Map m = Collections.synchronizedMap(new HashMap(...));

4、源码分析

构造方法：

 /** 

   * The default initial capacity - MUST be a power of two.初始大小为16，必须为2的幂形式 

   */  

  static final int DEFAULT_INITIAL_CAPACITY = 16;  

 /** 

   * The maximum capacity, used if a higher value is implicitly specified 

   * by either of the constructors with arguments. 

   * MUST be a power of two <= 1<<30.最大为2的30次幂 

   */  

  static final int MAXIMUM_CAPACITY = 1 << 30;  

  /** 

   * The next size value at which to resize (capacity * load factor). 

   * @serial 

   */  

  int threshold;  

  /** 

   * The table, resized as necessary. Length MUST Always be a power of two. 

   */  

  transient Entry[] table;//存储的容器其实是个容量为capacity的数组  

//构造方法  

public HashMap(int initialCapacity, float loadFactor) {  

      if (initialCapacity < 0)  

          throw new IllegalArgumentException("Illegal initial capacity: " +  

                                             initialCapacity);  

      if (initialCapacity > MAXIMUM_CAPACITY)  

          initialCapacity = MAXIMUM_CAPACITY;  

      if (loadFactor <= 0 || Float.isNaN(loadFactor))  

          throw new IllegalArgumentException("Illegal load factor: " +  

                                             loadFactor);  

      // Find a power of 2 >= initialCapacity  

      int capacity = 1;  

      while (capacity < initialCapacity)  

          capacity <<= 1;  

      this.loadFactor = loadFactor;  

      threshold = (int)(capacity * loadFactor);  

      table = new Entry[capacity];/  /hashmap内部实现为Entry  

      init();

对于任意给定的对象，只要它的 hashCode() 返回值相同，那么程序调用 hash(int h) 方法所计算得到的 Hash 码值总是相同的。

 public V put(K key, V value)   

 {   

     // 如果 key 为 null，调用 putForNullKey 方法进行处理  

     if (key == null)   

         return putForNullKey(value);   

     // 根据 key 的 keyCode 计算 Hash 值  

     int hash = hash(key.hashCode());   

     // 搜索指定 hash 值在对应 table 中的索引  

     int i = indexFor(hash, table.length);  

     // 如果 i 索引处的 Entry 不为 null，通过循环不断遍历 e 元素的下一个元素  

     for (Entry<K,V> e = table[i]; e != null; e = e.next)   

     {   

         Object k;   

         // 找到指定 key 与需要放入的 key 相等（hash 值相同  

         // 通过 equals 比较放回 true）  

         if (e.hash == hash && ((k = e.key) == key   

             || key.equals(k)))   

         {   

             V oldValue = e.value;   

             e.value = value;   

             e.recordAccess(this);   

             return oldValue;   

         }   

     }   

     // 如果 i 索引处的 Entry 为 null，表明此处还没有 Entry   

     modCount++;   

     // 将 key、value 添加到 i 索引处  

     addEntry(hash, key, value, i);   

     return null;   

 } </pre><br>

根据上面 put 方法的源代码可以看出，当程序试图将一个 key-value 对放入 HashMap 中时，程序首先根据该 key 的 hashCode() 返回值决定该 Entry 的存储位置：如果两个 Entry 的 key 的 hashCode() 返回值相同，那它们的存储位置相同。如果这两个 Entry 的 key 通过 equals 比较返回 true，新添加 Entry 的 value 将覆盖集合中原有 Entry 的 value，但 key 不会覆盖。//是否会存在key的hashCode相同，而值不相同的情况(equal()比较)如果这两个

Entry 的 key 通过 equals 比较返回 false，新添加的 Entry 将与集合中原有 Entry 形成 Entry 链，而且新添加的 Entry 位于 Entry 链的头部——具体说明继续看 addEntry() 方法的说明。</p>
hashcode和equal均相同，才进行覆写操作当向 HashMap 中添加 key-value 对，由其 key 的 hashCode() 返回值决定该 key-value 对（就是 Entry 对象）的存储位置。当两个 Entry 对象的 key 的 hashCode() 返回值相同时，将由 key 通过 eqauls() 比较值决定是采用覆盖行为（返回 true），还是产生 Entry 链（返回 false）。
对于 HashMap 及其子类而言，它们采用 Hash 算法来决定集合中元素的存储位置。当系统开始初始化 HashMap 时，系统会创建一个长度为 capacity 的 Entry 数组，这个数组里可以存储元素的位置被称为“桶（bucket）”，每个 bucket 都有其指定索引，系统可以根据其索引快速访问该 bucket 里存储的元素。
无论何时，HashMap 的每个“桶”只存储一个元素（也就是一个 Entry），由于 Entry 对象可以包含一个引用变量（就是 Entry 构造器的的最后一个参数）用于指向下一个 Entry，因此可能出现的情况是：HashMap 的 bucket 中只有一个 Entry，但这个 Entry 指向另一个 Entry ——这就形成了一个 Entry 链。如图 1 所示：

</pre>

 public V get(Object key)   

 {   

  // 如果 key 是 null，调用 getForNullKey 取出对应的 value   

  if (key == null)   

   return getForNullKey();   

  // 根据该 key 的 hashCode 值计算它的 hash 码  

  int hash = hash(key.hashCode());   

  // 直接取出 table 数组中指定索引处的值，  

  for (Entry<K,V> e = table[indexFor(hash, table.length)];   

   e != null;   

   // 搜索该 Entry 链的下一个 Entr   

   e = e.next)    // ①  

  {   

   Object k;   

   // 如果该 Entry 的 key 与被搜索 key 相同  

   if (e.hash == hash && ((k = e.key) == key   

    || key.equals(k)))   

    return e.value;   

  }   

  return null;

从上面代码中可以看出，如果 HashMap 的每个 bucket 里只有一个 Entry 时，HashMap 可以根据索引、快速地取出该 bucket 里的 Entry；在发生“Hash 冲突”的情况下，单个 bucket 里存储的不是一个 Entry，而是一个 Entry 链，系统只能必须按顺序遍历每个 Entry，直到找到想搜索的 Entry 为止——如果恰好要搜索的 Entry 位于该 Entry 链的最末端（该 Entry 是最早放入该 bucket 中），那系统必须循环到最后才能找到该元素。归纳起来简单地说，HashMap
在底层将 key-value 当成一个整体进行处理，这个整体就是一个 Entry 对象。HashMap 底层采用一个 Entry[] 数组来保存所有的 key-value 对，当需要存储一个 Entry 对象时，会根据 Hash 算法来决定其存储位置；当需要取出一个 Entry 时，也会根据 Hash 算法找到其存储位置，直接取出该 Entry。</p>
<p>由此可见：HashMap 之所以能快速存、取它所包含的 Entry，完全类似于现实生活中母亲从小教我们的：不同的东西要放在不同的位置，需要时才能快速找到它。当创建 HashMap 时，有一个默认的负载因子（load factor），其默认值为 0.75，这是时间和空间成本上一种折衷：增大负载因子可以减少 Hash 表（就是那个 Entry 数组）所占用的内存空间，但会增加查询数据的时间开销，而查询是最频繁的的操作（HashMap 的 get() 与 put() 方法都要用到查询）；减小负载因子会提高数据查询的性能，但会增加
Hash 表所占用的内存空间。掌握了上面知识之后，我们可以在创建 HashMap 时根据实际需要适当地调整 load factor 的值；如果程序比较关心空间开销、内存比较紧张，可以适当地增加负载因子；如果程序比较关心时间开销，内存比较宽裕则可以适当的减少负载因子。通常情况下，程序员无需改变负载因子的值。如果开始就知道 HashMap 会保存多个 key-value 对，可以在创建时就使用较大的初始化容量，如果 HashMap 中 Entry 的数量一直不会超过极限容量（capacity * load factor），HashMap
就无需调用 resize() 方法重新分配 table 数组，从而保证较好的性能。当然，开始就将初始容量设置太高可能会浪费空间（系统需要创建一个长度为 capacity 的 Entry 数组），因此创建 HashMap 时初始化容量设置也需要小心对待。