The JDK 1.8 HashMap source: the class-level comment


Null insertion and key position changes, iteration time, performance factors, load factor, Comparable, locking, modification during iteration


Null insertion and key position changes

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.


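HashMap is a hash table based implementation of the Map interface. It implements all of the optional operations of Map and permits both null keys and null values.

It is very similar to Hashtable. The differences are that HashMap is unsynchronized, and therefore not thread-safe, and that it permits null insertion.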
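The difference in null handling can be checked directly. The following is a minimal sketch; the class name NullKeyDemo and the helper acceptsNullKey are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class NullKeyDemo {
    /** Hypothetical helper: true if the map accepts a null key, false if it throws NullPointerException. */
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value for the null key");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put("k", null); // a null value is accepted by HashMap as well
        System.out.println("HashMap accepts null key:   " + acceptsNullKey(hashMap));
        System.out.println("Hashtable accepts null key: " + acceptsNullKey(new Hashtable<>()));
    }
}
```

HashMap stores the null key like any other key, while Hashtable rejects it with a NullPointerException.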

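HashMap makes no guarantee about insertion order. There is no fixed relationship between when a key was inserted and the index of the bucket it lands in: a key inserted earlier does not necessarily come before or after a key inserted later. The index is not random, though; it is computed from the key's hash by the hashing algorithm.

More notably, the index of a key is not fixed once the key is in the table. When the table is resized, the entries are rehashed and may end up at new indexes.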
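To make the index change concrete, here is a small sketch of how a bucket index can be derived from a hash. It mirrors the approach used in the JDK 8 HashMap source (spread the high bits, then mask by table length minus one), but that is an implementation detail rather than something the class comment promises; BucketIndexDemo, spread and indexFor are names invented for this example.

```java
final class BucketIndexDemo {
    /** XORs the high 16 bits of hashCode into the low 16 bits (assumed to match JDK 8's approach). */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so the arithmetic is easy to follow.
        System.out.println(indexFor(17, 16)); // 17 & 15 = 1
        System.out.println(indexFor(17, 32)); // 17 & 31 = 17
        System.out.println(indexFor(1, 16));  // 1
        System.out.println(indexFor(1, 32));  // 1
    }
}
```

In a table of 16 buckets the keys 1 and 17 share bucket 1; after doubling to 32 buckets, key 1 stays where it was and key 17 moves to bucket 17.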

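One more remark: when you first run into it, the sentence "HashMap is a hash table based implementation of the Map interface" can be confusing.

The first thing to be clear about is that a hash table is a data structure that anyone can implement. In Java, HashMap is one implementation of a hash table.

In plain words: Java's HashMap implements the Map interface, and underneath it uses hashing to build a hash table, so that the mapping for a key can be located quickly.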


Iteration time

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.


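The basic operations of HashMap, put and get, take constant time, provided the hash function spreads the elements properly among the buckets. Iterating over a HashMap takes time proportional to its capacity plus the number of key-value mappings.

So when iteration performance matters, do not set the initial capacity too high or the load factor too low.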


Performance factors

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.


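The performance of a HashMap is governed by two parameters: the initial capacity and the load factor.

The capacity is the number of buckets in the hash table, not the maximum number of entries it can hold.

Before looking at the load factor, note that a HashMap grows when it runs short of room, but it does not wait until the table is completely full. It resizes as soon as the number of key-value mappings exceeds the product of the load factor and the capacity. The load factor can therefore be seen as the knob that controls when the hash table is resized.

When the hash table is resized, it grows to roughly twice its current capacity.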
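The resize rule can be worked through with plain arithmetic. The sketch below is a simplified model of the rule as the comment states it, not the actual HashMap code; ResizeThresholdDemo, threshold and capacityAfter are hypothetical names.

```java
final class ResizeThresholdDemo {
    /** Entry count above which a resize is triggered: load factor times capacity. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Simplified model: capacity after inserting entryCount entries one by one, doubling on each resize. */
    static int capacityAfter(int entryCount, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        for (int size = 1; size <= entryCount; size++) {
            if (size > threshold(capacity, loadFactor)) {
                capacity *= 2;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));         // 12
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16: 12 entries do not exceed the threshold
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32: the 13th entry exceeds 12
    }
}
```

With the default capacity of 16 and load factor of 0.75, the threshold is 12, so in this model the 13th entry doubles the table to 32 buckets.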


Load factor

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.


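The default load factor is 0.75. As a general rule this is a good value: it strikes a good balance between space and time. Raising the load factor reduces wasted space but increases lookup cost. Lowering it keeps lookups cheap but wastes more space, and the table is resized at a lower fill level.

When creating a HashMap, estimate the maximum number of entries it will hold and divide that by the load factor (0.75 by default). Choosing an initial capacity greater than the result means no rehash operation will ever occur.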
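The sizing rule above can be sketched as a small helper. The names InitialCapacityDemo, initialCapacityFor and newMapFor are invented for this example; adding 1 is just one simple way to make the result strictly greater than the quotient.

```java
import java.util.HashMap;
import java.util.Map;

final class InitialCapacityDemo {
    /** Smallest convenient capacity strictly greater than expectedEntries / loadFactor. */
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    /** Hypothetical factory: a HashMap presized for expectedEntries with the default load factor. */
    static <K, V> Map<K, V> newMapFor(int expectedEntries) {
        return new HashMap<>(initialCapacityFor(expectedEntries, 0.75f));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100, 0.75f)); // 134
        Map<String, Integer> scores = newMapFor(100);
        for (int i = 0; i < 100; i++) {
            scores.put("player-" + i, i);
        }
        System.out.println(scores.size()); // 100
    }
}
```

For 100 expected entries the helper asks for a capacity of 134, and 134 × 0.75 is 100.5, so 100 entries never exceed the threshold.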


Comparable

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same {@code hashCode()} is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are {@link Comparable}, this class may use comparison order among keys to help break ties.


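If many key-value mappings are going to be inserted into a HashMap, it is better to create it with a sufficiently large initial capacity than to start from the default initial capacity (16) and let the HashMap grow by itself.

A sure way to slow a HashMap down is to insert many keys that all have the same hashCode. To soften the impact, HashMap may use the comparison order among keys to break ties when the keys implement Comparable, so it helps to make key classes Comparable.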
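As an illustration of the worst case, the sketch below uses a hypothetical key class, CollidingKey, whose hashCode is deliberately constant so that every key lands in the same bucket. It implements Comparable, which gives HashMap a comparison order it may use to break ties. The map stays correct either way; only the speed is affected.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeyDemo {
    /** Hypothetical key: constant hashCode forces collisions, Comparable allows tie-breaking. */
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;

        CollidingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately bad: identical for every key
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }

        @Override
        public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Fills a map with n colliding keys, each mapped to its own id. */
    static Map<CollidingKey, Integer> fill(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = fill(1000);
        System.out.println(map.size());
        System.out.println(map.get(new CollidingKey(777)));
    }
}
```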


Locking

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

If no such object exists, the map should be "wrapped" using the {@link Collections#synchronizedMap Collections.synchronizedMap} method. This is best done at creation time, to prevent accidental unsynchronized access to the map: Map m = Collections.synchronizedMap(new HashMap(...));


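Note that this implementation is not synchronized. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, the map must be synchronized externally.

A structural modification is any operation that adds or deletes one or more mappings. Merely changing the value associated with a key the map already contains is not a structural modification.

If the HashMap is encapsulated in some object, synchronize on that object.

If the HashMap is used directly, wrap it as follows, ideally at creation time: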

Map m = Collections.synchronizedMap(new HashMap(...));
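For the other case, where an object naturally encapsulates the map, a minimal sketch might look like this. HitCounter and its methods are hypothetical; the point is only that every access to the private HashMap goes through methods synchronized on the enclosing object.

```java
import java.util.HashMap;
import java.util.Map;

final class HitCounter {
    // The map never escapes this object, so locking on "this" guards every access.
    private final Map<String, Integer> hits = new HashMap<>();

    synchronized void record(String page) {
        hits.merge(page, 1, Integer::sum);
    }

    synchronized int count(String page) {
        return hits.getOrDefault(page, 0);
    }

    /** Hypothetical driver: several threads record hits on the same page, then the total is returned. */
    static int recordConcurrently(int threadCount, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counter.record("home");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.count("home");
    }

    public static void main(String[] args) {
        System.out.println(recordConcurrently(4, 1000)); // 4000
    }
}
```

Because both methods hold the same lock, no increment is lost and the total is always threadCount × hitsPerThread.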

Modification during iteration

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a {@link ConcurrentModificationException}. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.


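The iterators returned by the collection view methods (that is, HashMap's iterators) are fail-fast. Once an iterator has been created, if the HashMap is structurally modified in any way other than through the iterator's own remove method, the iterator throws a ConcurrentModificationException.

The iterator fails quickly and cleanly by throwing right away, rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.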
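The two cases, modifying through the map versus through the iterator, can be compared in a short sketch. FailFastDemo and its methods are names made up for this example, and the map is only touched from a single thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    static Map<Integer, String> sample() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        return map;
    }

    /** Removes through the map while iterating; returns true if the iterator throws. */
    static boolean removeViaMapThrows() {
        Map<Integer, String> map = sample();
        try {
            for (Integer key : map.keySet()) {
                map.remove(key); // structural modification behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes every entry through the iterator's own remove method; returns the final size. */
    static int removeViaIterator() {
        Map<Integer, String> map = sample();
        Iterator<Integer> it = map.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // the one modification the iterator tolerates
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeViaMapThrows());
        System.out.println(removeViaIterator());
    }
}
```

Note that no second thread is needed to trigger the exception: "concurrent" here means the map changed while an iterator was in use.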


Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.


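Note, however, that the fail-fast behavior of the iterator is not one hundred percent reliable; it is only a best-effort guarantee. Under concurrency a HashMap may be modified without any exception being thrown. For example, if the HashMap is modified during the last step of an iteration, no ConcurrentModificationException is thrown.

Therefore, do not write code that depends on this exception being thrown and caught in order to do its work. The iterator throws on a best-effort basis when the map is modified, but not with certainty, so the exception should be used only to detect bugs.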
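The "last step" case can be reproduced with a single-entry map. This sketch relies on behavior observed in the JDK's HashMap iterator, where hasNext does not check for modification; treat that as an implementation detail rather than a guarantee. MissedModificationDemo is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class MissedModificationDemo {
    /** Adds a key while the only element is being visited; returns the final size if nothing is thrown. */
    static int modifyDuringLastStep() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        for (Integer key : map.keySet()) {
            // The iterator has already handed out its last element, so it never
            // calls next() again and never notices this structural modification.
            map.put(2, "b");
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringLastStep());
    }
}
```

The loop ends normally even though the map was structurally modified, which is exactly why correctness must not hinge on the exception.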


Original article: https://www.cnblogs.com/young-youth/p/11665569.html