11 Hash tables

Many applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, and DELETE.

For example, a compiler that translates a programming language maintains a symbol table, in which the keys of elements are arbitrary character strings corresponding to identifiers in the language.

A hash table generalizes the simpler notion of an ordinary array. Directly ad- dressing into an ordinary array makes effective use of our ability to examine an arbitrary position in an array in O(1) time.

一个hash表是由一个简单地普通列扩展而来的。直接寻址使得一个普通的列能够检查任意位置在o（1）时间。

11.1 Direct- address tables

Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.

直接寻址是一个简单地技术。当关键字的区间恰当小时，工作正常。

To represent the dynamic set, we use an array, or direct-address table, denoted by T [0.. m-1], in which each position, or slot, corresponds to a key in the uni- verse U .

11.2 Hash tables

With direct addressing, an element with key k is stored in slot k. With hashing, this element is stored in slot h.k/; that is, we use a hash function h to compute the slot from the key k.

Here ,h maps the universe U of keys into the slots of a hash table T[0.. m-1]:

h:U->{0,1,…,m-1}

We say that an element with key k hashes to slot h.k/; we also say that h.k/ is the hash value of key k.

There is one hitch: two keys may hash to the same slot. We call this situation a collision.

Collision resolution by chaining

In chaining, we place all the elements that hash to the same slot into the same linked list

The dictionary operations on a hash table T are easy to implement when collisions are resolved by chaining

11.3 Hash functions

In this section ,we discuss some issues regrading the design of good hash functions and then present three schemes for their creation . Two of the schemes ,hashing by division and hashing by multiplication ,are heuristic in nature ,whereas the third scheme ,universal hashing ,uses randommization to provide provably good performance .

11.3.1 The division method

h(k)=k mod m

11.3.2 The multiplication method

The multiplication method for creating hash functions operates in two steps. First, we multiply the key k by a constant A in the range 0 < A < 1 and extract the

fractional part of kA. Then, we multiply this value by m and take the floor of the result. In short, the hash function is

这两个函数是最简单的。

hash表最重要的就是找到一个哈希函数，让其尽量的避免或少点collision。

11.4 Open addressing

In open addressing ,all elements occupy the hash table itself . That is ,each table entry contains either an element of the dynamic set or nil.When searching for an element, we systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. No lists and

no elements are stored outside the table, unlike in chaining.

The advantage of open addressing is that it avoids pointers altogether .

开放地址的好处是完全避免了指针。

Instead of following pointers, we compute the sequence of slots to be examined.

代替指针的是我们计算slots 的序列来检查。

The hash function becomes :

With open addressing ,we require that for every key k, the probe sequence

be a permutation of <0,1,,..m-1>

The algorithm for searching for key k probes the same sequence of slots that the insertion algorithm examined when key k was inserted.

In our analysis ,we assume uniform hashing : the probe sequence of each key is equally likely to be any of the m!permutations of <0,1,2,m-1>.

We will examine three commonly used techniques to compute the probe sequences required for open addressing : linear probing ,quadratic probing ,and double hashing .

线性探针，平方探针，双重探针。

Linear probing

Given an ordinary hash function h':u->{0,1,m-1} ,which we refer to as an auxiliary hash function ,the method of linear probing uses the hash function

Double hashing

Double hashing offers one of the best methods available for open addressing be- cause the permutations produced have many of the characteristics of randomly chosen permutations. Double hashing uses a hash function of the form