11 Hash tables

11 Hash tables 

 

Many applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, and DELETE.

For example, a compiler that translates a programming language maintains a symbol table, in which the keys of elements are arbitrary character strings corresponding to identifiers in the language.

 

A hash table generalizes the simpler notion of an ordinary array. Directly ad- dressing into an ordinary array makes effective use of our ability to examine an arbitrary position in an array in O(1) time. 

 

一个hash表是由一个简单地普通列扩展而来的。 直接寻址 使得一个普通的列 能够 检查任意位置在o1)时间。

 

11.1 Direct- address tables 

Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.

直接寻址是一个简单地技术。 关键字 的区间 恰当小时,工作正常。

 

 

To represent the dynamic set, we use an array, or direct-address table, denoted by T [0.. m-1], in which each position, or slot, corresponds to a key in the uni- verse U .

 

11.2 Hash tables 

 

With direct addressing, an element with key k is stored in slot k. With hashing, this element is stored in slot h.k/; that is, we use a hash function h to compute the slot from the key k.

Here ,h maps the universe U of keys into the slots of a hash table T[0.. m-1]:

h:U->{0,1,…,m-1}

 

We say that an element with key k hashes to slot h.k/; we also say that h.k/ is the hash value of key k.

 

 

 

There is one hitch: two keys may hash to the same slot. We call this situation a collision.

 

Collision resolution by chaining

In chaining, we place all the elements that hash to the same slot into the same linked list

 

The dictionary operations on a hash table T are easy to implement when collisions are resolved by chaining 

 

11.3 Hash functions 

In this section ,we discuss some issues regrading the design of good hash functions and then  present three schemes for their creation . Two of the schemes ,hashing by division and hashing by multiplication ,are heuristic in nature ,whereas  the third scheme ,universal hashing ,uses randommization to provide provably good performance .

11.3.1 The division method

h(k)=k mod m 

11.3.2 The multiplication method 

The multiplication method for creating hash functions operates in two steps. First, we multiply the key k by a constant A in the range 0 < A < 1 and extract the

fractional part of kA. Then, we multiply this value by m and take the floor of the result. In short, the hash function is

这两个函数是最简单的。

hash表最重要的就是找到一个哈希函数,让其尽量的避免或少点collision

11.4 Open addressing 

 

In open addressing ,all elements occupy the hash table itself . That is ,each table entry contains either an element of the dynamic set or nil.When searching for an element, we systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. No lists and

no elements are stored outside the table, unlike in chaining.

 

The advantage of open addressing  is that  it avoids pointers altogether . 

开放地址的好处是完全避免了指针。

 

Instead of following pointers, we compute the sequence of slots to be examined.

 

代替指针的是我们计算slots 的序列来检查。

 

The hash function becomes :

With open addressing ,we require that for every key k, the probe sequence 

be a permutation of <0,1,,..m-1>

 

The algorithm for searching for key k probes the same sequence of slots that the insertion algorithm examined when key k was inserted.

 

 

 

In our analysis ,we assume uniform hashing : the probe sequence of each key is equally likely to be any of the m!permutations of <0,1,2,m-1>.

We will examine three commonly used techniques to compute the probe sequences required for open addressing : linear probing ,quadratic probing ,and double hashing .

线性探针,平方探针,双重探针。

 

Linear probing 

Given an ordinary hash function h':u->{0,1,m-1} ,which we refer to as an auxiliary hash function ,the method of linear probing uses the hash function 

 

 

Double hashing 

Double hashing offers one of the best methods available for open addressing be- cause the permutations produced have many of the characteristics of randomly chosen permutations. Double hashing uses a hash function of the form

 

 

 

 

 

 

 

原文地址:https://www.cnblogs.com/ljlkfx/p/4470458.html