Implementing Data Structures in JS -- Hash Table (HashTable)

Concepts:

The HashTable class, also called the HashMap class, is a hash-table implementation of the Dictionary class.

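The purpose of a hashing algorithm is to locate a value in a data structure as quickly as possible. In some of the earlier data structures (LinkedList in particular), you have to traverse the structure to find a value. With a hash function you know exactly where the value is, so it can be retrieved quickly. A hash function takes a key and returns the address of the corresponding value in the table.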

For example, below we use the 'lose lose' hash function (which simply adds up the ASCII values of all the characters in the key) to build a hash table:

Implemented in JS, it looks like this:

class HashTable {
  constructor() {
    this.table = []; // entries are stored in an array
  }

  // Hash function; can be swapped for a custom one
  // This is the most common hash function, 'lose lose'
  static loseloseHashCode(key) {
    let hash = 0;
    for (let codePoint of key) {
      hash += codePoint.charCodeAt();
    }
    return hash % 37;
  }

  // Add a new element or update an existing one
  put(key, value) {
    const position = HashTable.loseloseHashCode(key);
    console.log(`${position} - ${key}`);
    this.table[position] = value;
  }

  get(key) {
    return this.table[HashTable.loseloseHashCode(key)];
  }

  remove(key) {
    this.table[HashTable.loseloseHashCode(key)] = undefined;
  }
}

const hash = new HashTable();
hash.put('Surmon', 'surmon.me@email.com') // 15 - Surmon
hash.put('Jhon', 'Jhonsnow@email.com') // 29 - Jhon
hash.put('Tyrion', 'Tyrion@email.com') // 16 - Tyrion

console.log(hash.get('Surmon')); // surmon.me@email.com
console.log(hash.get('Loiane')); // undefined
console.log(hash)
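
To make the positions printed above easier to check by hand, here is a minimal sketch that exposes the intermediate values of the 'lose lose' calculation. The helper `traceLoseLose` is hypothetical and is not part of the HashTable class.

```javascript
// Hypothetical helper (not in the original class): returns the character
// codes, their sum, and the final index produced by 'lose lose'.
function traceLoseLose(key) {
  const codes = [...key].map(ch => ch.charCodeAt(0));
  const sum = codes.reduce((acc, code) => acc + code, 0);
  return { codes, sum, index: sum % 37 };
}

const trace = traceLoseLose('Surmon');
console.log(trace.codes); // [ 83, 117, 114, 109, 111, 110 ]
console.log(trace.sum);   // 644
console.log(trace.index); // 15, because 644 = 17 * 37 + 15
```

The modulo 37 step is what keeps the index small, and it is also why unrelated keys can end up at the same position.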

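Besides hash tables, there are other hash structures such as hash maps and hash sets. A hash map is the same thing as a hash table. A hash set, on the other hand, does not store key-value pairs when inserting, removing, or retrieving elements; it applies the hash function to the value itself in place of a key. Like a set, a hash set stores only unique, non-repeating values.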
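
A hash set along these lines might look like the following sketch. The class name `HashSet` and its `add`, `has`, and `delete` methods are assumptions made for illustration, and collisions are ignored here just as they are in the basic HashTable above.

```javascript
// Hypothetical HashSet sketch; names and methods are illustrative only.
class HashSet {
  constructor() {
    this.table = [];
  }

  static loseloseHashCode(value) {
    let hash = 0;
    for (const ch of value) {
      hash += ch.charCodeAt(0);
    }
    return hash % 37;
  }

  // The value itself is hashed; no separate key is stored.
  add(value) {
    this.table[HashSet.loseloseHashCode(value)] = value;
  }

  has(value) {
    return this.table[HashSet.loseloseHashCode(value)] === value;
  }

  delete(value) {
    if (!this.has(value)) return false;
    this.table[HashSet.loseloseHashCode(value)] = undefined;
    return true;
  }
}

const names = new HashSet();
names.add('Surmon');
names.add('Surmon'); // adding the same value again changes nothing
console.log(names.has('Surmon')); // true
console.log(names.has('Loiane')); // false
```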

Handling collisions:

A troublesome problem with hash tables is what to do when several keys have the same hash value. For example:

const hash = new HashTable()
hash.put('Gandalf', 'gandalf@email.com')
hash.put('John', 'johnsnow@email.com')
hash.put('Tyrion', 'tyrion@email.com')
hash.put('Aaron', 'aaron@email.com')
hash.put('Donnie', 'donnie@email.com')
hash.put('Ana', 'ana@email.com')
hash.put('Jonathan', 'jonathan@email.com')
hash.put('Jamie', 'jamie@email.com')
hash.put('Sue', 'sue@email.com')
hash.put('Mindy', 'mindy@email.com')
hash.put('Paul', 'paul@email.com')
hash.put('Nathan', 'nathan@email.com')

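In this example, Tyrion and Aaron share the hash value 16, Donnie and Ana share 13, Jonathan, Jamie, and Sue share 5, and Mindy and Paul share 32. Each later put overwrites the earlier entry at the same position, so only the most recently added or modified value for each position survives in the final table.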
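
The collisions listed above can be reproduced with a short standalone sketch that groups the twelve keys by their 'lose lose' hash value. The `buckets` map is only a diagnostic aid introduced here, not part of the HashTable class.

```javascript
// Standalone copy of the 'lose lose' hash used by the class.
function loseloseHashCode(key) {
  let hash = 0;
  for (const ch of key) {
    hash += ch.charCodeAt(0);
  }
  return hash % 37;
}

const names = ['Gandalf', 'John', 'Tyrion', 'Aaron', 'Donnie', 'Ana',
  'Jonathan', 'Jamie', 'Sue', 'Mindy', 'Paul', 'Nathan'];

// Map from hash value to the list of keys that landed on it.
const buckets = new Map();
for (const name of names) {
  const position = loseloseHashCode(name);
  if (!buckets.has(position)) buckets.set(position, []);
  buckets.get(position).push(name);
}

// Print only the positions that received more than one key.
for (const [position, keys] of buckets) {
  if (keys.length > 1) console.log(`${position}: ${keys.join(', ')}`);
}
// 16: Tyrion, Aaron
// 13: Donnie, Ana
// 5: Jonathan, Jamie, Sue
// 32: Mindy, Paul
```

Twelve keys land on only seven distinct positions, so the basic table keeps seven values and silently drops the other five.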

There are several ways to handle collisions: separate chaining, linear probing, and double hashing. The first two are covered below:

Separate chaining:

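Separate chaining creates a linked list at each position of the hash table and stores the elements in it. It is the simplest way to resolve collisions, but it requires extra storage beyond the HashTable instance itself. It is illustrated below: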

To implement separate chaining, we only need to bring in the LinkedList class and modify the put, get, and remove methods:

put(key, value) {
    const position = HashTable.loseloseHashCode(key)
    if (this.table[position] === undefined) {
        this.table[position] = new LinkedList()
    }
    this.table[position].append({ key, value })
}

get(key) {
    const position = HashTable.loseloseHashCode(key)
    if (this.table[position] === undefined) return undefined
    const getElementValue = node => {
        if (!node || !node.element) return undefined
        if (Object.is(node.element.key, key)) {
            return node.element.value
        } else {
            return getElementValue(node.next)
        }
    }
    return getElementValue(this.table[position].head)
}

remove(key) {
    const position = HashTable.loseloseHashCode(key)
    if (this.table[position] === undefined) return undefined
    const getElementValue = node => {
        if (!node || !node.element) return false
        if (Object.is(node.element.key, key)) {
            this.table[position].remove(node.element)
            if (this.table[position].isEmpty) {
                this.table[position] = undefined
            }
            return true
        } else {
            return getElementValue(node.next)
        }
    }
    return getElementValue(this.table[position].head)
}
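
Since the LinkedList class is not shown here, the following is a self-contained sketch of the same idea that uses plain arrays as buckets instead. That substitution is an assumption made so the example runs on its own, and the class name `ChainedHashTable` is hypothetical. Unlike the put method above, this sketch updates an existing key in place rather than appending a second entry.

```javascript
// Sketch of separate chaining with array buckets (not the article's LinkedList).
class ChainedHashTable {
  constructor() {
    this.table = [];
  }

  static loseloseHashCode(key) {
    let hash = 0;
    for (const ch of key) {
      hash += ch.charCodeAt(0);
    }
    return hash % 37;
  }

  put(key, value) {
    const position = ChainedHashTable.loseloseHashCode(key);
    const bucket = this.table[position] || (this.table[position] = []);
    const entry = bucket.find(item => item.key === key);
    if (entry) {
      entry.value = value; // existing key: update in place
    } else {
      bucket.push({ key, value });
    }
  }

  get(key) {
    const bucket = this.table[ChainedHashTable.loseloseHashCode(key)];
    const entry = bucket && bucket.find(item => item.key === key);
    return entry ? entry.value : undefined;
  }

  remove(key) {
    const position = ChainedHashTable.loseloseHashCode(key);
    const bucket = this.table[position];
    if (!bucket) return false;
    const index = bucket.findIndex(item => item.key === key);
    if (index === -1) return false;
    bucket.splice(index, 1);
    if (bucket.length === 0) this.table[position] = undefined; // drop empty bucket
    return true;
  }
}

const chained = new ChainedHashTable();
chained.put('Tyrion', 'tyrion@email.com');
chained.put('Aaron', 'aaron@email.com'); // same position (16) as Tyrion
console.log(chained.get('Tyrion'));    // tyrion@email.com
console.log(chained.get('Aaron'));     // aaron@email.com
console.log(chained.remove('Tyrion')); // true
console.log(chained.get('Aaron'));     // aaron@email.com
```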

Linear probing:

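The basic idea of linear probing is that when inserting an element, if position index is already occupied, we try index + 1; if index + 1 is also occupied, we try index + 2, and so on, as shown in the figure below: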

Likewise, linear probing only requires changes to the put, get, and remove methods, and it needs no extra storage:

put(key, value) {
    const position = HashTable.loseloseHashCode(key)
    if (this.table[position] === undefined) {
        this.table[position] = { key, value }
    } else {
        // position is occupied: probe forward until a free slot is found
        let index = position + 1
        while (this.table[index] !== undefined) {
            index++
        }
        this.table[index] = { key, value }
    }
}

get(key) {
    const position = HashTable.loseloseHashCode(key)
    const getElementValue = index => {
        if (this.table[index] === undefined) return undefined
        if (Object.is(this.table[index].key, key)) { // compare keys to decide whether this is the entry we want
            return this.table[index].value
        } else {
            return getElementValue(index + 1)
        }
    }
    return getElementValue(position)
}

remove(key) {
    const position = HashTable.loseloseHashCode(key)
    const removeElementValue = index => {
        if (this.table[index] === undefined) return false
        if (Object.is(this.table[index].key, key)) {
            this.table[index] = undefined
            return true
        } else {
            return removeElementValue(index + 1)
        }
    }
    return removeElementValue(position)
}
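
One thing to watch with the remove method above: get stops probing at the first undefined slot, so clearing a slot to undefined can hide entries that were stored further along the same probe sequence. A common remedy is to leave a marker (often called a tombstone) in the removed slot. The sketch below is one possible way to do that; `DELETED`, `ProbingHashTable`, and `findIndex` are hypothetical names introduced for illustration.

```javascript
// Hypothetical marker for a slot whose entry has been removed.
const DELETED = Symbol('deleted');

class ProbingHashTable {
  constructor() {
    this.table = [];
  }

  static loseloseHashCode(key) {
    let hash = 0;
    for (const ch of key) {
      hash += ch.charCodeAt(0);
    }
    return hash % 37;
  }

  put(key, value) {
    let index = ProbingHashTable.loseloseHashCode(key);
    let firstFree = -1; // first tombstone seen; reused if the key is absent
    while (this.table[index] !== undefined) {
      if (this.table[index] === DELETED) {
        if (firstFree === -1) firstFree = index;
      } else if (this.table[index].key === key) {
        this.table[index].value = value; // existing key: update
        return;
      }
      index++;
    }
    this.table[firstFree === -1 ? index : firstFree] = { key, value };
  }

  // Returns the slot holding `key`, or -1 once the probe reaches an empty slot.
  findIndex(key) {
    let index = ProbingHashTable.loseloseHashCode(key);
    while (this.table[index] !== undefined) {
      const slot = this.table[index];
      if (slot !== DELETED && slot.key === key) return index;
      index++;
    }
    return -1;
  }

  get(key) {
    const index = this.findIndex(key);
    return index === -1 ? undefined : this.table[index].value;
  }

  remove(key) {
    const index = this.findIndex(key);
    if (index === -1) return false;
    this.table[index] = DELETED; // keep the probe sequence intact
    return true;
  }
}

const probing = new ProbingHashTable();
probing.put('Tyrion', 'tyrion@email.com'); // lands on 16
probing.put('Aaron', 'aaron@email.com');   // 16 is taken, so it lands on 17
probing.remove('Tyrion');
console.log(probing.get('Aaron'));  // aaron@email.com
console.log(probing.get('Tyrion')); // undefined
```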

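As the examples above show, the key to a well-performing HashTable is its hash function. 'lose lose' is not a good hash function, because it produces many collisions. Other hash functions such as djb2 and sdbm are widely available, or you can implement your own. Below is djb2, which performs well: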

static djb2HashCode(key) { 
    let hash = 5381
    for (let codePoint of key) {
        hash = hash * 33 + codePoint.charCodeAt()
    }
    return hash % 1013
}