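Bloom Filter

First, understand what a Bloom filter is for: it is a space-efficient probabilistic data structure for testing whether an element belongs to a set, at the cost of occasional false positives.

Next, get an intuitive feel for it: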

[1]: http://billmill.org/bloomfilter-tutorial/

Then read a more detailed explanation:

[2]: http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html

For the analysis of false positives, see the Wikipedia explanation:

[3]: https://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives
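
As a short sketch of the standard analysis covered at [3], assume a vector of $m$ bits, $k$ independent and uniform hash functions, and $n$ inserted items (these symbols are notation introduced here for illustration). After all insertions, a given bit is still 0 with probability

$$\left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$

A false positive requires all $k$ probed bits to be 1. Treating the bits as independent (an approximation), the false positive probability is roughly

$$p \approx \left(1 - e^{-kn/m}\right)^{k}$$

For fixed $m$ and $n$, this is minimized at about $k = \frac{m}{n}\ln 2$, which gives $p \approx (1/2)^{k} \approx 0.6185^{m/n}$.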

Finally, read the original Bloom filter paper:

[4]: http://dl.acm.org/citation.cfm?id=362692

(Honestly, I suspect there may be a typo in it.)

You can also look at some of the variants:

[5]: Less Hashing, Same Performance: Building a Better Bloom Filter, Adam Kirsch, Michael Mitzenmacher

[6]: Compressed Bloom Filters, Michael Mitzenmacher, Harvard Univ., Cambridge, MA
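
The core idea of [5], as I understand it, is that two base hash values are enough: the i-th index can be derived as (h1 + i * h2) mod m instead of computing k unrelated hashes. Below is a minimal sketch of that idea. The function name `double_hash_indexes` and the choice to take h1 and h2 from two halves of a SHA-256 digest are my own assumptions for illustration, not something taken from the paper.

```python
import hashlib

def double_hash_indexes(item: bytes, k: int, m: int) -> list:
    """Derive k indexes in range(m) from only two base hash values."""
    digest = hashlib.sha256(item).digest()
    # Assumption: two disjoint 8-byte slices of one digest act as h1 and h2.
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    # g_i(x) = (h1(x) + i * h2(x)) mod m, for i = 0 .. k-1
    return [(h1 + i * h2) % m for i in range(k)]

print(double_hash_indexes(b"apple", k=4, m=1024))
```

The indexes returned here can replace the output of k separate hash functions in both insertion and lookup.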

The principle of a Bloom filter, in brief:

** First, insertion **:

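A Bloom filter holds a vector of bits, all initially 0. Several hash functions are applied to the same input. Each hash function produces a value that is used as an index, and vector[index] is set to 1. The same steps are repeated for every subsequent input.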
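
The insertion step can be sketched as follows. This is only an illustration under some assumptions: the names `indexes` and `insert` are made up here, and the "multiple hash functions" are simulated by prefixing the input with a counter before hashing it with SHA-256, which is one convenient choice rather than the required one.

```python
import hashlib

def indexes(item: bytes, k: int, m: int) -> list:
    """Return k indexes in range(m), one per simulated hash function."""
    result = []
    for i in range(k):
        # Assumption: prefixing the item with i stands in for "the i-th hash function".
        digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
        result.append(int.from_bytes(digest[:8], "big") % m)
    return result

def insert(vector: list, item: bytes, k: int) -> None:
    """Set vector[index] = 1 for every index derived from the item."""
    for index in indexes(item, k, len(vector)):
        vector[index] = 1

vector = [0] * 64          # the vector, all zeros at the start
insert(vector, b"apple", k=3)
print(sum(vector))         # number of bits set so far (at most 3)
```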

** Then, checking whether an input has already been inserted into the vector **:

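The check works much like insertion. The same hash functions are applied to the input, again producing several values that are used as indexes. If there is some index_1 such that vector[index_1] == 0, the input has definitely never been inserted. If vector[index] is 1 for every index, the input may have been inserted. "May" covers two cases: (1) the input really was inserted; (2) the input was never inserted but is mistaken for one that was, because other inputs happened to set all of those bits. Case (2) is what is called a false positive.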
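
Putting insertion and lookup together, a minimal self-contained sketch might look like this. The class name `BloomFilter`, the methods `add` and `might_contain`, and the salted SHA-256 hashing are all assumptions made for illustration; a real implementation would normally use a packed bit array and a faster hash.

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of bits in the vector
        self.k = k                 # number of (simulated) hash functions
        self.vector = [0] * m

    def _indexes(self, item: bytes):
        for i in range(self.k):
            # Assumption: a counter prefix simulates the i-th hash function.
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for index in self._indexes(item):
            self.vector[index] = 1

    def might_contain(self, item: bytes) -> bool:
        # Any 0 bit means "definitely not inserted";
        # all 1 bits means "possibly inserted".
        return all(self.vector[index] == 1 for index in self._indexes(item))

bf = BloomFilter(m=1024, k=3)
bf.add(b"apple")
print(bf.might_contain(b"apple"))   # True: an inserted item is never reported absent
print(bf.might_contain(b"banana"))  # usually False; True here would be a false positive
```

The method is named `might_contain` rather than `contains` on purpose: a True result only means "possibly present", while False is always reliable.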

Since a Bloom filter relies on several hash functions, here are a few common ones:

[*]: http://www.burtleburtle.net/bob/hash/doobs.html

[*]: MurmurHash

[*]: FNV hash

[*]: Cuckoo Hashing (strictly a collision-resolution scheme for hash tables rather than a hash function)

Cuckoo Hashing Visualization
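
Of the hash functions listed above, FNV is simple enough to sketch in a few lines. The version below is the 32-bit FNV-1a variant; the function name `fnv1a_32` is just a label chosen here, and this is meant as an illustration rather than a tuned implementation.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result within 32 bits
    return h

print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

To use it in a Bloom filter, the result would be reduced modulo the vector length to get an index.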

(I'll work through the rest when I have time...)