xgqfrms™, xgqfrms® : xgqfrms's offical website of GitHub!

高级数据结构之 BloomFilter

布隆过滤器

https://en.wikipedia.org/wiki/Bloom_filter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.
False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set".
Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives.

布隆过滤器是一种节省空间的概率数据结构,由伯顿·霍华德·布鲁姆(Burton Howard Bloom)在1970年提出,用于测试元素是否为集合的成员。
可能会出现假阳性匹配,但否定否定匹配-换句话说,查询返回“可能在集合中”或“绝对不在集合中”。
元素可以添加到集合中,但不能删除(尽管可以通过计数Bloom过滤器变体解决); 添加的项目越多,误报的可能性就越大。

BloomFilter

data structure

hash function

https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/

https://blog.cloudflare.com/when-bloom-filters-dont-bloom/

refs

BloomFilter & python crawler

https://github.com/cpselvis/zhihu-crawler/blob/master/crawler.py#L33

如何计算算法的复杂度

https://time.geekbang.org/course/detail/100019701-41531

const n = 10**6;

console.time(`for`)
for (let i = 1; i <= n; i++) {
  if (i === n) {
    console.log(`ok`)
  }
}
console.timeEnd(`for`)
// ok
// for: 4.4267578125 ms

console.time(`math`)
const result = (n * (n + 1))/2
console.log(`ok`, result)
console.timeEnd(`math`)

// ok 500000500000
// math: 0.09912109375 ms



Flag Counter

©xgqfrms 2012-2020

www.cnblogs.com 发布文章使用:只允许注册用户才可以访问!


原文地址:https://www.cnblogs.com/xgqfrms/p/13490357.html