"Cracking the Coding Interview", Chapter 10: Scalability and Memory Limits, Problem 6

2014-04-24 22:01

Problem: You have 1 billion URLs. How do you detect whether any of them are duplicates?

Solution: Hash each URL to compute a signature, then store the signatures in a K-V database and use it to check for duplicates.

Code:

// 10.6 You have 10 billion URLs. How would you detect duplicates among them?
// Answer:
//    1. Use a signature (digest) algorithm to convert each URL string into a numeric checksum.
//    2. Use this signature as the hash key. If memory allows, use an in-memory hash table to detect duplicates.
//    3. If the data won't fit in memory, use a K-V database instead. Data on the 10GB scale should be manageable on one machine, so there is no need to bring in another computer.
int main()
{
    return 0;
}
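The listing above only describes the idea in comments, so here is a minimal sketch of steps 1 and 2 (signature plus in-memory hash table). It is not the original author's code. The function names `url_signature` and `find_duplicates` are hypothetical, and 64-bit FNV-1a is an assumed stand-in for the signature algorithm, which the post does not name.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Step 1 (assumed choice): 64-bit FNV-1a as the URL signature.
// Any well-distributed hash or digest could be used instead.
std::uint64_t url_signature(const std::string& url) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : url) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
    }
    return h;
}

// Step 2: keep the signatures seen so far in an in-memory hash set.
// A URL whose signature is already present is reported as a duplicate.
std::vector<std::string> find_duplicates(const std::vector<std::string>& urls) {
    std::unordered_set<std::uint64_t> seen;
    std::vector<std::string> duplicates;
    for (const auto& url : urls) {
        // insert() returns {iterator, false} when the key already exists.
        if (!seen.insert(url_signature(url)).second) {
            duplicates.push_back(url);
        }
    }
    return duplicates;
}
```

Because only the signature is stored, two different URLs that happen to share a signature would be reported as duplicates. If that matters, the candidate pairs can be compared by their full strings afterwards. For step 3, the `std::unordered_set` would be replaced by lookups and inserts against a K-V store.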
Original post: https://www.cnblogs.com/zhuli19901106/p/3687456.html