pigeonhole principle 哈希表的重复问题(冲突)是不可避免的

https://en.wikipedia.org/wiki/Pigeonhole_principle

Sock-picking

Assume a drawer contains a mixture of black socks and blue socks, each of which can be worn on either foot, and that you are pulling a number of socks from the drawer without looking. What is the minimum number of pulled socks required to guarantee a pair of the same color? Using the pigeonhole principle, to have at least one pair of the same color (m = 2 holes, one per color) using one pigeonhole per color, you need to pull only three socks from the drawer (n = 3 items). Either you have three of one color, or, exclusively, two of one color and one of the other.

Hand-shaking

If there are n people who can shake hands with one another (where n > 1), the pigeonhole principle shows that there is always a pair of people who will shake hands with the same number of people. As the 'holes', or m, correspond to number of hands shaken, and each person can shake hands with anybody from 0 to n − 1 other people, this creates n − 1 possible holes. This is because either the '0' or the 'n − 1' hole must be empty (if one person shakes hands with everybody, it's not possible to have another person who shakes hands with nobody; likewise, if one person shakes hands with no one there cannot be a person who shakes hands with everybody). This leaves n people to be placed in at most n − 1 non-empty holes, guaranteeing duplication.

Hair-counting

We can demonstrate there must be at least two people in London with the same number of hairs on their heads as follows.[4] Since a typical human head has an average of around 150,000 hairs, it is reasonable to assume (as an upper bound) that no one has more than 1,000,000 hairs on their head (m = 1 million holes). There are more than 1,000,000 people in London (n is bigger than 1 million items). Assigning a pigeonhole to each number of hairs on a person's head, and assign people to pigeonholes according to the number of hairs on their head, there must be at least two people assigned to the same pigeonhole by the 1,000,001st assignment (because they have the same number of hairs on their heads) (or, n > m). For the average case (m = 150,000) with the constraint: fewest overlaps, there will be at most one person assigned to every pigeonhole and the 150,001st person assigned to the same pigeonhole as someone else. In the absence of this constraint, there may be empty pigeonholes because the "collision" happens before we get to the 150,001st person. The principle just proves the existence of an overlap; it says nothing of the number of overlaps (which falls under the subject of probability distribution).

There is a passing, satirical, allusion in English to this version of the principle in A History of the Athenian Society, prefixed to ""A Supplement to the Athenian Oracle: Being a Collection of the Remaining Questions and Answers in the Old Athenian Mercuries"", (Printed for Andrew Bell, London, 1710).[5] It seems that the question whether there were any two persons in the World that have an equal number of hairs on their head? had been raised in The Athenian Mercury before 1704.[6][7]

Perhaps the first written reference to the pigeonhole principle appears in 1622 in a short sentence of the Latin work Selectæ Propositiones, by the French Jesuit Jean Leurechon,[8] where he wrote "It is necessary that two men have the same number of hairs, écus, or other things, as each other."[9]

The birthday problem

The birthday problem asks, for a set of n randomly chosen people, what is the probability that some pair of them will have the same birthday? By the pigeonhole principle, if there are 367 people in the room, we know that there is at least one pair who share the same birthday, as there are only 366 possible birthdays to choose from (including February 29, if present). The birthday "paradox" refers to the result that even if the group is as small as 23 individuals, there will still be a pair of people with the same birthday with a 50% probability. While at first glance this may seem surprising, it intuitively makes sense when considering that a comparison will actually be made between every possible pair of people rather than fixing one individual and comparing them solely to the rest of the group.

https://zh.wikipedia.org/wiki/鸽巢原理

虽然鸽巢原理看起来很容易理解,但有时使用鸽巢原理会得到一些有趣的结论:

  • 比如:北京至少有两个人头发数一样多。
    • 证明:常人的头发数在15万左右,可以假定没有人有超过100万根头发,但北京人口大于100万。如果我们让每一个鸽巢对应一个头发数字,鸽子对应于人,那就变成了有大于100万只鸽子要进到至多100万个巢中。所以,可以得到“北京至少有两个人头发数一样多”的结论。

另一个例子:

  • 盒子里有10只黑袜子、12只蓝袜子,你需要拿一对同色的出来。假设你总共只能拿一次,只要3只就无法回避会拿到至少两只相同颜色的袜子,因为颜色只有两种(鸽巢只有两个),而有三只袜子(三只鸽子),从而得到“拿3只袜子出来,就能保证有一双同色”的结论。

更不直观一点的例子:

  • 有n个人(至少2人)互相握手(随意找人握),必有两人握过手的人数相同。
    • 这里,鸽巢对应于握过手人数,鸽子对应于人,每个人都可以与[0,n-1]人握过手(但0和n-1不能同时存在,因为如果一个人不和任何人握手,那就不会存在一个和所有其他人都握过手的人),所以鸽巢是n-1个。但有n个人(n只鸽子),因此证明了命题正确。

鸽巢原理经常在计算机领域得到真正的应用。比如:哈希表的重复问题(冲突)是不可避免的,因为Keys的数目总是比Indices的数目多,不管是多么高明的算法都不可能解决这个问题。这个原理,还证明任何无损压缩算法,在把一个文件变小的同时,一定有其他文件增大来辅助,否则某些信息必然会丢失。

原文地址:https://www.cnblogs.com/rsapaper/p/6740743.html