1,273
edits
Changes
→A tutorial on collision probability: formatting
are good.
If you have <code>N </code> buckets and a good (pseudorandom) hash function, the probabilityof any two keys colliding is <code>1/N</code>. So when you have <code>i </code> keys in the table andinsert key <code>i + 1</code>, the probability that the new key does NOT '''not''' collide with anyold key is <code>(1 - 1/N)^<sup>i</sup></code>. If you insert <code>n </code> distinct items, the expected numberthat WON'T ''won't''' collide is
so the expected number of collisions is<code>n - N + N (1 - 1/N)<sup>n</sup></code>.
if the number of collisions you're getting is in the ballpark of what you
should expect to get. For example, if you have <code>N </code> = 100 buckets and <code>n </code> = 100
keys, expect about 36.6 collisions.