For anyone who still cares: I'm calculating the odds of a sha256 collision in an extremely large zpool, containing 2^35 blocks of data, and no repetitions.
The formula on wikipedia for the birthday problem is: p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) ) In this case, n=2^35 d=2^256 The problem is, this formula "does not compute" because n is so large. Fortunately x = e^ln(x) and so we're able to use this technique to make the huge exponent instead, a huge multiplication. (Using the bc mathlib notation, the l(x) function is the natural log of x, and e(x) is e raised to the power of x) p(n;d) ~= 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) ) Using bc to calculate the answer: bc -l n=2^35 d=2^256 scale=1024 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) ) .0000000000000000000000000000000000000000000000000000000050978941154 I manually truncated here (precision goes out 1024 places). This is 5.1*10^-57 Note: I had to repeat the calculation many times, setting a larger and larger scale. The default scale of 20, and even 64 and 70 and 80 were not precise enough to produce a convergent answer around the -57th decimal place. So I just kept going larger, and in retrospect, anything over 100 would have been fine. I wrote 1024 above, so who cares. If you've been paying close attention you'll recognize this is the same answer I originally calculated, but my original equation was in fact wrong. It just so happens that my original equation neglects the probability of collisions from previous events, so it is actually accurate whenever the probability of previous events is insignificant. It is merely luck that for the data size in question, my equation produced something that looked correct. It would produce a wrong calculation of probability for a larger value of n. _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss