> From: Lassi Tuura [mailto:l...@cern.ch] > > bc -l <<EOF > scale=150 > define bday(n, h) { return 1 - e(-(n^2)/(2*h)); } > bday(2^35, 2^256) > bday(2^35, 2^256) * 10^57 > EOF > > Basically, ~5.1 * 10^-57. > > Seems your number was correct, although I am not sure how you arrived at > it.
The number was calculated correctly, unfortunately, the formula was not. :-( The correct formula is the one on wikipedia... p(n;d) ~= 1-( ((d-1)/d)^(0.5*n*(n-1)) ) Where n=2^35 and d=2^256 Unfortunately, bc isn't good at exponentiating huge numbers. Observe: exponent=0.5*2^35*(2^35-1) exponent 590295810341525782528 (2^256-1)^exponent Runtime error (func=(main), adr=35): exponent too large in raise Even if I chop that down to something much much much smaller ... 99999 ... then bc simply hangs. It's apparently looping 99999 times which will take forever. It's so easy to calculate when I use the wrong formula, and difficult to calculate when I use the right formula. ;-) I'm hoping somebody at work knows how to do this in matlab or octave or something... Will try later today. > It appears that number is ~ consistent with the tables on Wikipedia article > on birthday attack (http://en.wikipedia.org/wiki/Birthday_attack): p=10^−15 This is why, at first, I thought my formula (which is wrong) was equal to the formula on wikipedia. Because at least roughly speaking, they were coming up with believably similar numbers. But you can clearly see mine is wrong, for example, if you set n to something larger than the square root of d. According to my formula, if you choose 2^160 hashes (for example) out of the space of 2^256 hashes then you're 100% certain to have a collision, which is blatantly wrong. An easier error to see is ... if you choose 4 out of a space of 10, then the probability of collision is 1.0. Which again, is clearly wrong. > For anecdotal evidence to the contrary, we once had apps using GUIDs. We > did > believe collisions would be extremely improbable. Yet GUID conflicts turned > into recurring events, at worst several collisions a week. It may or may not > be related - in our case it turned out the GUIDs weren't anywhere near as > random as people thought they would or should be. In retrospect we clearly > used a flawed GUID generator which rendered our assumptions about > randomness > invalid. SHA256 on (even non-random) data is different, but do be careful > about what you assume. Whenever I hear anyone saying they have experienced collisions, just as your GUID problem, I question what precisely was colliding, and the answer is never any cryptographic hash. It's either poor randomness, or a weak non-cryptographic hash, or something like that. Just as ... earlier in this thread I believe someone said they experienced rsync collisions... Which I consider to be irrelevant to the discussion of sha256. I certainly acknowledge that sha256 is not random, and yet it is fundamental to assume it's random, or more accurately, unpredictable and evenly distributed. But since even the most dedicated specialized mathmaticians thus far haven't been able to show anything to the contrary (at least not in public) the conclusion I'm drawing is: I trust the distribution characteristics as long as I don't have malicious users or friendly users intentionally trying to specifically generate collisions specifically for sha256. Someday sha256 might become at least partially broken, but until that time, I trust it. I'd bet anything short of my life on it preventing accidental collisions... In fact, every day I risk my life on things that are more likely to kill me, so I guess I'd even bet my life on it, if I had anything to gain. Give me $1 and I'll bet my life on sha256 not generating an accidental collision. _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss