> From: Lassi Tuura [mailto:l...@cern.ch]
> 
> bc -l <<EOF
> scale=150
> define bday(n, h) { return 1 - e(-(n^2)/(2*h)); }
> bday(2^35, 2^256)
> bday(2^35, 2^256) * 10^57
> EOF
> 
> Basically, ~5.1 * 10^-57.
> 
> Seems your number was correct, although I am not sure how you arrived at
> it.

The number was calculated correctly, unfortunately, the formula was not.  :-(
The correct formula is the one on wikipedia...

p(n;d) ~= 1-(   ((d-1)/d)^(0.5*n*(n-1))   )
Where n=2^35 and d=2^256

Unfortunately, bc isn't good at exponentiating huge numbers.  Observe:
        exponent=0.5*2^35*(2^35-1)
        exponent
        590295810341525782528
        (2^256-1)^exponent
        Runtime error (func=(main), adr=35): exponent too large in raise
Even if I chop that down to something much much much smaller ... 99999 ... then 
bc simply hangs.  It's apparently looping 99999 times which will take forever.

It's so easy to calculate when I use the wrong formula, and difficult to 
calculate when I use the right formula.   ;-)  I'm hoping somebody at work 
knows how to do this in matlab or octave or something...  Will try later today.


> It appears that number is ~ consistent with the tables on Wikipedia article
> on birthday attack (http://en.wikipedia.org/wiki/Birthday_attack): p=10^−15

This is why, at first, I thought my formula (which is wrong) was equal to the 
formula on wikipedia.  Because at least roughly speaking, they were coming up 
with believably similar numbers.  But you can clearly see mine is wrong, for 
example, if you set n to something larger than the square root of d.  According 
to my formula, if you choose 2^160 hashes (for example) out of the space of 
2^256 hashes then you're 100% certain to have a collision, which is blatantly 
wrong.  An easier error to see is ... if you choose 4 out of a space of 10, 
then the probability of collision is 1.0.  Which again, is clearly wrong.


> For anecdotal evidence to the contrary, we once had apps using GUIDs. We
> did
> believe collisions would be extremely improbable. Yet GUID conflicts turned
> into recurring events, at worst several collisions a week. It may or may not
> be related - in our case it turned out the GUIDs weren't anywhere near as
> random as people thought they would or should be. In retrospect we clearly
> used a flawed GUID generator which rendered our assumptions about
> randomness
> invalid. SHA256 on (even non-random) data is different, but do be careful
> about what you assume.

Whenever I hear anyone saying they have experienced collisions, just as your 
GUID problem, I question what precisely was colliding, and the answer is never 
any cryptographic hash.  It's either poor randomness, or a weak 
non-cryptographic hash, or something like that.  Just as ... earlier in this 
thread I believe someone said they experienced rsync collisions...  Which I 
consider to be irrelevant to the discussion of sha256.

I certainly acknowledge that sha256 is not random, and yet it is fundamental to 
assume it's random, or more accurately, unpredictable and evenly distributed.  
But since even the most dedicated specialized mathmaticians thus far haven't 
been able to show anything to the contrary (at least not in public) the 
conclusion I'm drawing is:  I trust the distribution characteristics as long as 
I don't have malicious users or friendly users intentionally trying to 
specifically generate collisions specifically for sha256.  Someday sha256 might 
become at least partially broken, but until that time, I trust it.  I'd bet 
anything short of my life on it preventing accidental collisions...  In fact, 
every day I risk my life on things that are more likely to kill me, so I guess 
I'd even bet my life on it, if I had anything to gain.  Give me $1 and I'll bet 
my life on sha256 not generating an accidental collision.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to