It is meaningless to talk about cryptography without a threat model, just as 
Robert says. Is anybody actually attacking us? Or are we worried about 
accidental collisions?


-----Original message-----
From: Robert Rohde <raro...@gmail.com>
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Sent: Sun, Sep 18, 2011 05:56:15 GMT+00:00
Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table 
(discussing r94289)

On Sat, Sep 17, 2011 at 4:56 PM, Anthony <wikim...@inbox.org> wrote:
> On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde <raro...@gmail.com> wrote:
>> Is there a good reason to prefer SHA-1?
>>
>> Both have weaknesses allowing one to construct a collision (with
>> considerable effort)
>
> Considerable effort?  I can create an MD5 collision in a few minutes
> on my home computer.  Is there anything even remotely like this for
> SHA-1?

If I've been keeping up to date, the collision complexity for MD5 is
about 2^21 operations, and runs in a few seconds (not minutes); and
for SHA-1 down to about 2^52 with current results.  The latter
represents about 100 cpu-years, which is within the realm of
supercomputers.  That time will probably continue to come down if
people find ways to improve the attacks on SHA-1.  (The existing
attacks usually require the ability to feed arbitrary binary strings
into the hash function.  Given that both browsers and MediaWiki will
tend to reject binary data placed in an edit window, I'm not sure if
any of the existing attacks could be reliably applied to MediaWiki
editing.)
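
As a rough sanity check on the figures above, the sketch below derives
the rate implied by "2^52 operations is about 100 cpu-years" and applies
that same rate to the 2^21 figure for MD5.  The variable names are
illustrative, and treating one "operation" as costing the same for both
hashes is an assumption, not something the attack papers promise.

```python
# Back-of-the-envelope check of the collision-cost figures quoted above.
# Assumption: one "operation" costs the same for MD5 and SHA-1.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

sha1_ops = 2 ** 52        # quoted SHA-1 collision complexity
md5_ops = 2 ** 21         # quoted MD5 collision complexity
sha1_cpu_years = 100      # quoted cost of the SHA-1 attack

# Operations per second implied by "2^52 operations ~ 100 cpu-years".
implied_rate = sha1_ops / (sha1_cpu_years * SECONDS_PER_YEAR)

# Time for the MD5 attack at that same rate.
md5_seconds = md5_ops / implied_rate

print(f"implied rate: {implied_rate:,.0f} operations/second")
print(f"MD5 collision at that rate: {md5_seconds:.2f} seconds")
```

Under that assumption the implied rate is on the order of a million
operations per second, and the MD5 collision takes between one and two
seconds, which is in line with "a few seconds (not minutes)".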

If collision attacks really matter we should use SHA-1.  However, do
any of the proposed use cases care about whether someone might
intentionally inject a collision?  In the proposed uses I've looked at,
it seems irrelevant.  The intentional collision will get flagged
as a revert and the text leading to that collision would be discarded.
 How is that a bad thing?
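
To make the revert check concrete, here is a minimal sketch of how a
stored hash column could be used to flag a revert.  The function names
and the in-memory list of hashes are hypothetical; this is not how
MediaWiki or r94289 implements it, only an illustration of the idea.

```python
import hashlib
from typing import List, Optional


def text_hash(text: str) -> str:
    """Hex digest of a revision's text (SHA-1 here; MD5 would work the same way)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_revert(history_hashes: List[str], new_text: str) -> Optional[int]:
    """Return the index of the earliest prior revision whose hash matches
    the new text, or None if the new text matches no earlier revision."""
    new_hash = text_hash(new_text)
    for index, old_hash in enumerate(history_hashes):
        if old_hash == new_hash:
            return index
    return None


# Hypothetical page history: only the hashes are kept, not the text.
history = [text_hash(t) for t in ["first draft", "vandalized text"]]

print(find_revert(history, "first draft"))    # matches revision 0
print(find_revert(history, "a new edit"))     # no match
```

An intentional collision would simply make find_revert() report a match
for text that differs from the earlier revision, which is the "flagged
as a revert" case described above.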

It's not a big deal, but if I understand prior comments correctly,
most of the existing offline infrastructure uses MD5, so I'm wondering
if there is a distinct use case for favoring SHA-1.

>> MD5 is shorter and in my experience about 25% faster to compute.
>>
>> Personally I've tended to view MD5 as more than good enough in offline 
>> analyses.
>
> For offline analyses, there's no need to change the online database tables.

Need?  That's debatable, but one of the major motivators is the desire
to have hash values in database dumps (both for revert checks and for
checksums on correct data import / export).  Both of those are
"offline" uses, but it is beneficial to have that information
precomputed and stored rather than frequently regenerated.
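
For the import / export checksum use, a sketch along these lines shows
what a precomputed hash buys you.  The record layout (revision id, text,
stored hash) and the function name are assumptions for illustration
only; the real dump format is different.

```python
import hashlib


def verify_records(records, algorithm="md5"):
    """Return the ids of records whose text no longer matches its stored hash.

    `records` is an iterable of (rev_id, text, stored_hex_digest) tuples,
    a hypothetical stand-in for rows read back from a dump.
    """
    mismatched = []
    for rev_id, text, stored in records:
        actual = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
        if actual != stored:
            mismatched.append(rev_id)
    return mismatched


# Example: revision 2 was corrupted somewhere between export and import.
good = hashlib.md5("some wikitext".encode("utf-8")).hexdigest()
rows = [(1, "some wikitext", good), (2, "some wikitexx", good)]
print(verify_records(rows))
```

With the hash stored in the dump, the importer only has to hash each
revision once and compare, rather than needing a second copy of the data
to check against.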

-Robert Rohde

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l