https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #12 from Aaron Halfaker <aaron.halfa...@gmail.com> 2010-02-17 21:06:33 UTC ---
I'm not sure how the conversation got steered to recent changes monitoring,
since I don't see the usefulness of knowing when no-op edits are saved to
articles. I'd like to bring back the discussion of how a checksum could be
used to quickly determine, computationally, which revisions are reverts.
Reverting is a function of MediaWiki and the way it manages history.

Research suggests that most reverts in the English Wikipedia are identity
reverts[1] (reverts that restore the exact text of an earlier revision and
can therefore be detected by comparing checksums between revisions).
Detecting identity reverts through the current API requires retrieving the
complete text of the revisions, which consumes appreciable disk and network
resources.  With the addition of checksums, this expensive retrieval could be
skipped, and tools that interact with MediaWiki's API would be easier and more
straightforward to write.
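
To make the idea concrete, here is a minimal sketch of how a tool could spot
identity reverts from checksums alone.  It assumes the checksums for a page
arrive as a list in chronological order; the function names are hypothetical
and not part of MediaWiki's API, and the MD5 helper only stands in for
whatever checksum the server would expose.

```python
import hashlib


def md5_hex(text):
    """Checksum of a revision's text (stand-in for a server-side value)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_identity_reverts(checksums):
    """Return (reverting_index, reverted_to_index) pairs.

    `checksums` is a chronological list of per-revision checksums.
    A revision counts as an identity revert when its checksum matches an
    earlier revision that is not its immediate predecessor; a match with
    the immediate predecessor is treated as a no-op edit and skipped.
    """
    last_seen = {}  # checksum -> index of the most recent revision with it
    reverts = []
    for i, checksum in enumerate(checksums):
        j = last_seen.get(checksum)
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[checksum] = i
    return reverts


history = ["a", "a b", "vandalism", "a b", "a b"]
print(find_identity_reverts([md5_hex(t) for t in history]))
```

In this toy history, revision 3 restores the text of revision 1, and revision
4 is a no-op edit, so only the first is reported.  No revision text has to be
fetched once the checksums are available.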

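As for the possibility of collision, an MD5 checksum has 16^32 (2^128)
possible values.  So long as the outputs are uniformly distributed, the
probability that two given revisions with different text share a checksum is
1 in 16^32, and by the birthday bound[2] on the order of sqrt(16^32) =
18,446,744,073,709,551,616 revisions would be needed before a collision
anywhere among them becomes likely.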
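
The birthday estimate can be checked numerically.  The sketch below uses the
standard approximation p = 1 - exp(-n(n-1)/(2N)) for n uniformly distributed
checksums drawn from N possible values; the figure of one billion revisions
is only an illustrative assumption, not a measured count.

```python
import math

SPACE = 16 ** 32  # number of possible MD5 values (2**128)


def collision_probability(n, space=SPACE):
    """Approximate chance that at least two of n uniform hashes collide."""
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))


# Square root of the hash space: the scale at which collisions become likely.
print(math.isqrt(SPACE))
# Illustrative case: a wiki with one billion revisions.
print(collision_probability(10 ** 9))
# At sqrt(SPACE) revisions the chance of some collision is substantial.
print(collision_probability(2 ** 64))
```

The estimate only covers accidental collisions under the uniformity
assumption.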

1. http://doi.acm.org/10.1145/1240624.1240698
2. http://en.wikipedia.org/wiki/Birthday_paradox
