Re: [Wiki-research-l] Revert detection

2011-08-21 Thread Aaron Halfaker
I've updated my dump-processing Python project to include code for quickly detecting identity reverts from XML dumps. See https://bitbucket.org/halfak/wikimedia-utilities for the project and the process() function at the bottom of
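
For readers unfamiliar with the term, an identity revert is an edit that restores a page to exactly the text of an earlier revision. The sketch below is not the code from the project linked above. It is a minimal illustration that assumes revisions arrive in chronological order as (rev_id, text) pairs; the function name detect_identity_reverts and the radius parameter are hypothetical.

```python
import hashlib
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_to_id, reverted_ids) tuples.

    `revisions` is an iterable of (rev_id, text) pairs in chronological
    order. `radius` (an assumed parameter) limits how many preceding
    revisions are checked for an identical checksum.
    """
    recent = deque(maxlen=radius)  # (rev_id, checksum) of preceding revisions
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        history = list(recent)
        # Scan from the most recent revision backwards for identical text.
        for i in range(len(history) - 1, -1, -1):
            if history[i][1] == checksum:
                reverted = [r for r, _ in history[i + 1:]]
                # An identical consecutive revision undoes nothing, so skip it.
                if reverted:
                    yield rev_id, history[i][0], reverted
                break
        recent.append((rev_id, checksum))


if __name__ == "__main__":
    page = [(1, "stable text"), (2, "vandalism"), (3, "stable text")]
    for revert in detect_identity_reverts(page):
        print(revert)
```

Comparing checksums rather than full texts keeps memory use small when streaming through a large dump, at the cost of only catching exact restorations.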

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 72, Issue 5

2011-08-21 Thread Ed H. Chi
It's worth pointing out that in our research at PARC, we also discussed the possibility of using a containment-based measure, as described in "On the resemblance and containment of documents" (A. Z. Broder). In the end, we realized that the real issue is that there is no universal agreement on what is a
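
As a rough illustration of the kind of containment-based measure mentioned above, the sketch below computes containment over word shingles: the fraction of one text's shingles that also appear in another. It is not the PARC method, and it uses exact shingle sets rather than any sampling scheme; the function names and the shingle width w are assumptions made for this example.

```python
def shingles(text, w=3):
    """Return the set of contiguous w-word sequences in `text`."""
    tokens = text.split()
    if len(tokens) < w:
        # Texts shorter than w words contribute one short shingle (or none).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def containment(a, b, w=3):
    """Fraction of a's shingles that also occur in b (0.0 if a is empty)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa:
        return 0.0
    return len(sa & sb) / len(sa)


if __name__ == "__main__":
    old = "the quick brown fox"
    new = "the quick brown fox jumps"
    print(containment(old, new, w=2))
    print(containment(new, old, w=2))
```

Unlike an identity check, this measure is asymmetric and graded, so a revision that restores most of an earlier text scores high without matching it exactly.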