Re: [Wiki-research-l] More accurate revert detection in Wikipedia, alternative to MD5 identical revision method

WereSpielChequers Wed, 27 Jun 2012 11:38:06 -0700

Hi Fabian,

That looks interesting, but I wondered if you were aware of what can
happen when Wikipedia articles are edited section by section?


If an article has multiple sections, then it doesn't matter how many edits
have been made to other sections: if you want to undo the most recent edit
to a particular section, you can just hit undo or rollback and revert
it. The contents of the whole article will then be a new and potentially
unique revision, as one section will have reverted to what it was before it
was vandalised and the other sections will stay as they were just before the
revert. A comparison of whole-page MD5 hashes may therefore not match any
earlier revision, even though the edit is clearly a revert.
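
To make that concrete, here is a rough Python sketch. It is my own
illustration, not code from your paper: the revision history, the section
names and the helper functions are all made up, and the only library call
used is hashlib.md5. It shows a section-level undo producing a whole-page
hash that matches no earlier revision, while the hash of the affected
section does match an earlier one.

```python
import hashlib


def md5(text: str) -> str:
    """Hex MD5 digest of a piece of wikitext."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def page(sections: dict) -> str:
    """Join sections into one page text (hypothetical layout)."""
    return "\n".join(f"== {title} ==\n{body}" for title, body in sections.items())


# Hypothetical history of a two-section article.
history = [
    {"Intro": "Original intro.", "History": "Original history."},  # r0
    {"Intro": "Original intro.", "History": "VANDALISM"},          # r1: vandalism
    {"Intro": "Improved intro.", "History": "VANDALISM"},          # r2: good edit elsewhere
    {"Intro": "Improved intro.", "History": "Original history."},  # r3: undo of r1 only
]

# Whole-page identity check: does r3 match any earlier revision?
page_hashes = [md5(page(rev)) for rev in history]
identity_revert = page_hashes[3] in page_hashes[:3]

# Section-level check: does the "History" section of r3 match an earlier state?
section_hashes = [md5(rev["History"]) for rev in history]
section_revert = section_hashes[3] in section_hashes[:3]

print("whole-page MD5 sees a revert:", identity_revert)
print("section MD5 sees a revert:   ", section_revert)
```

In this toy history the page as a whole never returns to an earlier state,
because the good edit to the other section is kept, so only the
section-level comparison flags r3.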

You could get some interesting examples by looking at the history of the
article on Sarah Palin on the night she became John McCain's running mate.
The edit rate peaked at 25 edits per minute, which should make it a good
example of an article where edits were only being made one section at a
time, as anyone who tried to edit the whole article would have been pretty
much guaranteed an edit conflict. As I remember it, there were multiple edit
wars taking place simultaneously in different sections of the article; none
would have taken the whole article back to a previous version, just one
section.

WereSpielChequers

On 27 June 2012 18:05, Floeck, Fabian (AIFB) <fabian.flo...@kit.edu> wrote:

> For those of you who are interested in reverts:
> I just presented our paper on accurate revert detection at the ACM
> Hypertext and Social Media conference 2012, showing a significant accuracy
> (and coverage) gain compared to the widely used method of finding identical
> revisions (via MD5 hash values) to detect reverts, proving that our method
> detects edit pairs that are significantly more likely to be actual reverts
> according to editors' perception of a revert and the Wikipedia definition.
> 35% of the reverts found by the MD5 method in our sample are not assessed
> to be reverts by more than 80% of our survey participants (accuracy 0%).
> The provided new method finds different reverts for these 35% plus 12%
> more, which show a 70% accuracy.
>
> Find the PDF slides, paper and results here:
> http://people.aifb.kit.edu/ffl/reverts/
>
> I'll be happy to answer any questions.
>
>
> More in detail:
> The MD5 hash method employed by many researchers to identify reverts (like
> some others, such as using edit comments) is acknowledged to produce some
> inaccuracies as far as the Wikipedia definition of a revert ("reverses the
> actions of any editors", "undoing the actions"..) is concerned. The extent
> of these inaccuracies is usually judged to be not too large, as naturally,
> most reverting edits are carried out immediately after the edit to be
> reverted, being an "identity revert" (Wikipedia definition: "..*normally*
> results in the page being restored to a version that existed previously").
> Still,
> there has not been a user evaluation assessing how well the detected
> reverts conform with the Wikipedia definition and what users actually
> perceive as a revert. We developed and evaluated an alternative method to
> the MD5 identity revert and show a significant increase in accuracy (and
> coverage).
> 34% of the reverts detected by the MD5 hash method in our sample actually
> fail to be acknowledged as full reverts by more than 80% of users in our
> study, while our new method performs much better, finding different reverts
> for these 34% wrongly detected reverts plus 12% more reverts, showing an
> accuracy of 70% for these newly found edit pairs actually being reverts
> according to the users. The increased accuracy performance between the
> reverts detected only by the MD5 and only by our new method is highly
> significant, while reverts detected by both methods also perform
> significantly better than those only detected by the MD5 method.
>
> Trade-off:
> Although this method is much slower than the MD5 method (as it is using
> DIFFs between revisions) it reflects much better what users (and the
> Wikipedia community as a whole) see as a revert. It thereby is a valid
> alternative if you are interested in the antagonistic relationships between
> users on a more detailed and accurate level. There is quite some potential
> to make it even faster by combining the two methods, decreasing the number
> of DIFFs to be performed; let's see if we can get around to doing that :)
>
> The scripts and results listed in the paper can be found at
> http://people.aifb.kit.edu/ffl/reverts/
>
> Best,
>
> Fabian
>
>
> --
> Karlsruhe Institute of Technology (KIT)
> Institute of Applied Informatics and Formal Description Methods
>
> Dipl.-Medwiss. Fabian Flöck
> Research Associate
>
> Building 11.40, Room 222
> KIT-Campus South
> D-76128 Karlsruhe
>
> Phone: +49 721 608 4 6584
> Skype: f.floeck_work
> E-Mail: fabian.flo...@kit.edu
> WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
>
> KIT – University of the State of Baden-Wuerttemberg and
> National Research Center of the Helmholtz Association
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
