Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-29 Thread Brion Vibber
On 8/28/09 2:49 PM, Robert Rohde wrote: > On Thu, Aug 27, 2009 at 9:43 PM, Brion Vibber wrote: > > >> Robert, is it possible to share the source for generating the >> revert-based stats with other folks who may be interested in pursuing >> further work on the subject? Thanks! > > Not as a complet

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 3:44 PM, Lars Aronsson wrote: > We can try to find out which edits are reverts, assuming that the > previous edit was an act of vandalism. But that's a bad assumption. It gives both false positives and false negatives, and it gives a significant number of each. I gave

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Lars Aronsson
Anthony wrote: > Umm...you would count the number of instances of vandalism? > > Is the question how to objectively *define* "vandalism"? On one hand, we have a perception, as expressed by media (and by CEO Sue Gardner, I believe), that vandalism (especially of biographies of living people, BL

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 2:17 PM, David Gerard wrote: > 2009/8/28 Robert Rohde : > > On Fri, Aug 28, 2009 at 3:55 AM, Anthony wrote: > > >> Once we have the list, anyone is free to examine it any way they want, > and > >> show their results. But we're talking about probably less than 200 > >> ins

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread David Gerard
2009/8/28 Robert Rohde : > On Fri, Aug 28, 2009 at 3:55 AM, Anthony wrote: >> Once we have the list, anyone is free to examine it any way they want, and >> show their results.  But we're talking about probably less than 200 >> instances of vandalism here, so it'll be quite easy (and fun) to lambas

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Robert Rohde
On Fri, Aug 28, 2009 at 3:55 AM, Anthony wrote: > Once we have the list, anyone is free to examine it any way they want, and > show their results.  But we're talking about probably less than 200 > instances of vandalism here, so it'll be quite easy (and fun) to lambaste > anyone whose methods pro

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Robert Rohde
On Thu, Aug 27, 2009 at 9:43 PM, Brion Vibber wrote: > Robert, is it possible to share the source for generating the > revert-based stats with other folks who may be interested in pursuing > further work on the subject? Thanks! Not as a complete stand-alone entity. The analysis framework I thro

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 10:08 AM, Thomas Dalton wrote: > 2009/8/28 Anthony : > > If you're going to do it, maybe we should work on a rough-consensus > > objective definition of "vandalism" before you release the file, > though... > > Don't we have a consensus definition already? Vandalism is bad f

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Thomas Dalton
2009/8/28 Anthony : > If you're going to do it, maybe we should work on a rough-consensus > objective definition of "vandalism" before you release the file, though... Don't we have a consensus definition already? Vandalism is bad faith editing. You may also want to include test edits since they ar

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 12:43 AM, Brion Vibber wrote: > On 8/27/09 9:39 PM, Thomas Dalton wrote: > > 2009/8/28 Gregory Maxwell: > >> If the results of this kind of study have good agreement with > >> mechanical proxy metrics (such as machine detected vandalism) our > >> confidence in those proxie

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Brion Vibber
On 8/27/09 9:39 PM, Thomas Dalton wrote: > 2009/8/28 Gregory Maxwell: >> If the results of this kind of study have good agreement with >> mechanical proxy metrics (such as machine detected vandalism) our >> confidence in those proxies will increase, if they disagree it will >> provide an opportunit

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 10:07 PM, Nathan wrote: > > Out of curiosity, Anthony, do you still refrain from editing Wikimedia > projects over licensing > issues? How long has it been, a year? I guess now is as good a time as any to admit it. I started editing again, without logging in, about a mon

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Nathan
On Thu, Aug 27, 2009 at 9:47 PM, Anthony wrote: > Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]]. > Of those 10 instances of vandalism, either 2 or 4 would not have been > found > by the automated tool described. 2 if every edit summary containing the > word "vandalism

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]]. Of those 10 instances of vandalism, either 2 or 4 would not have been found by the automated tool described. 2 if every edit summary containing the word "vandalism" is counted as vandalism, and 4 if not. The former would p
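A sample of 10 leaves a lot of room for error on the miss rate quoted above (2 of 10 or 4 of 10). As a rough sketch only, the following computes a Wilson score interval for both counts; the choice of the Wilson interval and the 95% level (z = 1.96) are assumptions made here for illustration, not something proposed in the thread.

```python
from math import sqrt

def wilson_interval(misses: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    p = misses / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Counts taken from the message: 2 or 4 of 10 sampled instances were missed.
low_2, high_2 = wilson_interval(2, 10)
low_4, high_4 = wilson_interval(4, 10)
print(f"2/10 missed: {low_2:.3f} to {high_2:.3f}")
print(f"4/10 missed: {low_4:.3f} to {high_4:.3f}")
```

Both intervals are wide, which is the practical point: ten hand-checked instances are enough to show the tool misses some vandalism, but not enough to pin down how much.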

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony : > I suggested a better approach last time we had this thread: statistical > sampling. This research was based on a sample. What are you talking about?

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:41 PM, Thomas Dalton wrote: > 2009/8/28 Anthony : > > On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton >wrote: > > > >> 2009/8/28 Anthony : > >> >> He means what would you measure in order to draw conclusions about > the > >> >> severity of vandalism. > >> >> > >> > > >> >

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony : > On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton wrote: > >> 2009/8/28 Anthony : >> >> He means what would you measure in order to draw conclusions about the >> >> severity of vandalism. >> >> >> > >> > Umm...you would count the number of instances of vandalism? >> >> That's not

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton wrote: > 2009/8/28 Anthony : > >> He means what would you measure in order to draw conclusions about the > >> severity of vandalism. > >> > > > > Umm...you would count the number of instances of vandalism? > > That's not practical. I never said it w

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Gregory Maxwell : > This is somewhat labor intensive, but only somewhat as it doesn't take > an inordinate number of samples to produce representative results. > This should be the gold standard for this kind of measurement as it > would be much closer to what people actually want to know

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony : >> He means what would you measure in order to draw conclusions about the >> severity of vandalism. >> > > Umm...you would count the number of instances of vandalism? That's not practical. That would require a person to go through article histories revision by revision, probabl

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Gregory Maxwell
On Thu, Aug 27, 2009 at 8:24 PM, Thomas Dalton wrote: > 2009/8/28 Anthony : >> On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain wrote: >>> On Fri, Aug 28, 2009 at 4:58 AM, Anthony wrote: >>> > It seems to me to be begging the question.  You don't answer the question >>> > "how bad is vandalism" by ass

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:24 PM, Thomas Dalton wrote: > 2009/8/28 Anthony : > > On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain >wrote: > > > >> On Fri, Aug 28, 2009 at 4:58 AM, Anthony wrote: > >> > > >> > It seems to me to be begging the question. You don't answer the > question > >> > "how bad

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony : > On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain wrote: > >> On Fri, Aug 28, 2009 at 4:58 AM, Anthony wrote: >> > >> > It seems to me to be begging the question.  You don't answer the question >> > "how bad is vandalism" by assuming that vandalism is generally reverted. >> >> Can

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain wrote: > On Fri, Aug 28, 2009 at 4:58 AM, Anthony wrote: > > > > It seems to me to be begging the question. You don't answer the question > > "how bad is vandalism" by assuming that vandalism is generally reverted. > > Can you suggest a better metric

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Stephen Bain
On Fri, Aug 28, 2009 at 4:58 AM, Anthony wrote: > > It seems to me to be begging the question. You don't answer the question > "how bad is vandalism" by assuming that vandalism is generally reverted. Can you suggest a better metric then? -- Stephen Bain stephen.b...@gmail.com

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 3:45 PM, Chad wrote: > > /rvv?|revert(ing)?[ ]*(vandal(ism)?)?/ > > Might give you a slightly wider sample. I'll wait for Robert to release a random sample of edits he actually identified as "reverts" and/or the actual scripts and data dump he used.
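For anyone who wants to try the quoted pattern on edit summaries, here is a minimal sketch. The pattern is copied from the message; the case-insensitive flag, the helper name `looks_like_revert`, and the sample summaries are all made up for illustration and are not taken from anyone's actual scripts.

```python
import re

# Pattern as quoted in the message; IGNORECASE is an assumption added here.
REVERT_RE = re.compile(r"rvv?|revert(ing)?[ ]*(vandal(ism)?)?", re.IGNORECASE)

def looks_like_revert(summary: str) -> bool:
    """Return True if the pattern matches anywhere in the edit summary."""
    return REVERT_RE.search(summary) is not None

# Hypothetical edit summaries, not real data.
samples = [
    "rvv",
    "Reverting vandalism by 192.0.2.1",
    "fixed typo",
    "observed values updated",
]
flags = [looks_like_revert(s) for s in samples]
for summary, flag in zip(samples, flags):
    print(flag, summary)
```

Because the pattern is unanchored, the bare `rv` alternative also matches inside ordinary words such as "observed", so the last summary is flagged even though it is not a revert. Adding word boundaries (`\b`) would be one way to tighten it, at the cost of changing the quoted pattern.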

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Alex
Anthony wrote: > On Thu, Aug 27, 2009 at 2:58 PM, Anthony wrote: > >> On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton >> wrote: >> >>> I would put money on a significant majority of reverts being >>> reverts of vandalism rather than BRD reverts, it may not be an >>> overwhelming majority, though.

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Chad
On Thu, Aug 27, 2009 at 3:33 PM, Anthony wrote: > On Thu, Aug 27, 2009 at 2:58 PM, Anthony wrote: > >> On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton >> wrote: >> >>> I would put money on a significant majority of reverts being >>> reverts of vandalism rather than BRD reverts, it may not be an >>

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:58 PM, Anthony wrote: > On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton wrote: > >> I would put money on a significant majority of reverts being >> reverts of vandalism rather than BRD reverts, it may not be an >> overwhelming majority, though. > > > I don't know about th

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton wrote: > 2009/8/27 Anthony : > > Why do you assume that number of reverts has any correlation with amount > of > > vandalism? Has this been studied? > > It seems to be a sensible assumption, although checking it would be > wise. It seems to me to b

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/27 Anthony : > Why do you assume that number of reverts has any correlation with amount of > vandalism?  Has this been studied? It seems to be a sensible assumption, although checking it would be wise. I would put money on a significant majority of reverts being reverts of vandalism rather

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:40 PM, Robert Rohde wrote: > I've just read two different news stories on Flagged Revisions that > described vandalism as a "growing problem" for Wikipedia. > > With that in mind, I would like to highlight one specific point in the > analysis I just did. > > The frequenc

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
1:00 edit
1:02 revert
1:06 revert
1:14 revert
1:30 revert
2:02 revert
How many instances of "vandalism" does your program count there, and what is the mean and median time to revert?
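To make the question concrete, here is a sketch of what one deliberately naive counting rule would report for that timeline. The rule (every revert is assumed to fix the revision immediately before it) is a hypothetical stand-in, not Robert's actual method, which the thread does not describe in enough detail to reproduce.

```python
from statistics import mean, median

# Timeline quoted from the message above (h:mm, assumed to be the same day).
events = [
    ("1:00", "edit"),
    ("1:02", "revert"),
    ("1:06", "revert"),
    ("1:14", "revert"),
    ("1:30", "revert"),
    ("2:02", "revert"),
]

def to_minutes(stamp: str) -> int:
    """Convert an h:mm timestamp to minutes since midnight."""
    hours, minutes = stamp.split(":")
    return int(hours) * 60 + int(minutes)

# Hypothetical naive rule: each revert "fixes" the revision right before it,
# so every revert counts as one instance with its own time-to-revert.
gaps = [
    to_minutes(cur[0]) - to_minutes(prev[0])
    for prev, cur in zip(events, events[1:])
    if cur[1] == "revert"
]

naive_count = len(gaps)
mean_gap = mean(gaps)
median_gap = median(gaps)
print(naive_count, gaps, mean_gap, median_gap)
```

Under this rule the timeline yields five "instances" with gaps of 2, 4, 8, 16 and 32 minutes, a mean of 12.4 minutes and a median of 8. A rule that treats the sequence as a single disputed edit, or that tries to work out which side of a revert war is the bad version, would give different numbers, which seems to be the point of the question.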

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Robert Rohde
I've just read two different news stories on Flagged Revisions that described vandalism as a "growing problem" for Wikipedia. With that in mind, I would like to highlight one specific point in the analysis I just did. The frequency of reverts to articles -- as a fraction of total edits -- has rem

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Andrew Turvey
very interesting research - many thanks for sharing that. - "Robert Rohde" wrote: > From: "Robert Rohde" > To: "Wikimedia Foundation Mailing List" > Sent: Thursday, 27 August, 2009 17:41:29 GMT +00:00 GMT Britain, Ireland, > Portugal > Subject: [Foundation-l] Frequency of Seeing Bad V