Cobi (owner of ClueBot) and his roommate Crispy have already been working hard to build this specific dataset, but they've been held back by a lack of contributors. The page is here: http://en.wikipedia.org/wiki/User:Crispy1989#New_Dataset_Contribution_Interface

X!

On Mar 19, 2009, at 8:15 AM, Tei wrote:

> On Thu, Mar 19, 2009 at 1:03 PM, Delirium <delir...@hackish.org> wrote:
>
>> Brian wrote:
>>> This extension is very important for training machine learning
>>> vandalism detection bots. Recently published systems use only hundreds
>>> of examples of vandalism in training - not nearly enough to
>>> distinguish between the variety found in Wikipedia or generalize to
>>> new, unseen forms of vandalism. A large set of human created rules
>>> could be run against all previous edits in order to create a massive
>>> vandalism dataset.
>> As a machine-learning person, this seems like a somewhat problematic
>> idea--- generating training examples *from a rule set* and then learning
>> on them is just a very roundabout way of reconstructing that rule set.
>> What you really want is a large dataset of human-labeled examples of
>> vandalism / non-vandalism that *can't* currently be distinguished
>> reliably by rules, so you can throw a machine-learning algorithm at the
>> problem of trying to come up with some.
>>
>
> since there's already a database, this sounds like it could be done by
> flagging edits as "vandalism", and then reading the existing database
> information to extract these details, like IP, a diff of the change, etc.
> that way, humans define what is "vandalism", and the machine can learn
> the meaning.
>
> this may need a button or something, so users can report this, and the
> database flags the edit
>
>
> --
> --
> ℱin del ℳensaje.
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l