Cobi (owner of ClueBot) and his roommate Crispy have already been
working hard to build this specific dataset, but they've been held back
by a lack of contributors. The page is here:
http://en.wikipedia.org/wiki/User:Crispy1989#New_Dataset_Contribution_Interface

X!

On Mar 19, 2009, at 8:15 AM, Tei wrote:

> On Thu, Mar 19, 2009 at 1:03 PM, Delirium <delir...@hackish.org> wrote:
>
>> Brian wrote:
>>> This extension is very important for training machine learning
>>> vandalism detection bots. Recently published systems use only hundreds
>>> of examples of vandalism in training - not nearly enough to
>>> distinguish between the variety found in Wikipedia or generalize to
>>> new, unseen forms of vandalism. A large set of human-created rules
>>> could be run against all previous edits in order to create a massive
>>> vandalism dataset.
>> As a machine-learning person, this seems like a somewhat problematic
>> idea: generating training examples *from a rule set* and then learning
>> on them is just a very roundabout way of reconstructing that rule set.
>> What you really want is a large dataset of human-labeled examples of
>> vandalism / non-vandalism that *can't* currently be distinguished
>> reliably by rules, so you can throw a machine-learning algorithm at the
>> problem of trying to come up with some.
>>
>
> Since there's already a database, it sounds like this could be done by
> flagging edits as "vandalism" and then reading the existing database
> information to extract the details, such as the IP, a diff of the
> change, etc. That way, humans define what counts as "vandalism", and
> the machine can learn the meaning.
>
> This may need a button or something, so that users can report an edit
> and the database flags it.
>
>
> -- 
> --
> ℱin del ℳensaje.
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
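
The flagging idea quoted above (a human marks an edit, and the IP and a
diff are pulled from what the database already stores) could be sketched
roughly as follows. This is only an illustration under stated assumptions:
LabeledEdit, build_example, and all field names are hypothetical and are
not taken from MediaWiki, ClueBot, or the dataset interface linked above.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledEdit:
    """One human-labeled training example (hypothetical schema)."""
    rev_id: int
    editor_ip: str
    changed_lines: List[str]  # lines added ("+ ...") or removed ("- ...")
    is_vandalism: bool        # the label comes from a human, not from a rule


def build_example(rev_id, editor_ip, old_text, new_text, flagged):
    """Pair a human flag with details derived from the stored revisions."""
    changed = [
        line
        for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines())
        # ndiff marks additions with "+ " and removals with "- ";
        # unchanged lines ("  ") and hint lines ("? ") are dropped.
        if line.startswith(("+ ", "- "))
    ]
    return LabeledEdit(rev_id, editor_ip, changed, flagged)


example = build_example(
    rev_id=1,
    editor_ip="192.0.2.1",  # documentation-range address, made up
    old_text="Paris is the capital of France.",
    new_text="Paris is the capital of France.\nLOL",
    flagged=True,
)
print(example.changed_lines)
```

The label here is whatever the human reporter decided, which matches the
point made earlier in the thread: examples labeled by people, rather than
generated from rules, are what a learning algorithm needs.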

