Thanks for all the references and excellent advice so far!

I've looked into the Hale Anti-Bot Method™, but because I've sampled my
corpus by article (based on category co-membership), grouping the resulting
revisions by user gives these semi-automated users more "normal"
distributions, since their other contributions are censored. In other words,
I see only a fraction of these users' contributions, so the time intervals I
observe are spaced farther apart (more typical) than they actually are. It's
not feasible for me to retrieve 100k+ users' histories just to clean up ~6k
articles' histories.
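To make the censoring effect concrete, here is a toy sketch with entirely made-up numbers (not data from my corpus). It assumes a hypothetical semi-automated user who edits every 10 seconds while cycling through 100 articles, and a hypothetical sample that contains only 10 of those articles. Because the sampled articles are spread evenly through the user's cycle, the observed inter-edit gap is 10x the true one:

```python
from statistics import median


def inter_edit_intervals(timestamps):
    """Return the gaps (in seconds) between consecutive edits, in time order."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


# HYPOTHETICAL user: one edit every 10 seconds, cycling through 100 articles.
edits = [(t * 10, f"article_{t % 100}") for t in range(1000)]

# HYPOTHETICAL sample: only every 10th article falls inside the corpus.
sampled_articles = {f"article_{i}" for i in range(0, 100, 10)}

all_ts = [t for t, _ in edits]
observed_ts = [t for t, article in edits if article in sampled_articles]

print(median(inter_edit_intervals(all_ts)))       # 10  (full history)
print(median(inter_edit_intervals(observed_ts)))  # 100 (censored view)
```

With real data the inflation depends on how the sampled articles interleave with the rest of a user's activity, so this only illustrates the direction of the bias, not its size.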

Another thought I had: because many semi-automated tools such as Twinkle and
AWB leave parenthetical annotations in their revision comments, would this be
a relatively inexpensive way to filter out revisions rather than users? There
are some caveats I'd like domain experts' feedback on. I'm not expecting
settled research, just input from others' experiences munging the data.

1. Is the inclusion of this markup in revision comments optional? The concern
is that some users may enable or disable it, so I may end up biasing
inclusion based on users' preferences.
2. How have these flags or markup changed over time? The concern is that
Twinkle/AWB/etc. may have started or stopped including flags, or changed what
they included, over time.
3. Are there other API queries or data elsewhere I could use to identify
(semi-)automated revisions?
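For what it's worth, the comment-based filter I have in mind would look roughly like the sketch below. The signature strings are ASSUMPTIONS for illustration ("([[WP:TW|TW]])" / "(TW)" for Twinkle, "using [[Project:AWB|AWB]]" / "using AWB" for AWB). I haven't verified them across eras, which is exactly caveat 2 above, and the revision dicts with a "comment" key are a hypothetical input shape:

```python
import re

# ASSUMPTION: illustrative tool signatures only; check them against real
# edit summaries from each period, since tools may have changed their markup.
TOOL_PATTERNS = {
    "twinkle": re.compile(r"\(\[\[WP:TW\|TW\]\]\)|\(TW\)", re.IGNORECASE),
    "awb": re.compile(r"using \[\[(?:Project|WP):AWB\|AWB\]\]|using AWB",
                      re.IGNORECASE),
}


def detect_tool(comment):
    """Return the first tool whose signature appears in the comment, else None."""
    if not comment:  # missing or suppressed comments count as unflagged
        return None
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(comment):
            return tool
    return None


def split_revisions(revisions):
    """Partition revision dicts into (unflagged, tool_flagged) lists."""
    unflagged, flagged = [], []
    for rev in revisions:
        target = flagged if detect_tool(rev.get("comment")) else unflagged
        target.append(rev)
    return unflagged, flagged


# Made-up example revisions.
revs = [
    {"revid": 1, "comment": "Reverted edits by Example ([[WP:TW|TW]])"},
    {"revid": 2, "comment": "Typo fixing using [[Project:AWB|AWB]]"},
    {"revid": 3, "comment": "expanded history section"},
    {"revid": 4},  # no comment field at all
]
unflagged, flagged = split_revisions(revs)
print([r["revid"] for r in flagged])    # [1, 2]
print([r["revid"] for r in unflagged])  # [3, 4]
```

Note that this only catches revisions whose summaries still carry the markup, so if the annotation is optional (caveat 1), anything it misses silently lands in the "unflagged" bucket.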


On Mon, May 19, 2014 at 10:35 AM, Federico Leva (Nemo)
<nemow...@gmail.com> wrote:

> Brian Keegan, 18/05/2014 18:10:
>
>  Is there a way to retrieve a canonical list of bots on enwiki or
>> elsewhere?
>>
>
> A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv
> In general: please edit https://meta.wikimedia.org/
> wiki/Research:Identifying_bot_accounts
>
> Nemo
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



-- 
Brian C. Keegan, Ph.D.
Post-Doctoral Research Fellow, Lazer Lab
College of Social Sciences and Humanities, Northeastern University
Fellow, Institute for Quantitative Social Sciences, Harvard University
Affiliate, Berkman Center for Internet & Society, Harvard Law School

b.kee...@neu.edu
www.brianckeegan.com
M: 617.803.6971
O: 617.373.7200
Skype: bckeegan