The username policy states that "*bot*" names are reserved for bots, so such a regex shouldn't be too hacky, though I cannot say whether some non-automated accounts slip through new user patrol. I believe the dumps make the 'users' table available, and I know for sure you can get a full list via the API.
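A minimal sketch of both ideas, not a vetted method. It assumes the name check is a plain case-insensitive match on "bot" anywhere in the username, and that the API list comes from the MediaWiki `list=allusers` query filtered with `augroup=bot`. The helper names `looks_like_bot` and `bot_group_query_url` are made up for illustration. Fetching and paging through the results is left out.

```python
import re
from urllib.parse import urlencode

# Heuristic from the username policy: "bot" appears somewhere in the name.
# This is an assumption about how strict the match should be; it will also
# flag human names that merely contain the letters (e.g. "Abbott").
BOT_NAME = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(username: str) -> bool:
    """Return True if the username contains 'bot' (case-insensitive)."""
    return BOT_NAME.search(username) is not None


def bot_group_query_url(api_root: str = "https://en.wikipedia.org/w/api.php") -> str:
    """Build a URL listing accounts that currently hold the 'bot' user group."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "bot",   # only accounts in the bot group right now
        "aulimit": "500",
        "format": "json",
    }
    return api_root + "?" + urlencode(params)


if __name__ == "__main__":
    for name in ["SmackBot", "Cydebot", "Full-date unlinking bot", "Brian Keegan"]:
        print(name, looks_like_bot(name))
    print(bot_group_query_url())
```

The two sources are probably best combined: the group query only covers accounts that hold the bot group today, so renamed or retired bots may only be caught by the name pattern, which in turn needs a manual pass for false positives.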

To verify this, you could check whether these usernames set the "bot" flag when they edit. -AW

--
Andrew G. West, PhD
Research Scientist
Verisign Labs - Reston, VA
Website: http://www.andrew-g-west.com


On 05/18/2014 12:10 PM, Brian Keegan wrote:
Is there a way to retrieve a canonical list of bots on enwiki or
elsewhere? I'm interested in omitting automated revisions (sorry
Stuart!) for the purposes of building co-authorship networks.

Grabbing everything under 'Category:All Wikipedia bots' excludes some
major ones like SmackBot, Cydebot, VIAFbot, Full-date unlinking bot,
etc. because these bots have changed names but the redirect is not
categorized, the account has been removed/deprecated, or a user appears
to have removed the relevant bot categories from the page.

Can anyone advise me on how to kill all the bots in my data without
having to resort to manual cleaning or hacky regex?


--
Brian C. Keegan, Ph.D.
Post-Doctoral Research Fellow, Lazer Lab
College of Social Sciences and Humanities, Northeastern University
Fellow, Institute for Quantitative Social Sciences, Harvard University
Affiliate, Berkman Center for Internet & Society, Harvard Law School

b.kee...@neu.edu <mailto:b.kee...@neu.edu>
www.brianckeegan.com <http://www.brianckeegan.com>
M: 617.803.6971
O: 617.373.7200
Skype: bckeegan


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


