Re: [Wiki-research-l] Estimate of vandal population

2013-10-01 Thread Dmitry Chichkov
I think a rough analysis of user / IP talk pages could give you a number
pretty quickly. You would probably want to do it by hand first and then
write a script that analyzes the Wikipedia dump file. It is doable by hand
if you just sub-sample a few hundred pages at random. Normalizing the result
by the total number of user pages versus the total number of users would
already give a rough estimate.
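
A minimal sketch of that sub-sample-and-normalize estimate, in Python. All names and numbers are hypothetical: `flagged` stands for the count of hand-checked pages that show vandalism warnings or block notices, and the two totals would have to come from the dump. The interval is a plain normal approximation, which is only reasonable for a simple random sample of a few hundred pages.

```python
import math
import random


def draw_sample(page_titles, k, seed=0):
    """Pick k page titles uniformly at random for hand inspection."""
    rng = random.Random(seed)
    return rng.sample(page_titles, k)


def estimate_from_sample(flagged, sample_size, total_pages, total_users, z=1.96):
    """Scale a hand-labelled random sample up to the whole population.

    flagged      -- sampled pages judged to belong to a vandal (assumption:
                    judged by hand from warnings or block notices)
    sample_size  -- number of pages inspected
    total_pages  -- number of user/talk pages the sample was drawn from
    total_users  -- number of users, used to normalize the final share
    """
    if sample_size <= 0 or not 0 <= flagged <= sample_size:
        raise ValueError("need 0 <= flagged <= sample_size and sample_size > 0")
    p = flagged / sample_size
    # Standard error of a sample proportion under simple random sampling.
    se = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    count = p * total_pages
    return {
        "count": count,
        "count_low": low * total_pages,
        "count_high": high * total_pages,
        # Users without a page are assumed here to be non-vandals.
        "share_of_users": count / total_users,
    }
```

The sketch treats users without a page as non-vandals, which would bias the estimate downward if many vandals never receive a warning.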

Kind Regards,
Dmitry


On Tue, Oct 1, 2013 at 11:00 AM, Ziko van Dijk  wrote:

> So Piotr, if I understand you correctly, the question is how many of the
> people who count as "contributors" in the statistics (at 5 edits a month,
> or 100 edits a month) are actually vandals? I could imagine that some
> vandals manage to make 5 edits before being blocked, or lose interest
> before they are blocked, and so appear in the statistics.
> Kind regards
> Ziko

Re: [Wiki-research-l] Estimate of vandal population

2013-10-01 Thread Ziko van Dijk
So Piotr, if I understand you correctly, the question is how many of the
people who count as "contributors" in the statistics (at 5 edits a month,
or 100 edits a month) are actually vandals? I could imagine that some
vandals manage to make 5 edits before being blocked, or lose interest
before they are blocked, and so appear in the statistics.
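
A minimal sketch of how that share could be checked, assuming per-account monthly edit counts and a set of blocked account names are already available (both inputs are hypothetical, as is the function name). The threshold is read as "at least 5 edits", which is an assumption about how the statistics count.

```python
def blocked_share_of_active(edit_counts, blocked, threshold=5):
    """Share of accounts at or above the edit threshold that are blocked.

    edit_counts -- dict mapping account name to edits in the month
    blocked     -- set of account names known to be blocked
    threshold   -- minimum edits to count as a "contributor" (5 or 100)
    """
    active = [name for name, edits in edit_counts.items() if edits >= threshold]
    if not active:
        return 0.0
    blocked_active = sum(1 for name in active if name in blocked)
    return blocked_active / len(active)


if __name__ == "__main__":
    # Made-up data: B reached 5 edits before being blocked, C did not.
    counts = {"A": 7, "B": 5, "C": 2, "D": 120}
    print(blocked_share_of_active(counts, {"B", "C"}, threshold=5))
    print(blocked_share_of_active(counts, {"B", "C"}, threshold=100))
```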
Kind regards
Ziko


2013/9/29 Piotr Konieczny 

> I know of the categories, but the problem is that they do not seem to be
> comprehensive. Based on them, I can estimate that at least 150k or so
> editors were banned for vandalism, but many vandals apparently never make
> it into those categories, so this number is an underestimate.
>
> Still, we should be able to get some estimates. We know, for example,
> that something like 5 or 6 million accounts have made 1+ edits on
> English Wikipedia. How many of them were indefinitely blocked? This should
> give us some idea.
>
> Alternatively, we know how many accounts make an edit to Wikipedia in any
> given timeframe. About 100,000-120,000 editors make at least one edit to
> Wikipedia each month. If we knew how many are indef blocked in that period,
> that would be another useful estimate.

Re: [Wiki-research-l] Estimate of vandal population

2013-09-29 Thread Yaroslav M. Blanter

On 29.09.2013 10:04, Piotr Konieczny wrote:

> I know of the categories, but the problem is that they do not seem to
> be comprehensive. Based on them, I can estimate that at least 150k or
> so editors were banned for vandalism, but many vandals apparently never
> make it into those categories, so this number is an underestimate.
>
> Still, we should be able to get some estimates. We know, for example,
> that something like 5 or 6 million accounts have made 1+ edits on
> English Wikipedia. How many of them were indefinitely blocked? This
> should give us some idea.
>
> Alternatively, we know how many accounts make an edit to Wikipedia in
> any given timeframe. About 100,000-120,000 editors make at least one
> edit to Wikipedia each month. If we knew how many are indef blocked in
> that period, that would be another useful estimate.
>
> --
> Piotr Konieczny, PhD
> http://hanyang.academia.edu/PiotrKonieczny
> http://scholar.google.com/citations?user=gdV8_AEJ
> http://en.wikipedia.org/wiki/User:Piotrus



I thought the bulk of vandalism comes from IPs, and the filters can
actually provide some information on that.

If we switch from research to personal experience: I have checked all edits
on the Russian Wikivoyage since the beginning (October 2013). There was
only one vandal, who registered two or three accounts and was indeffed,
but IP vandals and spambots are blocked on a regular basis, roughly every
one or two weeks.


Cheers
Yaroslav

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Estimate of vandal population

2013-09-29 Thread Piotr Konieczny
I know of the categories, but the problem is that they do not seem to be
comprehensive. Based on them, I can estimate that at least 150k or so
editors were banned for vandalism, but many vandals apparently never make
it into those categories, so this number is an underestimate.

Still, we should be able to get some estimates. We know, for example,
that something like 5 or 6 million accounts have made 1+ edits on
English Wikipedia. How many of them were indefinitely blocked? This
should give us some idea.

Alternatively, we know how many accounts make an edit to Wikipedia in any
given timeframe. About 100,000-120,000 editors make at least one edit to
Wikipedia each month. If we knew how many are indef blocked in that
period, that would be another useful estimate.
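
As a rough worked example of the first estimate: dividing the 150k category-based count by the 5 to 6 million accounts with at least one edit gives a floor of roughly 2.5% to 3%. This is only a sketch. It assumes the two figures cover comparable sets of accounts, which the categories may not guarantee, and the function name is made up for illustration.

```python
def blocked_share_bounds(blocked_floor, accounts_low, accounts_high):
    """Return (lowest, highest) share of accounts known to be blocked.

    blocked_floor -- lower bound on blocked accounts (here: from categories)
    accounts_low  -- low estimate of accounts with at least one edit
    accounts_high -- high estimate of accounts with at least one edit
    """
    if accounts_low <= 0 or accounts_high < accounts_low:
        raise ValueError("need 0 < accounts_low <= accounts_high")
    # The larger denominator gives the smaller share, and vice versa.
    return blocked_floor / accounts_high, blocked_floor / accounts_low


low, high = blocked_share_bounds(150_000, 5_000_000, 6_000_000)
print(f"{low:.1%} to {high:.1%}")  # prints "2.5% to 3.0%"
```

Because the 150k figure is itself an undercount, both ends of this range are lower bounds on the true share.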


--
Piotr Konieczny, PhD
http://hanyang.academia.edu/PiotrKonieczny
http://scholar.google.com/citations?user=gdV8_AEJ
http://en.wikipedia.org/wiki/User:Piotrus



On 9/30/2013 11:44 AM, Stuart Yeates wrote:
> I guess it depends on whether Piotr is looking for an estimate of
> accounts used for vandalism or an estimate of the people who operate
> them. One seems straightforward, the other more challenging. Perhaps
> combining the categories below with sock puppet investigations and
> some fancy stats?
>
> Cheers
> Stuart



Re: [Wiki-research-l] Estimate of vandal population

2013-09-29 Thread Stuart Yeates
I guess it depends on whether Piotr is looking for an estimate of accounts used
for vandalism or an estimate of the people who operate them. One seems
straightforward, the other more challenging. Perhaps combining the categories
below with sock puppet investigations and some fancy stats?
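
One way the accounts-versus-people distinction could be sketched, assuming sock puppet investigations yield pairs of accounts judged to share an operator: merge linked accounts with a union-find structure and count the resulting groups. The input format and the function name are hypothetical, and the result is only an upper bound on operators, since undetected socks stay separate.

```python
def count_operators(accounts, sock_links):
    """Count distinct operators after merging accounts linked as socks.

    accounts   -- iterable of blocked account names
    sock_links -- iterable of (account_a, account_b) pairs judged to be
                  run by the same person (assumed input format)
    """
    parent = {name: name for name in accounts}

    def find(name):
        # Walk up to the group representative, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in sock_links:
        # Accounts that only appear in a link are added on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    return len({find(name) for name in parent})


if __name__ == "__main__":
    # Made-up example: V1, V2 and V3 are one person, V4 is another.
    print(count_operators(["V1", "V2", "V3", "V4"], [("V1", "V2"), ("V2", "V3")]))
```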

Cheers
Stuart

> On 29/09/2013, at 12:13 am, h  wrote:
> 
> Hello Piotr, 
> I believe that in Chinese Wikipedia, "blocked indefinitely" is a user
> category called Wikipedians that are blocked indefinitely "被永久封禁的維基人"
> http://zh.wikipedia.org/wiki/Category:%E8%A2%AB%E6%B0%B8%E4%B9%85%E5%B0%81%E7%A6%81%E7%9A%84%E7%B6%AD%E5%9F%BA%E4%BA%BA
> Its equivalent Wikidata table has the following pages in other language
> versions:
> http://www.wikidata.org/wiki/Q4616402#sitelinks-wikipedia
>
> Language      Code      Linked article
> English       enwiki    Category:Blocked historical users
> italiano      itwiki    Categoria:Wikipedia:Cloni sospetti
> latviešu      lvwiki    Kategorija:Uz nenoteiktu laiku nobloķētie lietotāji
> slovenčina    skwiki    Kategória:Wikipédia:Natrvalo zablokovaní používatelia
> česky         cswiki    Kategorie:Wikipedie:Natrvalo zablokovaní uživatelé
> български     bgwiki    Категория:Блокирани неприемливи потребителски имена
> олык марий    mhrwiki   Категорий:Википедий:Йӧн петырыме
> українська    ukwiki    Категорія:Безстроково заблоковані користувачі
> 中文          zhwiki    Category:被永久封禁的維基人
> 日本語        jawiki    Category:無期限ブロックを受けたユーザー
>
> I hope that it helps.
> Best,
> han-teng liao
> 
> 
> 
> 2013/9/29 Piotr Konieczny 
>> Hi everyone,
>> 
>> Another question: do we have an estimate of a vandal population?
>> 
>> I also asked this at 
>> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#How_many_editors_are_blocked_indefinitely.3F
>>  but so far no good estimates have been provided.
>> 
>> -- 
>> Piotr Konieczny, PhD
>> http://hanyang.academia.edu/PiotrKonieczny
>> http://scholar.google.com/citations?user=gdV8_AEJ
>> http://en.wikipedia.org/wiki/User:Piotrus
>> 
>> 


Re: [Wiki-research-l] Estimate of vandal population

2013-09-29 Thread h
Hello Piotr,
   I believe that in Chinese Wikipedia, "blocked indefinitely" is a user
category called Wikipedians that are blocked indefinitely "被永久封禁的維基人"
http://zh.wikipedia.org/wiki/Category:%E8%A2%AB%E6%B0%B8%E4%B9%85%E5%B0%81%E7%A6%81%E7%9A%84%E7%B6%AD%E5%9F%BA%E4%BA%BA
   Its equivalent Wikidata table has the following pages in other language
versions:
http://www.wikidata.org/wiki/Q4616402#sitelinks-wikipedia

Language      Code      Linked article
English       enwiki    Category:Blocked historical users
italiano      itwiki    Categoria:Wikipedia:Cloni sospetti
latviešu      lvwiki    Kategorija:Uz nenoteiktu laiku nobloķētie lietotāji
slovenčina    skwiki    Kategória:Wikipédia:Natrvalo zablokovaní používatelia
česky         cswiki    Kategorie:Wikipedie:Natrvalo zablokovaní uživatelé
български     bgwiki    Категория:Блокирани неприемливи потребителски имена
олык марий    mhrwiki   Категорий:Википедий:Йӧн петырыме
українська    ukwiki    Категорія:Безстроково заблоковані користувачі
中文          zhwiki    Category:被永久封禁的維基人
日本語        jawiki    Category:無期限ブロックを受けたユーザー
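
The size of categories like these could be read from the MediaWiki Action API with `prop=categoryinfo`. The sketch below only builds the request URL and parses a response that has already been fetched. It assumes the usual `/w/api.php` endpoint path and the default JSON layout in which pages are keyed by page id. The helper names are invented for illustration.

```python
from urllib.parse import urlencode


def categoryinfo_url(wiki_host, category_title):
    """Build an Action API URL asking for the member counts of a category."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": category_title,
        "format": "json",
    }
    # Assumption: the wiki serves its API at /w/api.php.
    return f"https://{wiki_host}/w/api.php?" + urlencode(params)


def category_member_counts(response):
    """Map category title to its number of member pages.

    response -- the decoded JSON reply to a categoryinfo query; pages that
                carry no "categoryinfo" entry (e.g. missing ones) are skipped.
    """
    counts = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("categoryinfo")
        if info is not None:
            counts[page["title"]] = info.get("pages", 0)
    return counts


if __name__ == "__main__":
    print(categoryinfo_url("en.wikipedia.org", "Category:Blocked historical users"))
```

Category membership depends on pages being tagged by hand or by template, so these counts would only ever be a lower bound on blocked accounts.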

   I hope that it helps.
Best,
han-teng liao



2013/9/29 Piotr Konieczny 

> Hi everyone,
>
> Another question: do we have an estimate of a vandal population?
>
> I also asked this at
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#How_many_editors_are_blocked_indefinitely.3F
> but so far no good estimates have been provided.
>
> --
> Piotr Konieczny, PhD
> http://hanyang.academia.edu/PiotrKonieczny
> http://scholar.google.com/citations?user=gdV8_AEJ
> http://en.wikipedia.org/wiki/User:Piotrus
>
>