Re: [Wiki-research-l] Estimate of vandal population

2013-10-01 Thread Dmitry Chichkov
I think a rough analysis of user / IP talk pages could give you a number pretty quickly. You would probably want to do it by hand first and then write a script that analyses the Wikipedia dump file. It is doable by hand if you just sub-sample a few hundred pages randomly. And if normalized by a
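
The sub-sampling step mentioned above can be sketched roughly as follows. This is only an illustration, not the script the message refers to: the function names `sample_pages` and `estimate_fraction` are hypothetical, the labels are assumed to come from reading each sampled talk page by hand, and the interval uses a plain normal approximation.

```python
import math
import random


def sample_pages(titles, k, seed=0):
    """Draw k distinct talk-page titles uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the hand-labelled sample is reproducible
    return rng.sample(list(titles), k)


def estimate_fraction(labels, z=1.96):
    """Return (estimate, low, high) for the share of True labels.

    labels: one boolean per hand-checked page, True if the page shows
    vandalism warnings. Uses a normal-approximation interval (assumption).
    """
    n = len(labels)
    p = sum(labels) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

With a few hundred labelled pages the interval is usually narrow enough for a rough population estimate; any further normalization is left out here because the message is cut off at that point.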

Re: [Wiki-research-l] Revert detection

2011-08-22 Thread Dmitry Chichkov
Hi Aaron, Neat LimitedQueue class. It looks like this revert code wouldn't handle some corner cases; for example, I don't see logic that would distinguish between blanking (which produces duplicate checksums) and reverts. -- Best, Dmitry On Sun, Aug 21, 2011 at 3:15 PM, Aaron Halfaker
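
The corner case raised above can be illustrated with a minimal sketch of checksum-based revert detection. This is not Aaron's code or the LimitedQueue class under discussion: `detect_reverts` is a hypothetical name, `collections.deque` with `maxlen` stands in for a bounded queue, and the window of 15 revisions is an arbitrary assumption.

```python
import hashlib
from collections import deque


def detect_reverts(revisions, window=15):
    """Flag revisions whose checksum matches a recent earlier revision.

    revisions: iterable of (rev_id, text) in chronological order.
    Returns a list of (rev_id, matched_rev_id, kind), where kind is
    "blanking" when the repeated content is empty and "revert" otherwise.
    """
    recent = deque(maxlen=window)  # bounded history of (rev_id, checksum)
    results = []
    for rev_id, text in revisions:
        is_blank = not text.strip()
        checksum = hashlib.md5(text.encode("utf-8")).hexdigest()
        # Look for the most recent earlier revision with identical content.
        match = None
        for prev_id, prev_sum in reversed(recent):
            if prev_sum == checksum:
                match = prev_id
                break
        previous_sum = recent[-1][1] if recent else None
        # An identical immediate predecessor is a null edit, not a revert.
        if match is not None and previous_sum != checksum:
            # Repeated blanking also yields duplicate checksums, so label it apart.
            results.append((rev_id, match, "blanking" if is_blank else "revert"))
        recent.append((rev_id, checksum))
    return results
```

Without the `is_blank` check, two separate blankings of the same page would be counted as the second one "reverting to" the first, which is the ambiguity the message points out.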

Re: [Wiki-research-l] Revert detection

2011-08-18 Thread Dmitry Chichkov
There have been a few publications on the subject: 1. Us vs. them: Understanding social dynamics in Wikipedia with revert graph visualizations, B Suh, EH Chi, BA Pendleton. 2. He says, she says: Conflict and coordination in Wikipedia, A Kittur, B Suh, BA Pendleton. From my experience I can tell

Re: [Wiki-research-l] wikistream: displays wikipedia updates in realtime

2011-08-17 Thread Dmitry Chichkov
Just verified, it is back up. And actual changes are also coming through [filtered by negative user ratings (calculated using some pretty old Wikipedia dump)]. -- Best, Dmitry On Wed, Aug 17, 2011 at 2:33 AM, Dmitry Chichkov dchich...@gmail.com wrote: Hmm... Somebody actually visited the site

Re: [Wiki-research-l] Announcing Wikihadoop: using Hadoop to analyze Wikipedia dump files

2011-08-17 Thread Dmitry Chichkov
Hello, This is excellent news! Have you tried running it on Amazon EC2? It would be really nice to know how well WikiHadoop scales up with the number of nodes. Also, this timing - '3 x Quad Core / 14 days / full Wikipedia dump', on what kind of task (XML parsing, diffs, MD5, etc.) was it

Re: [Wiki-research-l] Announcing Wikihadoop: using Hadoop to analyze Wikipedia dump files

2011-08-17 Thread Dmitry Chichkov
than science. Diederik On Wed, Aug 17, 2011 at 5:28 PM, Dmitry Chichkov dchich...@gmail.com wrote: Hello, This is excellent news! Have you tried running it on Amazon EC2? It would be really nice to know how well WikiHadoop scales up with the number of nodes. Also, this timing - '3 x Quad

Re: [Wiki-research-l] Fraction of reverts

2011-08-15 Thread Dmitry Chichkov
I recommend searching Google Scholar for "reverts wikipedia": http://scholar.google.com/scholar?q=reverts+wikipedia If you want to try running some analysis on the dump yourself, there is Python code for revert analysis available here: http://code.google.com/p/pymwdat/ -- Best, Dmitry On

Re: [Wiki-research-l] Web 2.0 recent changes patrol tool demo (WPCVN)

2010-08-20 Thread Dmitry Chichkov
- excellent work. -- Cheers, Dmitry On Fri, Aug 20, 2010 at 12:02 AM, Daniel Kinzler dan...@brightbyte.de wrote: Hi Dimitry: Dmitry Chichkov schrieb: Some time ago, as a Python/Django/JQuery/pywikipedia exercise, I've hacked a web-based recent changes patrol tool. An alpha version can be seen

Re: [Wiki-research-l] Most reverted pages in the en-wikipedia (enwiki-20100130 dump)

2010-08-19 Thread Dmitry Chichkov
/ ) * OrderedDict (available in Python 2.7 or http://pypi.python.org/pypi/ordereddict/) * 7-Zip (command line 7za) -- Dmitry On Thu, Aug 19, 2010 at 8:46 AM, John Vandenberg jay...@gmail.com wrote: On Sat, Aug 14, 2010 at 6:12 AM, Dmitry Chichkov dchich...@gmail.com wrote: If anybody is interested

[Wiki-research-l] Most reverted pages in the en-wikipedia (enwiki-20100130 dump)

2010-08-13 Thread Dmitry Chichkov
If anybody is interested, I've made a list of the 'most reverted pages' in the English Wikipedia based on an analysis of the enwiki-20100130 dump. Here is the list: http://wpcvn.com/enwiki-20100130.most.reverted.tar.bz http://wpcvn.com/enwiki-20100130.most.reverted.txt This list was calculated using