Re: [Wiki-research-l] Editor Trends Study - Improving the tool

2010-11-11 Thread Felipe Ortega


--- On Wed, 10/11/10, Diederik van Liere dvanli...@gmail.com wrote:

From: Diederik van Liere dvanli...@gmail.com
Subject: [Wiki-research-l] Editor Trends Study - Improving the tool
To: wiki-research-l@lists.wikimedia.org
Date: Wednesday, 10 November 2010 00:02

Hi, Diederik,

I'm also glad to see progress in this project. Some comments inline.

Dear researchers,

Recently, we started the Editor Trends Study
(http://strategy.wikimedia.org/wiki/Editor_Trends_Study).
The goal of this study is to get a better understanding of the community
dynamics within the different Wikipedia projects.

Part of this project consists of developing a tool
(http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software)
that parses a Wikipedia dump file, extracts the required information, stores it
in a database and exports it to a CSV file. This CSV file can then be used in a
statistical program such as R, Stata or SAS.

Well, I would have expected the team to search first for open source code that 
is already available and implements at least some (if not all) of the planned 
functionality.

Some examples are my own tool, WikiXRay, and Pywikipediabot (which, AFAIK, now 
also includes a fast parser for Wikipedia dump files).

For my tool, I now use git for version control, and you can use either of the 
two repos available (the official one at libresoft, or the mirror at Gitorious):

http://git.libresoft.es/WikixRay/
http://gitorious.org/wikixray/wikixray

They might not be the best software available, but I guess they can help to 
solve some problems, or at least help you speed up development and avoid 
starting from scratch.


We are looking for some volunteers that would enjoy testing the tool. You don't
need to be a software developer (although it helps :)) to help us; some
patience, a bit of time and a fairly recent computer is all you need. You
should be comfortable installing programs, working with a command-line
interface and have basic Subversion experience. Python experience is a real
bonus!

The testing will focus on getting the tool to run without any supervision. For 
more background information, have a look at:

http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software

Perhaps you're going to provide this info later, but I don't see the links to 
your SVN repo (only [] ).

We are testing the tool with the largest Wikipedia projects, so if you would
like to replicate the analysis on your own favorite Wikipedia project or help
improve the quality of the tool then please contact me off-list.

I think it would be more effective to have another public list to which people 
specifically interested in this tool can subscribe (for example, like the one 
we have exclusively for XML dumps).

This should noticeably reduce the number of duplicated bug reports and 
comments, since other people can learn about known issues.

Hope this helps.

Best,
Felipe.

Best,

Diederik


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l





Re: [Wiki-research-l] Editor Trends Study - Improving the tool

2010-11-11 Thread Diederik van Liere
Dear Piotr,
Thanks for your comments. A GUI is unlikely to make it onto the roadmap, as
it would require significant time to develop, but I will try my best to
make the online documentation as clear as possible, and you can always
email me if you have any questions.

Best,

Diederik

 Diederik van Liere wrote:

 We are looking for some volunteers that would enjoy testing the tool.
 You don't need to be a software developer (although it helps :)) to
 help us; some patience, a bit of time and a fairly recent computer is
 all you need. You should be comfortable installing programs, working
 with a command-line interface and have basic Subversion experience.
 Python experience is a real bonus!

 Quick feedback:
 * glad to see progress!
 * the wiki pages you link seem well designed and how-to's appear to make
 sense :)
 * as long as there is a need for a command-line interface and no
 graphical user interface, many would-be users will not be able to use it
 * ditto for things like Python and Subversion (I never even heard of the
 latter...).

 I assume that having a GUI is planned in some foreseeable future?


 --
 Piotr Konieczny

 To be defeated and not submit, is victory; to be victorious and rest on
 one's laurels, is defeat. --Józef Pilsudski



Re: [Wiki-research-l] Editor Trends Study - Improving the tool

2010-11-11 Thread Felipe Ortega


--- On Thu, 11/11/10, Diederik van Liere dvanli...@gmail.com wrote:

 From: Diederik van Liere dvanli...@gmail.com
 Subject: Re: [Wiki-research-l] Editor Trends Study - Improving the tool
 To: wiki-research-l@lists.wikimedia.org
 Date: Thursday, 11 November 2010 23:44
 Dear Felipe,
 
 We did investigate other tools before deciding to embark on this new
 project; as you rightly point out, we should minimize code overlap.
 Pywikipediabot is an editing tool as far as I know, and your tool,
 WikixRay, has definitely proven itself. However, I believe that a
 NoSQL solution will give better performance than SQL databases, and
 that has been one of the main reasons to write this tool.
 
 I am not sure a separate mailing list is required; at the moment
 it's not, but thanks for the suggestion. I have added the SVN link.
 

Thanks, Diederik. I'm also curious to test the performance of MongoDB. I 
admit I've never tried this kind of database. 
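
For readers who have not worked with document stores either, the sketch below only illustrates the difference in data layout behind the SQL versus NoSQL discussion above. It is not the tool's actual schema and it makes no performance claim: the edit records, field names and the per-year count are hypothetical, sqlite3 stands in for a SQL database, and a plain Python dict stands in for a document store such as MongoDB.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (editor, year, article) edit records, for illustration only.
EDITS = [
    ("Alice", 2009, "Example"),
    ("Bob", 2010, "Example"),
    ("Alice", 2010, "Sandbox"),
    ("Alice", 2010, "Example"),
]

# Relational layout: one row per edit; per-editor figures come from a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (editor TEXT, year INTEGER, article TEXT)")
conn.executemany("INSERT INTO edits VALUES (?, ?, ?)", EDITS)
sql_counts = dict(
    conn.execute(
        "SELECT year, COUNT(*) FROM edits WHERE editor = ? GROUP BY year",
        ("Alice",),
    )
)

# Document layout: one record per editor with that editor's counts embedded,
# so everything about an editor is read with a single key lookup.
documents = defaultdict(lambda: {"edits_by_year": defaultdict(int)})
for editor, year, _article in EDITS:
    documents[editor]["edits_by_year"][year] += 1
doc_counts = dict(documents["Alice"]["edits_by_year"])

print(sql_counts, doc_counts)
```

Both layouts give the same per-year counts for an editor; whether one is faster for full dumps is exactly the kind of thing that would need measuring.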

Will check the SVN.

Best,
F.

 Best,
 
 Diederik


  



[Wiki-research-l] Editor Trends Study - Improving the tool

2010-11-09 Thread Diederik van Liere
Dear researchers,

Recently, we started the Editor Trends Study
(http://strategy.wikimedia.org/wiki/Editor_Trends_Study).
The goal of this study is to get a better understanding of the community
dynamics within the different Wikipedia projects.

Part of this project consists of developing a tool
(http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software)
that parses a Wikipedia dump file, extracts the required information, stores
it in a database and exports it to a CSV file. This CSV file can then be used
in a statistical program such as R, Stata or SAS.
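
As a rough illustration of the parse, extract and export steps described above, here is a minimal Python sketch using only the standard library. It is not the Editor Trends tool's code: the sample XML, the chosen fields (title, editor, timestamp) and the function names are assumptions made for the example, and the database step is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file (hypothetical content); real dumps are far
# larger and usually compressed.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2010-11-09T10:00:00Z</timestamp>
      <contributor><username>Alice</username><id>1</id></contributor>
    </revision>
    <revision>
      <timestamp>2010-11-10T12:30:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (title, editor, timestamp) per revision while streaming the XML."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            editor, timestamp = None, None
            for child in elem.iter():
                child_name = local_name(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name in ("username", "ip"):
                    editor = child.text
            yield title, editor, timestamp
            elem.clear()  # drop the finished revision's children
        elif name == "page":
            elem.clear()


def export_csv(rows, out):
    """Write the extracted rows with a header line."""
    writer = csv.writer(out)
    writer.writerow(["title", "editor", "timestamp"])
    writer.writerows(rows)


rows = list(iter_revisions(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
buffer = io.StringIO()
export_csv(rows, buffer)
print(buffer.getvalue())
```

Streaming with iterparse, rather than loading the whole file, is the design choice that matters here, since full dumps do not fit comfortably in memory. The resulting CSV can be read directly by R, Stata or SAS.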

We are looking for some volunteers that would enjoy testing the tool. You
don't need to be a software developer (although it helps :)) to help us; some
patience, a bit of time and a fairly recent computer is all you need. You
should be comfortable installing programs, working with a command-line
interface and have basic Subversion experience. Python experience is a real
bonus!

The testing will focus on getting the tool to run without any supervision.
For more background information, have a look at:
http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software

We are testing the tool with the largest Wikipedia projects, so if you would
like to replicate the analysis on your own favorite Wikipedia project or help
improve the quality of the tool then please contact me off-list.



Best,

Diederik


Re: [Wiki-research-l] Editor Trends Study - Improving the tool

2010-11-09 Thread Piotr Konieczny
Diederik van Liere wrote:

 We are looking for some volunteers that would enjoy testing the tool.
 You don't need to be a software developer (although it helps :)) to
 help us; some patience, a bit of time and a fairly recent computer is
 all you need. You should be comfortable installing programs, working
 with a command-line interface and have basic Subversion experience.
 Python experience is a real bonus!

Quick feedback:
* glad to see progress!
* the wiki pages you link seem well designed and how-to's appear to make 
sense :)
* as long as there is a need for a command-line interface and no 
graphical user interface, many would-be users will not be able to use it
* ditto for things like Python and Subversion (I never even heard of the 
latter...).

I assume that having a GUI is planned in some foreseeable future?


-- 
Piotr Konieczny

To be defeated and not submit, is victory; to be victorious and rest on 
one's laurels, is defeat. --Józef Pilsudski
