Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread Torsten Zesch
Dear Mohamad,

thanks for compiling this comprehensive list.

You might want to add JWPL:
http://code.google.com/p/jwpl/

and WikipediaMiner:
http://wikipedia-miner.sourceforge.net/

-Torsten

From: wiki-research-l-boun...@lists.wikimedia.org 
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of mohamad mehdi
Sent: Monday, April 18, 2011 3:20 PM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

Hi everyone,

This is a follow-up on a previous thread (Wikipedia data sets) related to the 
Wikipedia literature review (Chitu Okoli). As I mentioned in my previous email, 
part of our study is to identify the data collection methods and data sets used 
for Wikipedia studies. Therefore, we searched for online tools used to extract 
Wikipedia articles and for pre-compiled Wikipedia articles data sets; we were 
able to identify the following list. Please let us know of any other sources 
you know about. Also, we would like to know if there is any existing Wikipedia 
page that includes such a list so we can add to it. Otherwise, where do you 
suggest adding this list so it is noticeable and useful for the community?

http://download.wikimedia.org/   /* official Wikipedia database dumps */
http://datamob.org/datasets/tag/wikipedia   /* Multiple data sets (English Wikipedia articles that have been transformed into XML) */
http://wiki.dbpedia.org/Datasets   /* Structured information from Wikipedia */
http://labs.systemone.at/wikipedia3   /* Wikipedia³ is a conversion of the English Wikipedia into RDF; a monthly updated dataset containing around 47 million triples */
http://www.scribd.com/doc/9582/integrating-wikipediawordnet   /* article about integrating WordNet and Wikipedia with YAGO */
http://www.infochimps.com/datasets/taxobox-wikipedia-infoboxes-with-taxonomic-information-on-animal/
http://www.infochimps.com/link_frame?dataset=11043   /* Wikipedia Datasets for the Hadoop Hack | Cloudera */
http://www.infochimps.com/link_frame?dataset=11166   /* Wikipedia: Lists of common misspellings/For machines */
http://www.infochimps.com/link_frame?dataset=11028   /* Building a (fast) Wikipedia offline reader */
http://www.infochimps.com/link_frame?dataset=11004   /* Using the Wikipedia page-to-page link database */
http://www.infochimps.com/link_frame?dataset=11285   /* List of films */
http://www.infochimps.com/link_frame?dataset=11598   /* MusicBrainz Database */
http://dammit.lt/wikistats/   /* Wikitech-l page counters */
http://snap.stanford.edu/data/wiki-meta.html   /* Complete Wikipedia edit history (up to January 2008) */
http://aws.amazon.com/datasets/2596?_encoding=UTF8&jiveRedirect=1   /* Wikipedia Page Traffic Statistics */
http://aws.amazon.com/datasets/2506   /* Wikipedia XML Data */
http://www-958.ibm.com/software/data/cognos/manyeyes/datasets?q=Wikipedia+   /* list of Wikipedia data sets */
Examples:
  http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/top-1000-accessed-wikipedia-articl/versions/1   /* Top 1000 Accessed Wikipedia Articles */
  http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/wikipedia-hits/versions/1   /* Wikipedia Hits */
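As a quick illustration of how the page-view data above (e.g. the dammit.lt counters or the AWS traffic statistics) can be consumed: each line of an hourly pagecounts file is, to the best of my understanding, space-separated as `project page_title view_count bytes_transferred`. A minimal Python sketch, assuming that layout (the field order here is an assumption, not taken from any of the sites above):

```python
def parse_pagecounts_line(line):
    """Split one pagecounts line into typed fields.

    Assumed layout (space-separated):
        project page_title view_count bytes_transferred
    """
    project, title, views, size = line.rstrip("\n").split(" ")
    return project, title, int(views), int(size)

# Hypothetical sample line; real files contain one such line per page per hour.
record = parse_pagecounts_line("en Main_Page 42 123456")
```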

Tools to extract data from Wikipedia:
http://www.evanjones.ca/software/wikipedia2text.html   /* Extracting Text from Wikipedia */
http://www.infochimps.com/link_frame?dataset=11121   /* Wikipedia article traffic statistics */
http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/   /* Generating a Plain Text Corpus from Wikipedia */
http://www.infochimps.com/datasets/wikipedia-articles-title-autocomplete
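Most of the extraction tools above ultimately iterate over the official XML dumps, so a minimal sketch of that step may be useful to readers. This is not taken from any of the listed tools; it only assumes the standard pages-articles export layout (`<page><title>…</title><revision><text>…</text></revision></page>`) and uses a streaming parser so the full dump never has to fit in memory:

```python
import io
import xml.etree.ElementTree as ET

def iter_pages(xml_stream):
    """Yield (title, wikitext) pairs from a pages-articles XML stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            elem.clear()  # free the finished subtree to keep memory flat

# Tiny inline sample standing in for a real (multi-gigabyte) dump file:
sample = io.StringIO(
    "<mediawiki><page><title>Foo</title>"
    "<revision><text>Hello [[world]]</text></revision></page></mediawiki>"
)
pages = list(iter_pages(sample))
```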


Thank you,
Mohamad Mehdi
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread Andrew Krizhanovsky
And maybe a Wiktionary parser and visual interface :)
http://code.google.com/p/wikokit/

Best regards,
Andrew Krizhanovsky

On Wed, Apr 20, 2011 at 12:15 PM, Torsten Zesch
ze...@tk.informatik.tu-darmstadt.de wrote:


Re: [Wiki-research-l] best practices for recruiting study participants?

2011-04-20 Thread Fuster, Mayo
Dear Jodi and all!

I hope that you are fine.

Here is a wiki page listing suggestions on how to conduct research in a way that 
respects Wikimedia community principles: 
http://meta.wikimedia.org/wiki/Notes_on_good_practices_on_Wikipedia_research

Hoping it is useful! Have a nice day, Mayo

«·´`·.(*·.¸(`·.¸ ¸.·´)¸.·*).·´`·»
«·´¨*·¸¸« Mayo Fuster Morell ».¸.·*¨`·»
«·´`·.(¸.·´(¸.·* *·.¸)`·.¸).·´`·»

Research Digital Commons Governance: http://www.onlinecreation.info

Ph.D European University Institute
Postdoctoral Researcher. Institute of Government and Public Policies. Autonomous 
University of Barcelona.
Visiting scholar. Internet Interdisciplinary Institute. Open University of 
Catalonia (UOC).
Visiting researcher (2008). School of information. University of California, 
Berkeley.
Member Research Committee. Wikimedia Foundation

http://www.onlinecreation.info
E-mail: mayo.fus...@eui.eu
Skype: mayoneti
Phone Spanish State: 0034-648877748

From: wiki-research-l-boun...@lists.wikimedia.org 
[wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Jodi Schneider 
[jodi.schnei...@deri.org]
Sent: 20 April 2011 01:18
To: Research into Wikimedia content and communities
Subject: [Wiki-research-l] best practices for recruiting study participants?

What are the recommended ways to recruit Wikipedians for a research study?

My thoughts are:

Specific recruitment (i.e. to particular populations/randomized samples):
- email?
- Talk page messages?

Generic recruitment:
- post to the Village Pump
- post to the appropriate project mailing list(s)

Does that seem right?

Anybody willing to share successful email/Talk page messages (offlist is fine)? 
I'm particularly concerned about giving sufficient information, striking the right 
tone, and not being spammy (perhaps a hard balance to hit!).

-Jodi


Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread paolo massa
We wrote a bunch of Python scripts for parsing Wikipedia dumps, with
different goals.
You can get them at https://github.com/phauly/wiki-network/

We also released some datasets of networks extracted from User Talk pages.
See http://www.gnuband.org/2011/04/19/wikipedia_datasets_released/
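For readers curious what a network extracted from User Talk pages looks like in practice, the aggregation step can be sketched as counting directed (editor → talk-page owner) edges. This is only an illustration of the general idea, not the actual code from the repository above; the input pairs are assumed to have been pulled from revision metadata already:

```python
from collections import Counter

def talk_network(messages):
    """Aggregate (editor, talk_page_owner) pairs into a weighted edge list."""
    edges = Counter()
    for editor, owner in messages:
        if editor != owner:  # skip posts to one's own talk page
            edges[(editor, owner)] += 1
    return edges

# Hypothetical extracted pairs:
net = talk_network([("A", "B"), ("A", "B"), ("B", "A"), ("C", "C")])
```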

Enjoy! ;)

P.


On Wed, Apr 20, 2011 at 10:39 AM, Andrew Krizhanovsky
andrew.krizhanov...@gmail.com wrote:




-- 
--
Paolo Massa
Email: paolo AT gnuband DOT org
Blog: http://gnuband.org


Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread emijrp
Not directly related to Wikipedia, but about wikis in general: WikiTeam[1] and its
wiki dumps[2]. Thanks to these dumps, you can compare your research results on the
Wikipedia community with those of other wiki communities around the world.

[1] http://code.google.com/p/wikiteam/
[2] http://code.google.com/p/wikiteam/downloads/list?can=1

2011/4/18 mohamad mehdi mohamad_me...@hotmail.com



[Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread mohamad mehdi

Hi everyone,

Thank you all for your replies; we really appreciate your cooperation. Below is 
a summary of the tools and data sets recommended by Torsten, Andrew, Paolo, and 
emijrp. We would also like to know if there is any existing Wikipedia page that 
includes such a list so we can add to it. Otherwise, where do you suggest 
adding this list so it is noticeable and useful for the community?

http://code.google.com/p/jwpl/
http://wikipedia-miner.sourceforge.net/
http://code.google.com/p/wikokit/   /* Wiktionary parser and visual interface */
https://github.com/phauly/wiki-network/   /* Python scripts for parsing Wikipedia dumps with different goals */
http://www.gnuband.org/2011/04/19/wikipedia_datasets_released/   /* datasets of networks extracted from User Talk pages */
http://code.google.com/p/wikiteam/
http://code.google.com/p/wikiteam/downloads/list?can=1
http://www.research.ibm.com/visual/projects/history_flow/
http://meta.wikimedia.org/wiki/WikiXRay
http://statmediawiki.forja.rediris.es/index_en.html

Best regards,
Mohamad Mehdi