Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-05-01 Thread Sebastian Hellmann

Hi,
a short note on DBpedia. As a whole, DBpedia provides:
- Data sets for download
- A SPARQL web service (so you can query the data right away)
- Software for parsing wikis (about 10-20 developers, all of them 
volunteers except 2 or 3): http://wiki.dbpedia.org/Documentation
- We have started to extract data from all Wikipedia languages: 
http://wiki.dbpedia.org/Internationalization
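
The SPARQL web service above can be queried with nothing beyond the Python standard library. A minimal sketch of the general shape; the endpoint URL is the public DBpedia one, and the Leipzig abstract lookup is an illustrative query of my own, not from the original post:

```python
# Sketch: querying the public DBpedia SPARQL endpoint with the standard
# library only. The example query (the English abstract of dbpedia:Leipzig)
# is an assumption for illustration; any SELECT query works the same way.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

def build_request(query: str) -> urllib.request.Request:
    """Encode a SPARQL query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return urllib.request.Request(ENDPOINT + "?" + params)

QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Leipzig>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
} LIMIT 1
"""

req = build_request(QUERY)

if __name__ == "__main__":
    # Network call kept out of import time; run the script to fetch live data.
    with urllib.request.urlopen(req, timeout=30) as resp:
        results = json.load(resp)
    for row in results["results"]["bindings"]:
        print(row["abstract"]["value"][:200])
```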


Regards,
Sebastian



On 18.04.2011 15:19, mohamad mehdi wrote:

Hi everyone,

This is a follow-up to a previous thread (Wikipedia data sets) related 
to the Wikipedia literature review (Chitu Okoli). As I mentioned in my 
previous email, part of our study is to identify the data collection 
methods and data sets used for Wikipedia studies. We therefore searched 
for online tools used to extract Wikipedia articles and for pre-compiled 
Wikipedia article data sets, and were able to identify the following 
list. Please let us know of any other sources you know about. We would 
also like to know whether there is an existing Wikipedia page that 
includes such a list so we can add to it. Otherwise, where do you 
suggest adding this list so that it is noticeable and useful for the 
community?


http://download.wikimedia.org/   /* official Wikipedia database dumps */
http://datamob.org/datasets/tag/wikipedia   /* multiple data sets (English Wikipedia articles transformed into XML) */
http://wiki.dbpedia.org/Datasets   /* structured information from Wikipedia */
http://labs.systemone.at/wikipedia3   /* Wikipedia³, a conversion of the English Wikipedia into RDF; a monthly updated dataset containing around 47 million triples */
http://www.scribd.com/doc/9582/integrating-wikipediawordnet   /* article on integrating WordNet and Wikipedia with YAGO */
http://www.infochimps.com/datasets/taxobox-wikipedia-infoboxes-with-taxonomic-information-on-animal/
http://www.infochimps.com/link_frame?dataset=11043   /* Wikipedia Datasets for the Hadoop Hack | Cloudera */
http://www.infochimps.com/link_frame?dataset=11166   /* Wikipedia: Lists of common misspellings/For machines */
http://www.infochimps.com/link_frame?dataset=11028   /* Building a (fast) Wikipedia offline reader */
http://www.infochimps.com/link_frame?dataset=11004   /* Using the Wikipedia page-to-page link database */
http://www.infochimps.com/link_frame?dataset=11285   /* List of films */
http://www.infochimps.com/link_frame?dataset=11598   /* MusicBrainz Database */
http://dammit.lt/wikistats/   /* Wikitech-l page counters */
http://snap.stanford.edu/data/wiki-meta.html   /* complete Wikipedia edit history (up to January 2008) */
http://aws.amazon.com/datasets/2596?_encoding=UTF8&jiveRedirect=1   /* Wikipedia Page Traffic Statistics */
http://aws.amazon.com/datasets/2506   /* Wikipedia XML Data */
http://www-958.ibm.com/software/data/cognos/manyeyes/datasets?q=Wikipedia+   /* list of Wikipedia data sets */

Examples:
http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/top-1000-accessed-wikipedia-articl/versions/1   /* Top 1000 Accessed Wikipedia Articles */
http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/wikipedia-hits/versions/1   /* Wikipedia Hits */
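
Several entries above (the dammit.lt counters and the AWS Page Traffic Statistics) distribute hourly page-view counts as plain text. A minimal parser sketch, assuming the common `project page_title view_count bytes_transferred` line layout of the pagecounts files; the sample lines are invented:

```python
# Sketch: parsing page-view counter lines. Each line is assumed to be
# "project page_title view_count bytes_transferred" (the pagecounts-raw
# layout); the sample data below is made up for illustration.
from typing import NamedTuple

class PageViews(NamedTuple):
    project: str
    title: str
    views: int
    bytes_sent: int

def parse_line(line: str) -> PageViews:
    project, title, views, nbytes = line.split()
    return PageViews(project, title, int(views), int(nbytes))

sample = """\
en Main_Page 242332 4737756101
en Wikipedia 12345 98765432
de Hauptseite 54321 123456789
"""

records = [parse_line(l) for l in sample.splitlines()]
top = max(records, key=lambda r: r.views)  # most-viewed page in this hour
```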


Tools to extract data from Wikipedia:
http://www.evanjones.ca/software/wikipedia2text.html   /* Extracting Text from Wikipedia */
http://www.infochimps.com/link_frame?dataset=11121   /* Wikipedia article traffic statistics */
http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/   /* Generating a Plain Text Corpus from Wikipedia */
http://www.infochimps.com/datasets/wikipedia-articles-title-autocomplete
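
The common substrate for the extraction tools above is the MediaWiki XML dump format served at download.wikimedia.org. A streaming reader can walk a multi-gigabyte dump without loading it into memory; this sketch uses a tiny invented inline dump in place of a real pages-articles file (real dumps also wrap elements in an XML namespace, which the tag check below tolerates):

```python
# Sketch: streaming page titles out of a MediaWiki XML dump with the
# standard library. SAMPLE_DUMP is an invented stand-in for a real
# pages-articles file.
import io
import xml.etree.ElementTree as ET

SAMPLE_DUMP = b"""<mediawiki>
  <page><title>Alan Turing</title><revision><text>...</text></revision></page>
  <page><title>Ada Lovelace</title><revision><text>...</text></revision></page>
</mediawiki>"""

def iter_titles(stream):
    """Yield page titles one at a time, freeing each element after use."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # rsplit drops any "{namespace}" prefix a real dump would add
        if elem.tag.rsplit("}", 1)[-1] == "page":
            yield next(
                c.text for c in elem.iter()
                if c.tag.rsplit("}", 1)[-1] == "title"
            )
            elem.clear()  # keep memory flat on multi-gigabyte dumps

titles = list(iter_titles(io.BytesIO(SAMPLE_DUMP)))
```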


Thank you,
Mohamad Mehdi


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org



Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-26 Thread Felipe Ortega
http://meta.wikimedia.org/wiki/Research#Research_Tools:_Statistics.2C_Visualization.2C_etc.

http://en.wikipedia.org/wiki/Wikipedia:Statistics#Automatically_updated_statistics


Best,
Felipe.







Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread Torsten Zesch
Dear Mohamad,

thanks for compiling this comprehensive list.

You might want to add JWPL:
http://code.google.com/p/jwpl/

and WikipediaMiner:
http://wikipedia-miner.sourceforge.net/

-Torsten



Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread Andrew Krizhanovsky
And maybe Wiktionary parser and visual interface :)
http://code.google.com/p/wikokit/

Best regards,
Andrew Krizhanovsky



Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread paolo massa
We wrote a bunch of Python scripts for parsing Wikipedia dumps with
different goals.
You can get them at https://github.com/phauly/wiki-network/

We have also released some datasets of networks extracted from User Talk
pages. See http://www.gnuband.org/2011/04/19/wikipedia_datasets_released/
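
As a rough illustration of what such a User Talk network looks like (a hypothetical sketch, not code from the wiki-network repository): each edit by user Y to "User talk:X" becomes a weighted directed edge from Y to X.

```python
# Sketch: building a directed, weighted user-talk network from
# (editor, page_title) pairs. The edit tuples are invented; a real run
# would read them from a dump's revision metadata.
from collections import Counter

edits = [
    ("Alice", "User talk:Bob"),
    ("Alice", "User talk:Bob"),
    ("Bob", "User talk:Alice"),
    ("Carol", "Main Page"),  # not a user-talk page: ignored
]

def talk_network(edits):
    """Return a Counter mapping (writer, recipient) -> message count."""
    weights = Counter()
    for editor, title in edits:
        if title.startswith("User talk:"):
            recipient = title[len("User talk:"):]
            if recipient != editor:  # skip edits to one's own talk page
                weights[(editor, recipient)] += 1
    return weights

network = talk_network(edits)
```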

Enjoy! ;)

P.






-- 
--
Paolo Massa
Email: paolo AT gnuband DOT org
Blog: http://gnuband.org


Re: [Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread emijrp
Not directly related to Wikipedia, but about wikis in general: WikiTeam[1] and
its dumps[2] of wikis. Thanks to these dumps, you can compare your research
results about the Wikipedia community with other wiki communities around the world.

[1] http://code.google.com/p/wikiteam/
[2] http://code.google.com/p/wikiteam/downloads/list?can=1



[Wiki-research-l] Wikipedia Literature Review - Tools and Data Sets

2011-04-20 Thread mohamad mehdi

Hi everyone,

Thank you all for your replies; we really appreciate your cooperation. Below is 
a summary of the tools and data sets recommended by Torsten, Andrew, Paolo, and 
emijrp. We would also like to know whether there is an existing Wikipedia page 
that includes such a list so we can add to it. Otherwise, where do you suggest 
adding this list so that it is noticeable and useful for the community?

http://code.google.com/p/jwpl/
http://wikipedia-miner.sourceforge.net/
http://code.google.com/p/wikokit/   /* Wiktionary parser and visual interface */
https://github.com/phauly/wiki-network/   /* Python scripts for parsing Wikipedia dumps with different goals */
http://www.gnuband.org/2011/04/19/wikipedia_datasets_released/   /* network datasets extracted from User Talk pages */
http://code.google.com/p/wikiteam/
http://code.google.com/p/wikiteam/downloads/list?can=1 
http://www.research.ibm.com/visual/projects/history_flow/
http://meta.wikimedia.org/wiki/WikiXRay
http://statmediawiki.forja.rediris.es/index_en.html

Best regards,
Mohamad Mehdi

