Not directly related to Wikipedia, but relevant to wikis in general: WikiTeam [1] and its dumps [2] of wikis across the web. With these dumps you can compare your findings about the Wikipedia community against other wiki communities around the world.

[1] http://code.google.com/p/wikiteam/
[2] http://code.google.com/p/wikiteam/downloads/list?can=1
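As far as I can tell, each WikiTeam dump is a standard MediaWiki XML export, so the same parsing code should work for Wikipedia dumps and for the small wikis in the archive alike. A minimal sketch (assuming Python 3; "dump.xml" is a hypothetical local file name):

import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace, whose version varies across export formats."""
    return tag.rsplit('}', 1)[-1]

def iter_pages(path):
    """Stream (title, revision_count) for each <page> in a MediaWiki XML export."""
    for _, elem in ET.iterparse(path, events=('end',)):
        if local(elem.tag) == 'page':
            title, revisions = None, 0
            for child in elem:
                name = local(child.tag)
                if name == 'title':
                    title = child.text
                elif name == 'revision':
                    revisions += 1
            yield title, revisions
            elem.clear()  # discard the processed subtree to keep memory use flat

if __name__ == '__main__':
    for title, revisions in iter_pages('dump.xml'):  # hypothetical file name
        print(revisions, title)

Streaming with iterparse rather than loading the whole tree matters here, since full-history dumps can be far larger than RAM.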
2011/4/18 mohamad mehdi <mohamad_me...@hotmail.com>:

> Hi everyone,
>
> This is a follow-up on a previous thread (Wikipedia data sets) related to
> the Wikipedia literature review (Chitu Okoli). As I mentioned in my
> previous email, part of our study is to identify the data collection
> methods and data sets used in Wikipedia studies. We therefore searched for
> online tools for extracting Wikipedia articles and for pre-compiled
> Wikipedia article data sets, and were able to identify the list below.
> Please let us know of any other sources you know about. Also, is there an
> existing Wikipedia page with such a list that we could add to? If not,
> where do you suggest putting this list so that it is noticeable and
> useful for the community?
>
> http://download.wikimedia.org/   /* official Wikipedia database dumps */
> http://datamob.org/datasets/tag/wikipedia   /* multiple data sets (English Wikipedia articles transformed into XML) */
> http://wiki.dbpedia.org/Datasets   /* structured information extracted from Wikipedia */
> http://labs.systemone.at/wikipedia3   /* Wikipedia³, a conversion of the English Wikipedia into RDF; a monthly updated data set of around 47 million triples */
> http://www.scribd.com/doc/9582/integrating-wikipediawordnet   /* article on integrating WordNet and Wikipedia with YAGO */
> http://www.infochimps.com/datasets/taxobox-wikipedia-infoboxes-with-taxonomic-information-on-animal/
> http://www.infochimps.com/link_frame?dataset=11043   /* Wikipedia data sets for the Hadoop Hack | Cloudera */
> http://www.infochimps.com/link_frame?dataset=11166   /* Wikipedia: lists of common misspellings / for machines */
> http://www.infochimps.com/link_frame?dataset=11028   /* building a (fast) Wikipedia offline reader */
> http://www.infochimps.com/link_frame?dataset=11004   /* using the Wikipedia page-to-page link database */
> http://www.infochimps.com/link_frame?dataset=11285   /* list of films */
> http://www.infochimps.com/link_frame?dataset=11598   /* MusicBrainz database */
> http://dammit.lt/wikistats/   /* page view counters (Wikitech-l) */
> http://snap.stanford.edu/data/wiki-meta.html   /* complete Wikipedia edit history (up to January 2008) */
> http://aws.amazon.com/datasets/2596?_encoding=UTF8&jiveRedirect=1   /* Wikipedia page traffic statistics */
> http://aws.amazon.com/datasets/2506   /* Wikipedia XML data */
> http://www-958.ibm.com/software/data/cognos/manyeyes/datasets?q=Wikipedia+   /* list of Wikipedia data sets */
>
> Examples:
> http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/top-1000-accessed-wikipedia-articl/versions/1   /* top 1000 accessed Wikipedia articles */
> http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/wikipedia-hits/versions/1   /* Wikipedia hits */
>
> Tools to extract data from Wikipedia:
> http://www.evanjones.ca/software/wikipedia2text.html   /* extracting text from Wikipedia */
> http://www.infochimps.com/link_frame?dataset=11121   /* Wikipedia article traffic statistics */
> http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/   /* generating a plain text corpus from Wikipedia */
> http://www.infochimps.com/datasets/wikipedia-articles-title-autocomplete
>
> Thank you,
> Mohamad Mehdi
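One note on the DBpedia entry above: besides the downloadable data sets, DBpedia can be queried live over SPARQL. A rough sketch, assuming Python 3, the public endpoint at http://dbpedia.org/sparql, and Virtuoso's format= parameter for JSON results (all of which may change over time):

import json
import urllib.parse
import urllib.request

ENDPOINT = 'http://dbpedia.org/sparql'  # public DBpedia endpoint (assumed live)

# Ask for ten resources typed as films in the DBpedia ontology.
QUERY = '''
SELECT ?film WHERE {
  ?film a <http://dbpedia.org/ontology/Film> .
} LIMIT 10
'''

url = ENDPOINT + '?' + urllib.parse.urlencode({
    'query': QUERY,
    'format': 'application/sparql-results+json',  # Virtuoso result-format shortcut
})
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

for row in data['results']['bindings']:
    print(row['film']['value'])

For anything beyond a quick probe, a dedicated client such as the SPARQLWrapper package is the more comfortable route.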
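And on the dammit.lt page counters: each hourly file is, if I remember the format correctly, a gzipped text file where every line reads "project page_title request_count bytes_transferred". A small sketch under that assumption (the file name below is hypothetical):

import gzip

def top_pages(path, project='en', n=10):
    """Return the n most requested titles for one project in one hourly file."""
    counts = []
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            fields = line.split(' ')
            # Skip malformed lines and other projects.
            if len(fields) != 4 or fields[0] != project:
                continue
            title, requests = fields[1], int(fields[2])
            counts.append((requests, title))
    return sorted(counts, reverse=True)[:n]

if __name__ == '__main__':
    for requests, title in top_pages('pagecounts-20110418-120000.gz'):
        print(requests, title)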
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l