[Wikitech-l] Wikipedia dump (20110620) is 5.8G whereas previous dump(20110526) is 6.8G
Hi, The dump under Recombine articles, templates, image descriptions, and primary meta-pages is 6.8GB in http://dumps.wikimedia.org/enwiki/20110526/ page, whereas the same dump here http://dumps.wikimedia.org/enwiki/20110620/ page is 5.8GB. To be more accurate: http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles.xml.bz2 (5.8GB) vs http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-pages-articles.xml.bz2 (6.8GB) Any idea why we have such a big difference? Thanks ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia dump (20110620) is 5.8G whereas previous dump(20110526) is 6.8G
Hi Sezgin, Ariel recently responded to a similar question on xmldatadumps-l: http://lists.wikimedia.org/pipermail/xmldatadumps-l/2011-July/000288.html Conrad On Thu, Jul 7, 2011 at 1:51 PM, Sezgin Sucu sucu...@gmail.com wrote: Hi, The dump under Recombine articles, templates, image descriptions, and primary meta-pages is 6.8GB in http://dumps.wikimedia.org/enwiki/20110526/ page, whereas the same dump here http://dumps.wikimedia.org/enwiki/20110620/ page is 5.8GB. To be more accurate: http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles.xml.bz2 (5.8GB) vs http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-pages-articles.xml.bz2 (6.8GB) Any idea why we have such a big difference? Thanks
[Wikitech-l] Wikipedia Dump
Dear All,

I have used two dumps of the English Wikipedia, listed below, and the revision counts per year come out as follows. Could you please let me know which one is complete and can be analyzed? I am also confused about why the counts for 2001-2009 differ between the two. Thanks very much!

select count (1), to_char(rev_timestamp,'') from enwiki.revision group by to_char(rev_timestamp,'') order by (to_char(rev_timestamp,''))

Source: http://download.wikimedia.org/enwiki/20100130/enwiki-20100130-stub-meta-history.xml.gz

+----------+---------------------+
| count(1) | year(rev_timestamp) |
+----------+---------------------+
|    57559 |                2001 |
|   616878 |                2002 |
|  1598363 |                2003 |
|  6999869 |                2004 |
| 20697477 |                2005 |
| 57214741 |                2006 |
| 75235972 |                2007 |
| 74757575 |                2008 |
| 70600627 |                2009 |
|  6017974 |                2010 |
+----------+---------------------+

Source: http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz

   64305  2001
  616257  2002
 1596612  2003
 6979494  2004
20642853  2005
57043694  2006
74936692  2007
74387391  2008
70085652  2009
53054853  2010
Re: [Wikitech-l] Wikipedia Dump
Dear all, I imported the dump from http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz into an SQL database. However, I could not see any data for 2001 to 2004. Does anyone know what's wrong? Thanks, Zeyi
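One way to tell whether years are missing from the dump itself or were lost during the SQL import is to count revisions per year directly from the XML, without going through the database. The following is only a minimal sketch under some assumptions: it assumes each `<revision>` element carries a `<timestamp>` child in ISO form (for example `2004-07-09T12:30:00Z`), and the function names `count_revisions_by_year` and `count_file` and the `SAMPLE` document are hypothetical, introduced here for illustration. The sample's namespace is illustrative as well; the code ignores whatever namespace the file declares.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def count_revisions_by_year(stream):
    """Count <revision> elements per year in a MediaWiki XML export stream."""
    counts = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "revision":
            for child in elem:
                if local_name(child.tag) == "timestamp" and child.text:
                    counts[child.text[:4]] += 1  # first four characters are the year
                    break
        elif name == "page":
            root.clear()  # drop finished pages so memory use stays bounded
    return counts


def count_file(path):
    """Hypothetical helper: count revisions per year in a gzip-compressed dump."""
    with gzip.open(path, "rb") as handle:
        return count_revisions_by_year(handle)


# Tiny in-memory stand-in for a stub-meta-history file (illustrative only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision><id>1</id><timestamp>2001-05-01T00:00:00Z</timestamp></revision>
    <revision><id>2</id><timestamp>2004-07-09T12:30:00Z</timestamp></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><id>3</id><timestamp>2004-01-02T03:04:05Z</timestamp></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for year, total in sorted(count_revisions_by_year(io.BytesIO(SAMPLE)).items()):
        print(year, total)
```

If the per-year totals from `count_file` on the downloaded `.xml.gz` include 2001 to 2004 while the database does not, the import step is the more likely place to look.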