[Wikitech-l] Wikipedia dump (20110620) is 5.8G whereas previous dump(20110526) is 6.8G

2011-07-07 Thread Sezgin Sucu
Hi,
The dump listed under "Recombine articles, templates, image descriptions, and
primary meta-pages" is 6.8GB on the
http://dumps.wikimedia.org/enwiki/20110526/ page, whereas the same
dump on the http://dumps.wikimedia.org/enwiki/20110620/ page is 5.8GB.

To be more accurate:
http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles.xml.bz2
(5.8GB)
vs
http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-pages-articles.xml.bz2
(6.8GB)
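A minimal sketch for checking file sizes like these without downloading several gigabytes: send an HTTP HEAD request and read the Content-Length header. It assumes the server reports Content-Length for HEAD requests, and the 2011 files may no longer be hosted at the URLs above. The helper names remote_size and percent_change are hypothetical.

```python
import urllib.request


def remote_size(url: str, timeout: float = 30.0) -> int:
    """Return a file's size in bytes from the Content-Length of an HTTP HEAD
    response, without downloading the body."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    if length is None:
        raise ValueError(f"no Content-Length header for {url}")
    return int(length)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old
```

For example, percent_change(remote_size(old_url), remote_size(new_url)), with old_url and new_url set to the 20110526 and 20110620 pages-articles URLs. Using the rounded figures from this post, percent_change(6.8, 5.8) comes out at about -14.7 %.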

Any idea why we have such a big difference?

Thanks

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikipedia dump (20110620) is 5.8G whereas previous dump(20110526) is 6.8G

2011-07-07 Thread Conrad Irwin
Hi Sezgin,

Ariel recently responded to a similar question on xmldatadumps-l:
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2011-July/000288.html

Conrad

On Thu, Jul 7, 2011 at 1:51 PM, Sezgin Sucu sucu...@gmail.com wrote:
 Hi,
 The dump under Recombine articles, templates, image descriptions, and
 primary meta-pages is 6.8GB in
 http://dumps.wikimedia.org/enwiki/20110526/ page, whereas the same
 dump here http://dumps.wikimedia.org/enwiki/20110620/ page is 5.8GB.

 To be more accurate:
 http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles.xml.bz2
 (5.8GB)
 vs
 http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-pages-articles.xml.bz2
 (6.8GB)

 Any idea why we have such a big difference?

 Thanks





[Wikitech-l] Wikipedia Dump

2011-02-03 Thread zh509
Dear All, 

I have used the two English Wikipedia dumps listed below, and the revision
counts per year turn out as shown. Could you please let me know which one is
complete and can be analyzed? I am also confused about why the counts for
2001 to 2009 differ between the two. Thanks very much!

select count(1), to_char(rev_timestamp, 'YYYY') from enwiki.revision
group by to_char(rev_timestamp, 'YYYY') order by to_char(rev_timestamp, 'YYYY')


Source:
http://download.wikimedia.org/enwiki/20100130/enwiki-20100130-stub-meta-history.xml.gz

+----------+---------------------+
| count(1) | year(rev_timestamp) |
+----------+---------------------+
|    57559 |                2001 |
|   616878 |                2002 |
|  1598363 |                2003 |
|  6999869 |                2004 |
| 20697477 |                2005 |
| 57214741 |                2006 |
| 75235972 |                2007 |
| 74757575 |                2008 |
| 70600627 |                2009 |
|  6017974 |                2010 |
+----------+---------------------+


 
Source:
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz

   64305  2001
  616257  2002
 1596612  2003
 6979494  2004
20642853  2005
57043694  2006
74936692  2007
74387391  2008
70085652  2009
53054853  2010
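To see exactly where the two listings disagree, a short Python sketch can subtract one from the other year by year. The figures are copied from the two outputs above; the names DUMP_20100130, DUMP_20101011 and year_deltas are illustrative. Judging by its date stamp, the 20100130 dump can only contain revisions up to roughly the end of January 2010, which accounts for the large 2010 gap. The sketch only measures the smaller differences for 2001 to 2009; it does not explain them.

```python
from typing import Dict

# Revisions per year, copied from the two result listings above.
DUMP_20100130 = {
    2001: 57559, 2002: 616878, 2003: 1598363, 2004: 6999869, 2005: 20697477,
    2006: 57214741, 2007: 75235972, 2008: 74757575, 2009: 70600627, 2010: 6017974,
}
DUMP_20101011 = {
    2001: 64305, 2002: 616257, 2003: 1596612, 2004: 6979494, 2005: 20642853,
    2006: 57043694, 2007: 74936692, 2008: 74387391, 2009: 70085652, 2010: 53054853,
}


def year_deltas(old: Dict[int, int], new: Dict[int, int]) -> Dict[int, int]:
    """Return new - old for every year present in both mappings, in year order."""
    return {year: new[year] - old[year] for year in sorted(old.keys() & new.keys())}


# Print each difference with an explicit sign and thousands separators.
for year, delta in year_deltas(DUMP_20100130, DUMP_20101011).items():
    print(f"{year}: {delta:+,}")
```

This prints lines such as "2001: +6,746", "2002: -621" and "2010: +47,036,879": 2001 gains revisions in the later dump, while 2002 through 2009 each lose some.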



Re: [Wikitech-l] Wikipedia Dump

2011-01-28 Thread zh509
Dear all,

I imported the dump from
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-stub-meta-history.xml.gz
into an SQL database.

However, I could not see any data for 2001 to 2004. Does anyone know what's wrong?

thanks,

Zeyi
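One way to tell whether 2001 to 2004 are missing from the dump itself or were lost during the SQL import is to count revisions per year straight from the compressed XML. This is a minimal sketch, assuming the MediaWiki XML export layout (page > revision > timestamp) with ISO 8601 timestamps such as 2001-01-21T02:12:21Z. The function names are hypothetical, and it streams the file with ElementTree.iterparse so memory use stays roughly flat.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_revisions_by_year(stream: IO[bytes]) -> Counter:
    """Count <revision> elements per year, read from each revision's <timestamp>.

    Assumes ISO 8601 timestamps, so the first four characters are the year.
    """
    counts: Counter = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the <mediawiki> root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "revision":
            # Only the revision's direct <timestamp> child is counted.
            for child in elem:
                if local_name(child.tag) == "timestamp" and child.text:
                    counts[int(child.text[:4])] += 1
                    break
            elem.clear()  # drop this revision's children to save memory
        elif name == "page":
            root.clear()  # detach finished pages so they are not kept around
    return counts


def count_file(path: str) -> Counter:
    """Stream a .xml.gz dump without decompressing it to disk first."""
    with gzip.open(path, "rb") as f:
        return count_revisions_by_year(f)


def count_bytes(data: bytes) -> Counter:
    """Convenience wrapper for small in-memory samples."""
    return count_revisions_by_year(io.BytesIO(data))
```

For example, sorted(count_file("enwiki-20101011-stub-meta-history.xml.gz").items()) gives per-year counts to compare with the SQL query. If the XML shows revisions for 2001 to 2004 but the database does not, the problem lies in the import step rather than in the dump.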