Date: Wed, 17 Feb 2010 05:01:43 +0100
From: Tomasz Finc <tf...@wikimedia.org>
Subject: Re: [Wikitech-l] enwiki complete page edit history
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Message-ID: <4b7b6a27.9040...@wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

It sadly failed as noted in

http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/2010-January/000078.html

I've updated the index to clear that up.

--tomasz


Hi Tomasz,

The pages-meta-history.xml.bz2 is showing 115.4GB written (in progress) at:
http://download.wikipedia.org/enwiki/20100130/

The older pages-meta-history.xml.bz2 from 
http://download.wikipedia.org/enwiki/20091128/
shows 255.1GB written (failed build)

So once the 20100130 pages-meta-history.xml.bz2 dump finishes writing, will it 
be larger than 255GB, since it is newer than the older copy and contains more 
revisions?

Also, I noticed that these big files haven't been linked for download lately. 
I think they should be, as they contain the full Wikipedia history and 
discussion pages, which hold a huge amount of useful information that should be 
easy to distribute. Why aren't they linked? Is it the bandwidth cost?

cheers,
Jamie


Jamie Morken wrote:
> Hi,
> 
> I was looking at the enwiki dump progress and noticed the file size for the 
> enwiki pages-meta-history.xml.bz2 has decreased
> from 255GB on 20100125 down to 105GB on 20100203.  Is it possible that
> old page revision data is being lost, given the smaller archive file
> size?
> 
> 2009-12-03 12:53:43 in-progress All pages with complete page edit history (.bz2)
> 2010-01-25 16:02:21: enwiki 14833408 pages (3.231/sec), 284292000 revs
> (61.930/sec), 54.7% prefetched, ETA 2010-02-03 02:34:19 [max 329446505]
> These dumps can be *very* large, uncompressing up to 20 times the archive
> download size. Suitable for archival and statistical use, most mirror sites
> won't want or need this.
> pages-meta-history.xml.bz2 255.1 GB (written)
> 2010-02-03 17:28:43 in-progress All pages with complete page edit history (.bz2)
> 2010-02-16 00:32:55: enwiki 747550 pages (0.704/sec), 95964000 revs
> (90.340/sec), 95.8% prefetched, ETA 2010-03-19 12:10:50 [max 341714004]
> These dumps can be *very* large, uncompressing up to 20 times the archive
> download size. Suitable for archival and statistical use, most mirror sites
> won't want or need this.
> pages-meta-history.xml.bz2 105.1 GB (written)
> cheers,
> Jamie
>
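The two quoted progress lines each end with a running revision count and a `[max N]` figure. Assuming `max` is the total number of revisions the dump expects to write (an inference from these two samples, not a documented format), the fraction complete is simply revs / max. The following is a minimal Python sketch; the regular expression and the helper name `revision_progress` are hypothetical, not part of any Wikimedia tool:

```python
import re

# Matches "<N> revs ... [max <M>]" in a dump progress line. The format is
# inferred from the two status lines quoted in this thread; treating M as the
# total expected revision count is an assumption.
_PROGRESS_RE = re.compile(r"(\d+) revs\b.*?\[max (\d+)\]", re.DOTALL)


def revision_progress(status: str) -> float:
    """Return the fraction of revisions written so far (hypothetical helper)."""
    match = _PROGRESS_RE.search(status)
    if match is None:
        raise ValueError("no 'revs ... [max N]' figures found")
    revs, total = (int(g) for g in match.groups())
    return revs / total


if __name__ == "__main__":
    first_status = ("enwiki 14833408 pages (3.231/sec), 284292000 revs\n"
                    "(61.930/sec), 54.7% prefetched, ETA 2010-02-03 02:34:19 [max 329446505]")
    second_status = ("enwiki 747550 pages (0.704/sec), 95964000 revs\n"
                     "(90.340/sec), 95.8% prefetched, ETA 2010-03-19 12:10:50 [max 341714004]")
    print(f"{revision_progress(first_status):.1%}")   # 86.3%
    print(f"{revision_progress(second_status):.1%}")  # 28.1%
```

Under that assumption, the 255.1 GB figure was recorded with roughly 86% of revisions written and the 105.1 GB figure with roughly 28%, so the two sizes were taken at very different stages of their runs and are not directly comparable.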



_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l