Brion Vibber wrote:
Decompression takes as long as compression with bzip2
I think decompression is *faster* than compression
http://tukaani.org/lzma/benchmarks
LZMA is nice and fast to decompress... but *insanely* slower to
compress, and doesn't seem as parallelizable. :(
-- brion
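
A quick way to check the compress/decompress asymmetry described above is to
time both codecs from the Python standard library (bz2 and lzma, the algorithm
family behind xz/7z). The sample data and sizes below are made up, so treat
this as a rough sketch rather than a benchmark:

    import bz2, lzma, os, time

    # ~6 MB of semi-compressible sample data (arbitrary choice for illustration)
    data = os.urandom(1 << 20) + b"wiki text " * 500_000

    for name, comp, decomp in (("bz2", bz2.compress, bz2.decompress),
                               ("lzma", lzma.compress, lzma.decompress)):
        t0 = time.perf_counter()
        blob = comp(data)
        t1 = time.perf_counter()
        decomp(blob)
        t2 = time.perf_counter()
        print(f"{name}: compress {t1 - t0:.2f}s, decompress {t2 - t1:.2f}s, "
              f"ratio {len(blob) / len(data):.3f}")

On typical hardware this shows roughly the pattern described above: lzma
decompresses much faster than it compresses, while bzip2 is closer to
symmetric.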
On 3/26/09 3:25 PM, Keisial wrote:
Quite interesting. Can the images at office.wikimedia.org be moved to
somewhere public?
I've copied those two to the public wiki. :)
Decompression takes as long as compression with bzip2
I think decompression is *faster* than compression
On 03/27/09 01:14, Brion Vibber wrote:
LZMA is nice and fast to decompress... but *insanely* slower to
compress, and doesn't seem as parallelizable. :(
The xz file format should allow for easy parallelization, both when
compressing and decompressing; see
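
As an illustration of the block-level trick the format permits (this is a
minimal sketch with today's Python standard library, not WMF's actual
tooling): compress independent chunks in separate processes and concatenate
the resulting streams, which still yields data that a multi-stream-aware
decompressor can read.

    import lzma
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 8 * 1024 * 1024  # 8 MiB per block; the size is an arbitrary choice

    def parallel_xz(data: bytes) -> bytes:
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
        with ProcessPoolExecutor() as pool:
            streams = pool.map(lzma.compress, chunks)  # one .xz stream per chunk
        return b"".join(streams)

    if __name__ == "__main__":
        original = b"example dump text\n" * 2_000_000
        packed = parallel_xz(original)
        # lzma.decompress transparently handles concatenated streams
        assert lzma.decompress(packed) == original

The trade-off is that each block is compressed without context from its
neighbours, so the ratio is slightly worse than a single stream.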
Perhaps the toolserver can make you a current dump of current en?
On Wed, Mar 25, 2009 at 11:08 AM, Christian Storm st...@iparadigms.com wrote:
Thanks to everyone who got the enwiki dumps going again! Should we expect
more regular dumps now? What was the final solution of fixing this?
Toolserver users don't have access to the text.
On Wed, Mar 25, 2009 at 7:05 PM, Brian brian.min...@colorado.edu wrote:
Perhaps the toolserver can make you a current dump of current en?
On Wed, Mar 25, 2009 at 11:08 AM, Christian Storm st...@iparadigms.com
wrote:
Thanks to everyone who got the
Brion,
We are having to resort to crawling en.wikipedia.org while we wait
for regular dumps.
What is the minimum crawl delay we can get away with? I figure with a
1-second delay we'd be able to crawl the 2+ million articles
in about a month.
I know crawling is discouraged but it seems
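
For reference, the arithmetic behind that estimate, plus a hypothetical
rate-limited fetch loop; the User-Agent string and article list here are
invented for illustration, not a recommendation to crawl:

    import time
    import urllib.parse
    import urllib.request

    ARTICLES = 2_000_000
    DELAY = 1.0  # seconds between requests
    print(f"{ARTICLES * DELAY / 86400:.1f} days")  # ~23 days, i.e. about a month

    def fetch_all(titles, delay=DELAY):
        for title in titles:
            url = "https://en.wikipedia.org/wiki/" + urllib.parse.quote(title)
            req = urllib.request.Request(
                url, headers={"User-Agent": "example-crawler/0.1 (ops@example.org)"})
            with urllib.request.urlopen(req) as resp:
                yield title, resp.read()
            time.sleep(delay)  # stay at or above the agreed crawl delay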
Russell Blau russblau at hotmail.com writes:
FWIW, I'll add my vote for aborting the current dump *now* if we don't
expect it ever to actually be finished, so we can at least get a fresh dump
of the current pages.
I'd like to third/fourth/(other ordinal) this idea too. I've been using the
Hoi,
Two things:
- if we abort the backup now, we do not know if we WILL have something at
the time it would have ended
- if the toolserver data can provide a service as a stop-gap measure, why
not provide that in the meantime
Thanks,
GerardM
2009/1/29 Alai alaiw...@gmail.com
On Thu, Jan 29, 2009 at 1:52 AM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:
Hoi,
Two things:
- if we abort the backup now, we do not know if we WILL have something at
the time it would have ended
- if the toolserver data can provide a service as a stop gap measure why
not
On 1/28/09 8:32 AM, Brion Vibber wrote:
Probably wise to poke in a hack to skip the history first. :)
Done in r46545.
Updated dump scripts and canceled the old enwiki dump.
New dumps will also attempt to generate log output as XML, which
correctly handles the deletion/oversighting
On Thu, Jan 29, 2009 at 11:20 AM, Brion Vibber br...@wikimedia.org wrote:
On 1/28/09 8:32 AM, Brion Vibber wrote:
Probably wise to poke in a hack to skip the history first. :)
Done in r46545.
Updated dump scripts and canceled the old enwiki dump.
New dumps also will be attempting to
Brion Vibber br...@wikimedia.org wrote in message
news:497f9c35.9050...@wikimedia.org...
On 1/27/09 2:55 PM, Robert Rohde wrote:
On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibberbr...@wikimedia.org
wrote:
On 1/27/09 2:35 PM, Thomas Dalton wrote:
The way I see it, what we need is to get a really
Probably wise to poke in a hack to skip the history first. :)
-- brion vibber (brion @ wikimedia.org)
On Jan 28, 2009, at 7:34, Russell Blau russb...@hotmail.com wrote:
Brion Vibber br...@wikimedia.org wrote in message
news:497f9c35.9050...@wikimedia.org...
On 1/27/09 2:55 PM, Robert Rohde
That would be great. I second this notion wholeheartedly.
On Jan 28, 2009, at 7:34 AM, Russell Blau wrote:
Brion Vibber br...@wikimedia.org wrote in message
news:497f9c35.9050...@wikimedia.org...
On 1/27/09 2:55 PM, Robert Rohde wrote:
On Tue, Jan 27, 2009 at 2:42 PM, Brion
On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
The current enwiki database dump
(http://download.wikimedia.org/enwiki/20081008/) has been crawling
along since 10/15/2008.
The current dump system is not sustainable on very large wikis and
is being replaced. You'll hear about it when we
I have a decent server that is dedicated to a Wikipedia project that
depends on fresh dumps. Can it be used in any way to speed up the process
of generating the dumps?
bilal
On Tue, Jan 27, 2009 at 2:24 PM, Christian Storm st...@iparadigms.com wrote:
On 1/4/09 6:20 AM, yegg at alum.mit.edu
The problem, as I understand it (and Brion may come by to correct me)
is essentially that the current dump process is designed in a way that
can't be sustained given the size of enwiki. It really needs to be
re-engineered, which means that developer time is needed to create a
new approach to
Whether we want to let the current process continue to try and finish
or not, I would seriously suggest someone look into redumping the rest
of the enwiki files (e.g. logs, current pages, etc.). I am also among
the people who care about having reasonably fresh dumps and it really
is a
On 1/27/09 2:35 PM, Thomas Dalton wrote:
The way I see it, what we need is to get a really powerful server
Nope, it's a software architecture issue. We'll restart it with the new
arch when it's ready to go.
-- brion
On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber br...@wikimedia.org wrote:
On 1/27/09 2:35 PM, Thomas Dalton wrote:
The way I see it, what we need is to get a really powerful server
Nope, it's a software architecture issue. We'll restart it with the new
arch when it's ready to go.
I don't know
On 1/4/09 6:20 AM, y...@alum.mit.edu wrote:
The current enwiki database dump
(http://download.wikimedia.org/enwiki/20081008/) has been crawling
along since 10/15/2008.
The current dump system is not sustainable on very large wikis and is
being replaced. You'll hear about it when we have the
Understood--thank you. Any time-frame for when this might be launched?
On Mon, Jan 5, 2009 at 1:47 PM, Brion Vibber br...@wikimedia.org wrote:
On 1/4/09 6:20 AM, y...@alum.mit.edu wrote:
The current enwiki database dump
(http://download.wikimedia.org/enwiki/20081008/) has been crawling
along