Yuvi Panda wrote:
> Hi, I'm Yuvi, a student looking forward to working with MediaWiki via
> this year's GSoC.
> 
> I want to work on something dump related, and have been bugging
> apergos (Ariel) for a while now. One of the things that popped up into
> my head is moving the dump process to another language (say, C#, or
> Java, or be very macho and do C++ or C). This would give the dump
> process quite a bit of a speed bump (The profiling I did[1] seems to
> indicate that the DB is not the bottleneck. Might be wrong though),
> and can also be done in a way that makes running distributed dumps
> easier/more elegant.
> 
> So, thoughts on this? Is 'Move Dumping Process to another language' a
> good idea at all?
> 
> P.S. I'm just looking out for ideas, so if you have specific
> improvements to the dumping process in mind, please respond with those
> too. I already have DistributedBZip2 and Incremental Dumps in mind too
> :)
> 
> [1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=5303
> 
> Thanks :)
> 

An idea I have been pondering is to pass the offset of the previous
revision to the compressor, so that it would need to do much less work
searching the compression window for matches. You would need something
like 7z/xz, so that the window can be big enough to contain at least the
latest revision (its compression factor is quite impressive, too: 1TB
down to 2.31GB). Note that I haven't checked how feasible such a
modification to the compressor would be.
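
To illustrate why the window size matters, here is a minimal sketch using
Python's standard lzma module. It does not implement the offset-passing
idea itself; it only compares a window too small to reach the previous
revision with one large enough to hold it. The revision data is synthetic
(random bytes with one small edit per revision), and the names
make_revisions and xz_size are made up for this example.

```python
import lzma
import random


def make_revisions(count=20, size=64 * 1024, seed=0):
    """Build `count` synthetic revisions; each differs from the previous by one 16-byte edit."""
    rng = random.Random(seed)
    # Random bytes are incompressible on their own, so any savings must
    # come from matching against an earlier revision.
    text = bytearray(rng.getrandbits(8) for _ in range(size))
    revisions = []
    for _ in range(count):
        pos = rng.randrange(size - 16)
        text[pos:pos + 16] = bytes(rng.getrandbits(8) for _ in range(16))
        revisions.append(bytes(text))
    return revisions


def xz_size(data, dict_size):
    """Return the size of `data` compressed as .xz with the given LZMA2 dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


dump = b"".join(make_revisions())

# 4 KiB window: the previous 64 KiB revision is out of reach.
small_window = xz_size(dump, 4 * 1024)
# 1 MiB window: the previous revision fits entirely.
large_window = xz_size(dump, 1024 * 1024)

print("uncompressed:", len(dump))
print("4 KiB window:", small_window)
print("1 MiB window:", large_window)
```

With the larger window the compressor can encode each revision mostly as
references to the one before it, so the output should be many times
smaller than with the 4 KiB window. Real wikitext is compressible on its
own, so the gap on actual dumps would look different.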


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
