Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-04-23 Thread Randall Farmer
Reviving an old thread--this is at a really early stage, but a new (de)compressor by Google named Brotli could someday be useful for packing history dumps: https://docs.google.com/presentation/d/1aigINmRR7fw_ml8rz0rJ3NTv08Qb3n6lZ_qvmxo8CzQ/present#slide=id.ge4739a87_10
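
As a minimal sketch of what packing a dump chunk with Brotli might look like, assuming the Python brotli bindings are installed; the file names and quality setting are placeholders, not anything from the thread:

    # Sketch only: compress one chunk of dump XML with the brotli bindings.
    import brotli

    with open("enwiki-history-chunk.xml", "rb") as f:   # hypothetical input file
        raw = f.read()

    # quality runs 0-11; higher is slower but smaller, roughly like xz presets
    packed = brotli.compress(raw, quality=9)

    with open("enwiki-history-chunk.xml.br", "wb") as f:
        f.write(packed)

    print("ratio: %.2f%%" % (100.0 * len(packed) / len(raw)))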

Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-03-08 Thread Randall Farmer
> I see you got more pointers there. :) Did you manage to explore them? The blocker is that I didn't hear much interest from dump folks in a non-7z archive format even if it boosted compression speed a lot. Of the packers Bulat replied with (zpaq, exdupe, pcompress, his own srep), exdupe and srep

Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-03-08 Thread Federico Leva (Nemo)
Randall Farmer, 21/01/2014 23:26: Trying to get quick-and-dirty long-range matching into LZMA isn't feasible for me personally and there may be inherent technical difficulties. Still, I left a note on the 7-Zip boards as folks suggested; feel free to add anything there: https://sourceforge.net/p/
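
For illustration only, a hedged sketch of the general idea behind "long-range matching in front of LZMA": deduplicate fixed-size blocks by hash so that repeats far outside LZMA's dictionary window still collapse, then hand the transformed stream to LZMA. The block size, record format, and the omission of a decoder are simplifications, not anything proposed in the thread:

    # Sketch: block-level long-range dedup as a preprocessing pass before LZMA.
    import hashlib
    import lzma
    import struct

    BLOCK = 4096  # assumed block size

    def dedup_then_lzma(data: bytes, preset: int = 3) -> bytes:
        seen = {}       # block digest -> index of first occurrence
        records = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            digest = hashlib.sha1(block).digest()
            if digest in seen:
                # back-reference: 'R' + index of the earlier identical block
                records.append(b"R" + struct.pack("<I", seen[digest]))
            else:
                seen[digest] = i // BLOCK
                # literal: 'L' + length + raw bytes
                records.append(b"L" + struct.pack("<I", len(block)) + block)
        return lzma.compress(b"".join(records), preset=preset)

A real implementation would also need a decoder that replays the records in order, and would match at finer granularity than whole blocks.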

Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-01-21 Thread Randall Farmer
Ack, sorry for the (no subject); again in the right thread: > For external uses like XML dumps integrating the compression > strategy into LZMA would however be very attractive. This would also > benefit other users of LZMA compression like HBase. For dumps or other uses, 7za -mx=3 / xz -3 is you
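
A minimal sketch of that same level-3 preset via Python's standard lzma module (the xz path); the input file name is made up for illustration:

    # Rough equivalent of "xz -3": LZMA2 in the .xz container at preset 3,
    # trading some ratio for much higher speed than the default preset 6.
    import lzma
    import shutil

    with open("enwiki-pages-meta-history.xml", "rb") as src, \
            lzma.open("enwiki-pages-meta-history.xml.xz", "wb", preset=3) as dst:
        shutil.copyfileobj(src, dst)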

Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-01-21 Thread Randall Farmer
> > That does not sound like much economically. Do keep in mind the cost of > porting, deploying, maintaining, obtaining, and so on, new tools. Briefly, yes, CPU-hours don't cost too much, but I don't think the potential win is limited to the direct CPU-hours saved. In more detail: For Wikimedia

Re: [Xmldatadumps-l] [Wikitech-l] Compressing full-history dumps faster

2014-01-20 Thread Bjoern Hoehrmann
* Randall Farmer wrote: >As I understand, compressing full-history dumps for English Wikipedia and >other big wikis takes a lot of resources: enwiki history is about 10TB >unpacked, and 7zip only packs a few MB/s/core. Even with 32 cores, that's >over a day of server time. There's been talk about w
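
A quick check of the arithmetic in that message (the 3 MB/s per-core rate is an assumption; the message only says "a few MB/s/core"):

    # 10 TB of unpacked history, an assumed 3 MB/s per core, 32 cores.
    unpacked_bytes = 10 * 10**12      # ~10 TB of enwiki history
    per_core_rate = 3 * 10**6         # assumed 7zip throughput per core
    cores = 32

    seconds = unpacked_bytes / (per_core_rate * cores)
    print("%.1f days of wall-clock time" % (seconds / 86400))   # about 1.2 days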