On Tue, Jan 18, 2011 at 7:21 PM, Aryeh Gregor <simetrical+wikil...@gmail.com> wrote:
> On Mon, Jan 17, 2011 at 9:12 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
>> Wikimedia doesn't technically use delta compression. It concatenates a
>> couple dozen adjacent revisions of the same page and compresses that
>> (with gzip?), achieving very good compression ratios because there is
>> a huge amount of duplication in, say, 20 adjacent revisions of
>> [[Barack Obama]] (small changes to a large page, probably a few
>> identical versions due to vandalism reverts, etc.).
>
> We used to do this, but the problem was that many articles are much
> larger than the compression window of typical compression algorithms,
> so the redundancy between adjacent revisions wasn't helping
> compression except for short articles. Tim wrote a diff-based history
> storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
> deployed it on Wikimedia, for 93% space savings:
>
> http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html
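
To make the window problem concrete, here is a rough sketch in Python. It assumes DEFLATE (what gzip and zlib use), whose history window is 32 KiB, and it uses made-up random page contents so that the only redundancy available is between revisions. The helper names are hypothetical and this is not how MediaWiki does it; it only illustrates why concatenate-and-compress works for short pages and stops helping for long ones.

```python
import random
import zlib


def make_revisions(page_size, count, seed=0):
    """Build `count` revisions of a hypothetical page.

    Each revision differs from the previous one by a single-byte edit.
    """
    rng = random.Random(seed)
    # Random bytes: incompressible on their own, so any savings must
    # come from duplication between adjacent revisions.
    text = bytearray(rng.getrandbits(8) for _ in range(page_size))
    revisions = []
    for _ in range(count):
        pos = rng.randrange(page_size)
        text[pos] = rng.getrandbits(8)  # one small edit per revision
        revisions.append(bytes(text))
    return revisions


def concat_ratio(revisions):
    """Compressed size / raw size for the concatenated revisions."""
    raw = b"".join(revisions)
    return len(zlib.compress(raw, 9)) / len(raw)


# A 4 KiB page fits inside the 32 KiB window; a 200 KiB page does not.
small = concat_ratio(make_revisions(4 * 1024, 20))
large = concat_ratio(make_revisions(200 * 1024, 20))
print(f"4 KiB page,   20 revisions: ratio {small:.2f}")
print(f"200 KiB page, 20 revisions: ratio {large:.2f}")
```

For the small page the compressor can still see the previous revision, so each later revision costs almost nothing. For the large page the previous revision has already left the window by the time the matching text comes around again, so the duplication goes unnoticed. Storing diffs sidesteps this because it does not depend on the compressor's window.
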
Why isn't this being used for the dumps?

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l