On Tue, Jan 18, 2011 at 7:21 PM, Aryeh Gregor
<simetrical+wikil...@gmail.com> wrote:
> On Mon, Jan 17, 2011 at 9:12 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
>> Wikimedia doesn't technically use delta compression. It concatenates a
>> couple dozen adjacent revisions of the same page and compresses that
>> (with gzip?), achieving very good compression ratios because there is
>> a huge amount of duplication in, say, 20 adjacent revisions of
>> [[Barack Obama]] (small changes to a large page, probably a few
>> identical versions due to vandalism reverts, etc.).
>
> We used to do this, but the problem was that many articles are much
> larger than the compression window of typical compression algorithms,
> so the redundancy between adjacent revisions wasn't helping
> compression except for short articles.  Tim wrote a diff-based history
> storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
> deployed it on Wikimedia, for 93% space savings:
>
> http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html

Why isn't this being used for the dumps?
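
For anyone who wants to see the window effect described in the quoted
message, here is a rough sketch. It is not DiffHistoryBlob and not the
actual Wikimedia storage code. It uses Python's zlib (DEFLATE, 32 KB
window) as a stand-in for gzip, difflib as a stand-in diff format, and
synthetic random text for the revisions. The function names and
parameters are made up for illustration.

```python
import difflib
import random
import zlib


def make_revisions(n_revs=5, n_lines=2000, seed=0):
    """Build synthetic revisions: each one changes a single line of the last."""
    rng = random.Random(seed)

    def rand_line():
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(60))

    lines = [rand_line() for _ in range(n_lines)]
    revs = []
    for _ in range(n_revs):
        lines = list(lines)
        lines[rng.randrange(n_lines)] = rand_line()
        revs.append("\n".join(lines))
    return revs


def concat_size(revs):
    """Compressed size when full revisions are concatenated, then deflated."""
    return len(zlib.compress("\n".join(revs).encode(), 9))


def diff_size(revs):
    """Compressed size when storing the first revision plus line diffs."""
    parts = [revs[0]]
    for old, new in zip(revs, revs[1:]):
        delta = difflib.unified_diff(
            old.split("\n"), new.split("\n"), lineterm="", n=0
        )
        parts.append("\n".join(delta))
    return len(zlib.compress("\n".join(parts).encode(), 9))


if __name__ == "__main__":
    big = make_revisions()               # about 120 KB per revision
    small = make_revisions(n_lines=100)  # about 6 KB per revision
    print("large page: concat", concat_size(big), "diff", diff_size(big))
    print("small page: concat", concat_size(small), "diff", diff_size(small))
```

With revisions much larger than 32 KB, the previous copy has already
fallen out of the window by the time the next one is compressed, so
concatenation gains little, while the diff-based variant stores each
change only once. With short revisions the two approaches end up much
closer together.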

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
