2011/1/17 Aryeh Gregor <simetrical+wikil...@gmail.com>:
> Wikimedia stores diffs using delta compression, so actually this is
> basically what happens.  The size of the edit is what determines the
> size of the stored diff, not the size of the page.  (I don't know how
> this works in detail, though.)  IIRC, default MediaWiki doesn't work
> this way.
>
Wikimedia doesn't technically use delta compression. It concatenates a
couple dozen adjacent revisions of the same page and compresses that
(with gzip?), achieving very good compression ratios because there is
a huge amount of duplication in, say, 20 adjacent revisions of
[[Barack Obama]] (small changes to a large page, probably a few
identical versions due to vandalism reverts, etc.). However,
decompressing it just gets you the raw text, so nothing in this
storage system helps generation of diffs. Diff generation is still
done by shelling out to wikidiff2 (a custom C++ diff implementation
that generates diffs with HTML markup like <ins>/<del>) and caching
the result in memcached.
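
To make the compression point concrete, here is a rough Python sketch of
the idea, not the actual Wikimedia code. It uses zlib as a stand-in for
whatever compressor is really used, and the function names, the NUL
separator, and the synthetic page text are all made up for illustration.
It compares compressing 20 near-identical revisions one by one against
compressing them as a single concatenated blob.

```python
import random
import string
import zlib

SEPARATOR = "\x00"  # hypothetical delimiter; assumed never to occur in page text


def make_revisions(count=20, words_per_page=1500, seed=0):
    """Build synthetic revisions: each one changes a single word of the last."""
    rng = random.Random(seed)
    page = [
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 8)))
        for _ in range(words_per_page)
    ]
    revisions = []
    for _ in range(count):
        page = list(page)
        page[rng.randrange(len(page))] = "edited"  # small change to a large page
        revisions.append(" ".join(page))
    return revisions


def pack(revisions):
    """Concatenate adjacent revisions and compress them as one blob."""
    return zlib.compress(SEPARATOR.join(revisions).encode("utf-8"))


def unpack(blob):
    """Decompress a blob back into raw revision texts (no diff information)."""
    return zlib.decompress(blob).decode("utf-8").split(SEPARATOR)


def separate_size(revisions):
    """Total size if every revision were compressed on its own."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)


if __name__ == "__main__":
    revs = make_revisions()
    print("compressed separately:", separate_size(revs), "bytes")
    print("compressed together:  ", len(pack(revs)), "bytes")
```

Caveat on the sketch: deflate only looks back 32 KiB, so the synthetic
pages are kept to roughly 10 KB each so that the previous revision stays
within the window. Note that unpack() hands back plain text only, which is
why diffs still have to be computed separately.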

Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
