https://bugzilla.wikimedia.org/show_bug.cgi?id=1935
Roan Kattouw roan.katt...@gmail.com changed:
What    |Removed    |Added
--------------------------------------------------
CC      |           |roan.katt...@gmail.com
--- Comment #3 from Roan Kattouw roan.katt...@gmail.com 2011-01-21 23:30:18 UTC ---
(In reply to comment #2)
Quoting Roan in
http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/51583
'''
Wikimedia doesn't technically use delta compression. It concatenates a
couple dozen adjacent revisions of the same page and compresses that
(with gzip?), achieving very good compression ratios because there is
a huge amount of duplication in, say, 20 adjacent revisions of
[[Barack Obama]] (small changes to a large page, probably a few
identical versions due to vandalism reverts, etc.). However,
decompressing it just gets you the raw text, so nothing in this
storage system helps generation of diffs. Diff generation is still
done by shelling out to wikidiff2 (a custom C++ diff implementation
that generates diffs with HTML markup like ins/del) and caching
the result in memcached.
'''
...and I was wrong, see the replies to that post. We actually DO use
delta-based storage, almost exactly in the way you propose.
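
For anyone curious why the concatenate-then-compress scheme described in
the quoted text would give good ratios, here is a rough Python sketch.
It is an illustration only, not MediaWiki code: the function names and
the synthetic revision texts are made up, and zlib (DEFLATE, the same
algorithm gzip uses) stands in for whatever compressor is actually in
use. It does not model the delta-based storage mentioned above.

```python
import zlib


def make_revisions(n=20):
    """Build n hypothetical revisions of one page, each a small edit of the last."""
    # Roughly 15 KB of base text, kept under DEFLATE's 32 KiB window so that
    # the previous revision is still "visible" to the compressor.
    text = "".join(f"Sentence number {i} of a long article. " for i in range(400))
    revisions = []
    for i in range(n):
        text = text + f"Small edit {i}. "
        revisions.append(text)
    return revisions


def size_compressed_separately(revisions):
    """Total bytes when every revision is compressed on its own."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)


def size_compressed_together(revisions):
    """Bytes when all revisions are concatenated and compressed as one blob."""
    blob = "\x00".join(revisions).encode("utf-8")
    return len(zlib.compress(blob))


if __name__ == "__main__":
    revs = make_revisions()
    print("separately:", size_compressed_separately(revs))
    print("together:  ", size_compressed_together(revs))
```

Note that with plain DEFLATE the gain depends on the page being smaller
than the 32 KiB window; for larger pages the duplicated text falls out
of reach, which is one reason real delta storage is attractive.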