[Bug 1935] Versioned data in backend

2011-01-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=1935

Roan Kattouw roan.katt...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roan.katt...@gmail.com

--- Comment #3 from Roan Kattouw roan.katt...@gmail.com 2011-01-21 23:30:18 UTC ---
(In reply to comment #2)
> Quoting Roan in
> http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/51583
>
> '''
> Wikimedia doesn't technically use delta compression. It concatenates a
> couple dozen adjacent revisions of the same page and compresses that
> (with gzip?), achieving very good compression ratios because there is
> a huge amount of duplication in, say, 20 adjacent revisions of
> [[Barack Obama]] (small changes to a large page, probably a few
> identical versions due to vandalism reverts, etc.). However,
> decompressing it just gets you the raw text, so nothing in this
> storage system helps generation of diffs. Diff generation is still
> done by shelling out to wikidiff2 (a custom C++ diff implementation
> that generates diffs with HTML markup like ins/del) and caching
> the result in memcached.
>
> '''

...and I was wrong, see the replies to that post. We actually DO use
delta-based storage, almost exactly in the way you propose.
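
For readers unfamiliar with the term, delta-based storage in general means
keeping one revision in full and every later revision as a difference against
its predecessor. The following is only a generic sketch of that idea: this
thread does not show the original proposal or MediaWiki's actual storage
format, and the function names (make_delta, apply_delta, store, load) are
hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copy/insert operations against the lines of `old`."""
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse old lines i1..i2
        elif j2 > j1:
            ops.append(("insert", b[j1:j2]))  # new or replacement lines
        # pure deletions need no operation: the old lines are simply not copied
    return ops


def apply_delta(old, ops):
    """Rebuild the newer text from the older text plus a delta."""
    a = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(a[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


def store(revisions):
    """Keep the first revision in full and each later one as a delta."""
    deltas = [make_delta(prev, cur) for prev, cur in zip(revisions, revisions[1:])]
    return revisions[0], deltas


def load(base, deltas):
    """Replay the deltas in order to recover every revision."""
    revisions = [base]
    for delta in deltas:
        revisions.append(apply_delta(revisions[-1], delta))
    return revisions
```

In this sketch, reading a late revision means replaying every delta before it,
which is the usual trade-off of a delta chain.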

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 1935] Versioned data in backend

2011-01-20 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=1935

Ashar Voultoiz has...@free.fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |has...@free.fr
         Resolution|                            |WORKSFORME

--- Comment #2 from Ashar Voultoiz has...@free.fr 2011-01-20 20:11:37 UTC ---
Quoting Roan in
http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/51583

'''
Wikimedia doesn't technically use delta compression. It concatenates a
couple dozen adjacent revisions of the same page and compresses that
(with gzip?), achieving very good compression ratios because there is
a huge amount of duplication in, say, 20 adjacent revisions of
[[Barack Obama]] (small changes to a large page, probably a few
identical versions due to vandalism reverts, etc.). However,
decompressing it just gets you the raw text, so nothing in this
storage system helps generation of diffs. Diff generation is still
done by shelling out to wikidiff2 (a custom C++ diff implementation
that generates diffs with HTML markup like ins/del) and caching
the result in memcached.

'''

Seems good enough. Closing bug as works for me.
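
The compression effect described in the quote can be tried out with a small
experiment. This is a rough sketch, not Wikimedia's code: it assumes zlib
(the quote itself is unsure about gzip), generates toy revisions instead of
real page text, and uses a hypothetical NUL separator between revisions.

```python
import random
import zlib

SEPARATOR = "\x00"  # hypothetical delimiter, assumed never to appear in page text


def make_revisions(count=20, words=1200, seed=0):
    """Build toy revisions: one large shared body plus a small per-edit change."""
    rng = random.Random(seed)
    vocab = ["wiki", "page", "edit", "text", "link", "user", "talk", "diff",
             "revert", "article", "history", "section", "template", "category"]
    body = " ".join(rng.choice(vocab) for _ in range(words))
    return [f"{body} (edit {n})" for n in range(count)]


def pack(revisions):
    """Concatenate adjacent revisions and compress them as a single blob."""
    return zlib.compress(SEPARATOR.join(revisions).encode("utf-8"))


def unpack(blob):
    """Decompress a blob; this yields only the raw revision texts, no diffs."""
    return zlib.decompress(blob).decode("utf-8").split(SEPARATOR)


def size_individually(revisions):
    """Total size when every revision is compressed on its own."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)


if __name__ == "__main__":
    revs = make_revisions()
    print("individually:", size_individually(revs), "bytes")
    print("concatenated:", len(pack(revs)), "bytes")
```

The toy page is deliberately kept well under 32 KiB, because DEFLATE only
finds repeats within a 32 KiB window; with larger pages the duplication
between adjacent revisions would fall outside what zlib can exploit.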
