The text storage backend could quite legitimately do that on its own.  I'm
not quite sure why you refer to the page/archive tables, though.  No two
revisions are "identical" (they differ in rev_timestamp if nothing else),
and each revision has a text_id pointing to its text in the text table.  I
take it you mean that a revision entry could refer to an existing text_id
if the text was demonstrably identical, rather than creating a new entry
and potentially duplicating the text itself.  But the text table is not the
final stage in the process, or at least it doesn't have to be.  MediaWiki
is happy as long as throwing that text_id into the database and cranking
the handle churns out the appropriate text; it doesn't care how that text
is stored or retrieved.  Only in the default setting is each old_text field
populated with the full text.
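
To make the "refer to an existing text_id" idea concrete, here is a rough
sketch in Python rather than MediaWiki's own code.  The DedupTextStore
class, its method names and the choice of SHA-1 are all my assumptions for
illustration; nothing here is an existing MediaWiki API.

```python
import hashlib


class DedupTextStore:
    """Hypothetical stand-in for a text storage backend (not MediaWiki code)."""

    def __init__(self):
        self._text_by_id = {}    # text_id -> stored text
        self._id_by_digest = {}  # SHA-1 hex digest -> text_id
        self._next_id = 1

    def store(self, text):
        """Return a text_id for `text`, reusing an existing one if identical."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        existing = self._id_by_digest.get(digest)
        # Compare the real text as well, so a hash collision can never make
        # two different texts share one text_id.
        if existing is not None and self._text_by_id[existing] == text:
            return existing
        text_id = self._next_id
        self._next_id += 1
        self._text_by_id[text_id] = text
        self._id_by_digest.setdefault(digest, text_id)
        return text_id

    def fetch(self, text_id):
        """Crank the handle: text_id in, appropriate text out."""
        return self._text_by_id[text_id]
```

The caller only ever sees text_ids, so whether two revisions share one
stored copy is entirely the backend's business.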

That said, I do agree that this should be done.  We do it for images, and
we should do it for text, because it's useful for more than just data
compression, as suggested by the OP.  It could be used to make evaluation of
reverts in extensions like AbuseFilter and FlaggedRevs much more effective
and efficient, for instance.  And it probably *could* be used to improve the
compression of the fully-written text table.
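
As a sketch of the revert case (Python again, with made-up names; this is
not how AbuseFilter or FlaggedRevs actually work): if a digest of each
revision's text were kept alongside the revision, spotting a revert would
become a comparison of short strings instead of full texts.  The function
names and the use of SHA-1 are assumptions for illustration only.

```python
import hashlib


def content_digest(text):
    """Hex digest of a revision's text (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverted_to(history_digests, new_text):
    """Index of the most recent earlier revision whose text matches new_text.

    history_digests holds one digest per existing revision, oldest first.
    Returns None when the new text matches no earlier revision.
    """
    target = content_digest(new_text)
    # Walk backwards so the most recent matching revision wins.
    for index in range(len(history_digests) - 1, -1, -1):
        if history_digests[index] == target:
            return index
    return None
```

A match on the latest revision would just be a null edit; a match further
back is a candidate revert, which an extension could then confirm by
comparing the actual text.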

--HM

<[email protected]> wrote in message news:[email protected]...
> Also it could be used to say "do I really need to store this revision in
> the 'page' or 'archive' tables, or can I just refer to an existing
> identical revision". 



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
