It seems there is disagreement about what the correct interpretation of NULL in
the rev_content_model column is. Should NULL there mean

(a) "the current page content model, as recorded in page_content_model"

or should it mean

(b) "the default for this title, no matter what page_content_model says"?


Kunal and I have had an unintentional edit war about this question in 
Revision.php:

Kunal changed it from (a) to (b) in https://gerrit.wikimedia.org/r/#/c/222043/
I later changed it from (b) to (a) in https://gerrit.wikimedia.org/r/#/c/297787/
Kunal reverted me from (a) to (b) in https://gerrit.wikimedia.org/r/#/c/298239/


So, which way do we want it?


The conflict seems to arise from (at least) three competing use cases:

I) re-interpreting page content. For instance, a user may move a misnamed
User:Foo.jss to User:Foo.js. In this case, the content should be re-interpreted
as JavaScript, including all old revisions. This would be in favor of behavior
(a), though it still works with (b), because the default model changes based on
the suffix ".js". I think it would however be better to only rely on title
parsing magic once, when creating the page, not later, when rendering old 
revisions.

II) converting page content. For instance, if a talk page gets converted to
using Flow, new revisions (and page_content_model) will have the Flow model,
while old revisions need to keep their original wikitext model (even though
their rev_content_model is null). That would need behavior (b).

III) changing a namespace's default content model. E.g. when installing an
extension that changes the default content model of a namespace (such as
Wikibase with Items in the main namespace, or Flow-per-default for Talk pages),
existing pages that were already in that namespace should still be readable.
With (b), this would fail: even though page_content_model has the correct model
for reading the page, rev_content_model is null, so the new namespace default is
used, which will fail. With (a), this would simply work: the page will be
rendered according to page_content_model.


In all cases it's possible to resolve the issue by replacing the NULL entries
for all revisions of a page with the current model id. The question is just when
and how we do that, and when and how we can even detect that this needs doing.

There is also an in-between option, let's call it a/b: fall back to
page_content_model for the latest revision (that should *always* be right), but
to ignore page_content_model for older revisions. That would cater to use case
III at least in so far as it would be possible to view the "misplaced" pages.
But viewing old revisions or diffs would still fail with a nasty error. This
option may look better on the surface, but I fear it will just add to the 
confusion.

There's another fix: never write null into rev_content_model. Always put the
actual model ID there. That's pretty wasteful, but it's robust and reliable.
When we decided to use null as a placeholder for the default, we assumed the
default would never change. But as we now see, it sometimes does...


So, what should it be, option (a) or (b)? And how do we address the use case
that is then broken? What should we write into rev_content_model in the future?

I personally think that option (a) makes more sense, because the resolutions of
defaults is then local to the database. It could even be done within the SQL
query. It's easier to maintain consistency that way. For use case II, that would
require us to "fill in" all the rev_content_model fields in old revisions when
converting a page. I think it would be a good thing to do that. If we have the
content model change between revisions, it seems prudent to record it 
explicitly.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Reply via email to