https://bugzilla.wikimedia.org/show_bug.cgi?id=20757

Tim Starling <tstarl...@wikimedia.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|some text of old revisions  |Corruption of text from
                   |in early 2005 is blank in   |early 2005 due to
                   |the English Wikipedia       |HistoryBlobStub pointers
                   |                            |broken by
                   |                            |recompressTracked.php

--- Comment #22 from Tim Starling <tstarl...@wikimedia.org> 2010-02-08 07:35:53 UTC ---
OK, I've checked a lot of these test cases, and they all seem to have the same
cause, so I'm changing the summary. All of the relevant revisions should now be
serving errors instead of pretending to be blank.

The original version of compressOld.php concatenated several revisions into one
"blob" and stored it in a random row in the old table. The other old rows that
needed data from the concatenated blob would then get a pointer object, called
a HistoryBlobStub. This pointer object held an old_id and a content hash, which
together located the text for that revision.
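
As a rough illustration of the layout described above, here is a minimal Python
sketch. It is not MediaWiki's PHP code: the class names, the dict standing in
for the old table, and the use of MD5 as the content hash are all assumptions
made for the example.

```python
import hashlib


def content_hash(text):
    # Assumption: MD5 is used here only for illustration.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class ConcatenatedBlob:
    """Several revision texts stored together, keyed by content hash."""

    def __init__(self):
        self.items = {}

    def add_item(self, text):
        key = content_hash(text)
        self.items[key] = text
        return key

    def get_item(self, key):
        return self.items[key]


class Stub:
    """Pointer to one item inside a blob stored in another row."""

    def __init__(self, old_id, item_hash):
        self.old_id = old_id
        self.item_hash = item_hash


def load_text(old_table, old_id):
    row = old_table[old_id]
    if isinstance(row, Stub):
        # Follow the pointer: fetch the blob row, then pick the item by hash.
        blob = old_table[row.old_id]
        return blob.get_item(row.item_hash)
    return row  # plain text that was never compressed


# Hypothetical table: row 2 holds the blob, rows 1 and 3 are stubs into it.
# Loading row 2's own revision is left out of this sketch.
blob = ConcatenatedBlob()
old_table = {2: blob, 4: "never compressed"}
old_table[1] = Stub(2, blob.add_item("first revision"))
old_table[3] = Stub(2, blob.add_item("third revision"))
```

Calling load_text(old_table, 1) follows the stub in row 1 to the blob in row 2
and returns the item whose hash matches.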

After we started using external storage (ES), all the bulk data was moved out
of the core database. Now, to load a HistoryBlobStub, MediaWiki would first
load the old_id where the concatenated text used to be and find a second
pointer there (with old_flags=external), then follow that second pointer to
load the blob from ES. This was inefficient, so I introduced a new pointer type
(the "two-part CGZ URL") which pointed directly from the rows where the stub
objects used to be, into ES.
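
The difference between the two lookup paths can be sketched as follows. This is
a toy Python model, not the real schema: the row dicts, the "cluster1/100"
address format, and the function names are assumptions chosen for
illustration.

```python
# Hypothetical external storage: one concatenated blob, items keyed by hash.
es = {"cluster1/100": {"h1": "text A", "h2": "text B"}}

# Before: row 11 is a stub pointing at row 10, which in turn points into ES.
old_table_before = {
    10: {"flags": "external", "text": "cluster1/100"},
    11: {"flags": "object", "stub": (10, "h1")},
}

# After: row 11 points straight into ES, naming both the blob and the item.
old_table_after = {
    11: {"flags": "external", "text": "cluster1/100/h1"},
}


def load_via_stub(old_table, es, old_id):
    target_id, item_hash = old_table[old_id]["stub"]
    second_pointer = old_table[target_id]  # extra database read
    blob = es[second_pointer["text"]]      # then the ES fetch
    return blob[item_hash]


def load_direct(old_table, es, old_id):
    # Split the two-part address into blob location and item hash.
    blob_key, _, item_hash = old_table[old_id]["text"].rpartition("/")
    return es[blob_key][item_hash]
```

Both functions return the same text for row 11; the direct form just skips the
intermediate read of row 10.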

I then wrote a script called resolveStubs.php, and ran it, removing all
HistoryBlobStub objects from the database. Or at least, that's what I thought I
did. It transpires that the missing revisions above are all HistoryBlobStub
objects that somehow escaped resolveStubs.php.

The current generation of recompression script, trackBlobs/recompressTracked,
has no appropriate handling for HistoryBlobStub. It leaves the HistoryBlobStub
objects in place, but removes the CGZ objects they point to, creating a broken
pointer. 
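
To make the failure mode concrete, here is a toy Python model of a
recompression pass that ignores stubs, plus a check that lists the pointers it
breaks. The row layout and both function names are hypothetical; this is not
how trackBlobs/recompressTracked is implemented.

```python
def naive_recompress(old_table):
    # Drops the blob rows (their data is assumed to have been moved
    # elsewhere) but leaves every stub row untouched.
    return {i: r for i, r in old_table.items() if r["kind"] != "blob"}


def find_broken_stubs(old_table):
    # A stub is broken when its target row is gone or is no longer a blob.
    broken = []
    for old_id, row in sorted(old_table.items()):
        if row["kind"] != "stub":
            continue
        target = old_table.get(row["target"])
        if target is None or target["kind"] != "blob":
            broken.append(old_id)
    return broken


# Hypothetical table: rows 1 and 3 are stubs into the blob in row 2.
table = {
    1: {"kind": "stub", "target": 2},
    2: {"kind": "blob"},
    3: {"kind": "stub", "target": 2},
    4: {"kind": "plain"},
}
after = naive_recompress(table)
```

Before the pass find_broken_stubs(table) is empty; afterwards
find_broken_stubs(after) reports rows 1 and 3.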

Due to a bug in Revision.php, the broken pointer was displayed as a blank page
instead of an error message. This is fixed in r62119.
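
The general shape of a "blank instead of error" bug can be sketched like this.
The details of the actual Revision.php bug are not given here, so the code
below is only a hypothetical Python illustration of how a failed load can be
mistaken for empty text.

```python
class TextLoadError(Exception):
    """Raised when a revision's text cannot be located."""


def fetch(store, key):
    # Returns None when the pointer does not resolve.
    return store.get(key)


def get_text_lenient(store, key):
    # Problem: a failed load and a genuinely empty revision look identical.
    return fetch(store, key) or ""


def get_text_strict(store, key):
    text = fetch(store, key)
    if text is None:
        raise TextLoadError("cannot load text for %r" % (key,))
    return text  # may legitimately be the empty string
```

With the strict version a broken pointer surfaces as an error, while a revision
that really is empty still loads as "".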

Luckily I was fairly paranoid when I wrote trackBlobs/recompressTracked, and
all the data required for recovery appears to have been retained. It's just a
matter of writing a bug fix script.

