> The worrying bit is that it seems srv136 will now work as apache.
> So, where will dumps be done?

I'm not sure where (or if it has changed), but they are running now .... (:-)

To Ariel Glenn:

On getting them to work better in the future, this is what I would suggest:

First, note that everything except the "all history" dumps presents no
problem. It isn't perfect, but it is workable. The biggest "all pages
current" dump is enwiki, which takes about a day and a half, and the
compressed output file (bz2) still fits neatly on a DVD.

As to the history files, these are the problem; each contains all of
the preceding history and they just grow and grow. They must be
partitioned somehow. Suggestions have been made concerning
alphabetical partitions (very traditional for encyclopaedias ;-); you
yourself suggested page id.

I suggest the history be partitioned into "blocks" by *revision ID*.

Like this: revision IDs 0 to 999,999 go in "block 0", 1,000,000 to
1,999,999 in "block 1", and so on. The English Wiktionary at the
moment would have 7 blocks; the English Wikipedia would have 273.
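
To make the arithmetic concrete, here is a minimal sketch of the block
numbering. The function names and the BLOCK_SIZE constant are mine, for
illustration only; nothing like this exists in the dump scripts as far
as I know.

```python
BLOCK_SIZE = 1_000_000  # revisions per block, as proposed above


def block_of(rev_id: int) -> int:
    """Return the block number that a revision ID falls into."""
    return rev_id // BLOCK_SIZE


def block_range(block: int) -> tuple[int, int]:
    """Return the first and last revision ID (inclusive) of a block."""
    start = block * BLOCK_SIZE
    return start, start + BLOCK_SIZE - 1


def block_count(max_rev_id: int) -> int:
    """Return how many blocks a wiki needs, given its highest revision ID."""
    return block_of(max_rev_id) + 1
```

So a wiki whose highest revision ID is anywhere from 6,000,000 to
6,999,999 has 7 blocks, numbered 0 through 6.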

The dumps would continue as now up to "all pages current", including
the split-stub dump for the history (very important, as it provides
the "snapshot" of the DB state). But then when it gets to history, it
re-builds the last block done (possibly completing it), and then
writes 0-n new ones as needed.
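
As a rough sketch of that last step (hypothetical names again, and
assuming the job remembers the number of the last block it wrote), the
list of blocks to write on a given run could be worked out like this:

```python
def blocks_to_write(last_block_done, max_rev_id, block_size=1_000_000):
    """Return the block numbers to (re)write on this run.

    last_block_done: number of the last block written by the previous
    run, or None if no history block has been written yet.
    max_rev_id: highest revision ID in the current snapshot.
    """
    newest = max_rev_id // block_size
    # Rebuild the last block done, then continue up to the newest one.
    first = 0 if last_block_done is None else last_block_done
    return list(range(first, newest + 1))
```

With last_block_done = 5 and a highest revision ID of 7,200,000 this
gives blocks 5, 6 and 7: one rebuilt and two new. If the highest ID is
still inside block 5, only block 5 is rebuilt.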

Note that (to pick a random number) "block 71" of the enwiki defined
this way *has not changed* in a long time; only the current block(s)
need to be (re-)written. The history stays the same. (Of course?!)

If someone somewhere needs a copy of the wiki with all history as of a
given date, they can start with the split-stub for that date and read
in all the required blocks. But that isn't your problem any more. (;-)
They can do that with their disk and servers.
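
For what it is worth, the reader's side might look something like the
sketch below. It assumes the split-stub yields the revision IDs that
belong to the snapshot, and that read_block is some function the
reader supplies which returns (rev_id, text) pairs for one block file.
Both are my assumptions, not an existing interface.

```python
def required_blocks(stub_rev_ids, block_size=1_000_000):
    """Return the sorted block numbers that hold the given revision IDs."""
    return sorted({rev_id // block_size for rev_id in stub_rev_ids})


def assemble(stub_rev_ids, read_block, block_size=1_000_000):
    """Yield (rev_id, text) for every revision listed in the stub.

    read_block(block) is a caller-supplied (hypothetical) function that
    returns an iterable of (rev_id, text) pairs for one block.
    """
    wanted = set(stub_rev_ids)
    for block in required_blocks(wanted, block_size):
        for rev_id, text in read_block(block):
            # Skip revisions that are in the block but not in the stub.
            if rev_id in wanted:
                yield rev_id, text
```

Only the blocks that actually contain wanted revisions are read, and
anything in a block that the stub does not list is skipped.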

It would probably be best to still sort by page-id order within each
block, as they will compress much better that way.
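
A small sketch of that ordering, assuming each revision is held as a
(rev_id, page_id, text) tuple (my own layout, purely for illustration):

```python
def order_block(revisions):
    """Sort a block's revisions by page ID, then by revision ID.

    revisions: iterable of (rev_id, page_id, text) tuples.
    Successive revisions of the same page end up next to each other.
    """
    return sorted(revisions, key=lambda rev: (rev[1], rev[0]))
```

The point is only that revisions of one page sit together, so the
compressor sees near-identical texts back to back.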

One reason to rebuild the last block (or two) is to filter out
recently deleted and oversighted revisions. Revisions that are deleted
or oversighted only after their block is final (that is, revisions
older than some cutoff, a small number of weeks) would remain in the
published blocks. But note that that is true *anyway*, as someone can
always look at a 3-month old dump under any method.

With my best regards,
Robert

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l