Re: [Wikitech-l] From page history to sentence history

Anthony Wed, 19 Jan 2011 06:16:05 -0800

On Wed, Jan 19, 2011 at 3:33 AM, Aryeh Gregor
<simetrical+wikil...@gmail.com> wrote:
> On Wed, Jan 19, 2011 at 3:59 AM, Anthony <wikim...@inbox.org> wrote:
>> Why isn't this being used for the dumps?
>
> Well, the relevant code is totally unrelated, so the question is sort
> of a non sequitur.

No, the question is why the relevant code is totally unrelated.
Specifically, I'm talking about the full history dumps.

> If you mean "Why don't we have incremental dumps?"

No, that's not the question.  The question is why you are
uncompressing and undiffing (from DiffHistoryBlobs) only to recompress
(to bz2), and then uncompressing and recompressing again (to 7z), when
you can get roughly the same compression by just extracting the blobs
and removing any non-public data.  Or, if that's easier, keep
uncompressing (gz) and undiffing, then rediff and recompress (gz), as
that will be much, much faster than compressing with bz2.
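To make the "undiff, then rediff and recompress in gz" route concrete,
here is a minimal Python sketch.  It assumes a simple line-based delta
format of my own invention: the function names (make_delta,
apply_delta, pack_history, unpack_history) and the copy/insert op
layout are hypothetical, and this is NOT MediaWiki's actual
DiffHistoryBlob format.  The first revision is stored in full, each
later revision as ops against its predecessor, and the whole chain is
gzipped; unpacking replays the chain to get every revision back.

```python
from __future__ import annotations

import difflib
import gzip
import json


def make_delta(old: str, new: str) -> list:
    """Encode `new` as ops against `old` (hypothetical format).

    ("copy", i1, i2) copies old lines i1..i2-1; ("insert", lines) adds
    new lines.  Old lines that are never copied are implicitly deleted.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new_lines[j1:j2]))
        # tag == "delete": nothing to emit
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the newer revision from `old` and the ops from make_delta."""
    old_lines = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


def pack_history(revisions: list[str]) -> bytes:
    """Store revision 0 in full, later ones as deltas, then gzip the chain."""
    chain = [revisions[0]]
    for prev, cur in zip(revisions, revisions[1:]):
        chain.append(make_delta(prev, cur))
    return gzip.compress(json.dumps(chain).encode("utf-8"), compresslevel=9)


def unpack_history(blob: bytes) -> list[str]:
    """Inverse of pack_history: gunzip, then replay the delta chain in order."""
    chain = json.loads(gzip.decompress(blob).decode("utf-8"))
    revisions = [chain[0]]
    for ops in chain[1:]:
        revisions.append(apply_delta(revisions[-1], ops))
    return revisions


if __name__ == "__main__":
    # Toy page history (invented): a long page that gains one line per edit.
    revisions = ["".join(f"Sentence {i}.\n" for i in range(200))]
    for k in range(50):
        revisions.append(revisions[-1] + f"Added sentence {k}.\n")
    blob = pack_history(revisions)
    assert unpack_history(blob) == revisions
    full = gzip.compress("".join(revisions).encode("utf-8"), compresslevel=9)
    print("delta chain + gzip:", len(blob), "bytes")
    print("full texts + gzip:", len(full), "bytes")
```

The toy history is made up for illustration; sizes on a real dump will
differ, and the sketch says nothing about how bz2 or 7z would compare.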

You'll also wind up with a full history dump that is *much* easier to
work with.  Yes, you'll break backward compatibility, but considering
that the English full history dump never finishes, even implementing
it for that one dump alone would be better than the present situation,
which is having nothing.

> I'm assuming the answer
> is (as usual in software development) that there are higher-priority
> things to do right now.

And there are lots of lower-priority things that are being done.  And
lots of dollars sitting on the sidelines doing nothing.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
