"Anthony" <wikim...@inbox.org> wrote in message 
news:AANLkTi=uk+uf3y_b+zld57wcfuef_7rf-bt8tnvtg...@mail.gmail.com...
> No, that's not the question.  The question is why are you
> uncompressing and undiffing (from DiffHistoryBlobs) only to recompress
> (to bz2) and then uncompress and recompress (to 7z) when you can get
> roughly the same compression by just extracting the blobs and removing
> any non-public data.

That's probably not nearly as straightforward as it sounds.  RevDel'd and 
suppressed revisions are not removed from the text storage; even Oversighted 
revisions are left there, and only the entry in the revision table is removed 
or altered.  I don't know off the top of my head how regularly the 
DiffHistoryBlob system stores a 'key frame', or how easy it would be to break 
diff chains in order to snip non-public data out of them, but I'd guess a) 
not very easy, and b) that the current code doesn't give any consideration 
to doing so, because there has been no reason for it to.  So refactoring it to incorporate that, 
while not impossible, is a non-trivial amount of work.
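
To make the diff-chain problem concrete, here is a toy sketch in Python.  It 
is NOT MediaWiki's DiffHistoryBlob code or storage format; the copy/insert 
encoding and every function name below are hypothetical, chosen only to show 
why a hidden revision cannot simply be cut out of a chain of diffs.

```python
import difflib


def make_diff(old, new):
    """Encode `new` as copy/insert operations against `old` (lists of lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif j2 > j1:                           # "replace" or "insert"
            ops.append(("insert", new[j1:j2]))  # carry the new lines verbatim
        # pure deletions need no operation: the old lines are just not copied
    return ops


def apply_diff(old, ops):
    """Rebuild a revision from its predecessor and a diff from make_diff()."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def build_chain(revisions):
    """Store the first revision in full (the 'key frame'), the rest as diffs."""
    key_frame = list(revisions[0])
    diffs = [make_diff(a, b) for a, b in zip(revisions, revisions[1:])]
    return key_frame, diffs


def reconstruct(chain):
    """Expand a chain back into the full text of every revision."""
    key_frame, diffs = chain
    texts = [list(key_frame)]
    for ops in diffs:
        texts.append(apply_diff(texts[-1], ops))
    return texts


def drop_revision(chain, index):
    """Remove one revision by expanding the chain and re-diffing what is left."""
    texts = reconstruct(chain)
    if len(texts) < 2:
        raise ValueError("cannot drop the only revision in a chain")
    del texts[index]
    return build_chain(texts)


# Hypothetical history: revision 1 adds non-public text, revision 2 removes it.
revisions = [
    ["Alpha", "Beta"],
    ["SECRET", "Alpha", "Beta"],
    ["Alpha", "Beta", "Gamma"],
]
chain = build_chain(revisions)

# Naive removal: skip the hidden revision's diff and apply the next diff
# directly to revision 0.  That diff's offsets refer to revision 1, so the
# result is not revision 2.
naive = apply_diff(revisions[0], chain[1][1])

# Safe removal: expand everything, drop the revision, rebuild the chain.
clean = drop_revision(chain, 1)
```

In this sketch the only safe way to remove the middle revision is to expand 
the whole chain and diff the survivors again, which is the same kind of 
uncompress-and-recompress work the dump process already does.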

> And there are lots of lower-priority things that are being done.  And
> lots of dollars sitting on the sidelines doing nothing.

Low-priority interesting things tend to get done when you have volunteers 
doing them.  While the value of some of the Foundation's expenditure is 
commonly debated, I think you'd struggle to argue that many of the WMF's 
dollars are not doing *anything*.

--HM 



_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
