[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-10-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #17 from Adam Wight 2011-10-06 18:06:02 UTC --- What about saving several indexes of data each in their own file? For illustration, tlwiki-20110926-pages-meta-history.xml.bz2.index-on-revision.sqlite3 tlwiki-20110926-pages-met
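As a rough illustration of the per-index sidecar file suggested in comment #17, the sketch below stores one index in its own SQLite database. The table name `revision_index`, its columns, and the helper names are assumptions made for this example; the comment is cut off before it describes what each index would contain.

```python
import sqlite3


def build_revision_index(db_path, entries):
    """Create (or update) a sidecar index mapping revision id -> byte offset.

    `entries` is an iterable of (rev_id, block_offset) pairs. The schema is
    hypothetical; the thread does not specify one.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revision_index ("
        "rev_id INTEGER PRIMARY KEY, "
        "block_offset INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO revision_index VALUES (?, ?)", entries
    )
    conn.commit()
    return conn


def lookup_offset(conn, rev_id):
    """Return the stored offset for rev_id, or None if it is not indexed."""
    row = conn.execute(
        "SELECT block_offset FROM revision_index WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping each index in a separate file means a consumer only downloads the indexes it needs, and the dump file itself stays unchanged.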

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #16 from Ariel T. Glenn 2011-08-29 22:19:33 UTC --- See Administrators'_noticeboard/Incidents, which had a total of 561938 revs the last time I looked (over a month ago, so surely even worse now).

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 Ángel González changed: CC: added keis...@gmail.com --- Comment #15 fro

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #14 from Ariel T. Glenn 2011-08-29 19:39:55 UTC --- Yeah, I'm familiar with seek-bzip2, but it didn't do what I needed for my use case. I wanted to be able to easily locate a given XML page in a dump file without an index. The gzi

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 Andrew Dunbar changed: CC: added hippytr...@gmail.com --- Comment #13 f

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #12 from Ariel T. Glenn 2011-08-29 18:07:24 UTC --- (In response to comment 11) No, they aren't, but I have a C library that could be used to build such an index for bzip2 files without a ton of work; specifically, there is a utili

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #11 from Adam Wight 2011-06-04 11:07:57 UTC --- Make it a requirement that the compression library is able to report compressed block boundaries as it is working, so an index can be generated. This will open many possibilities for
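A minimal sketch of the kind of index comment #11 asks for. Python's `bz2` module does not report internal block boundaries, so this example swaps in a different technique: each chunk is written as its own independent bz2 stream and the stream offsets are recorded. The function names and the layout of the index tuples are assumptions for illustration only.

```python
import bz2
import io


def write_indexed(chunks, out):
    """Write each chunk as a separate bz2 stream and record where it lands.

    Returns a list of tuples:
    (compressed_offset, compressed_length, raw_offset, raw_length).
    """
    index = []
    comp_off = 0
    raw_off = 0
    for chunk in chunks:
        blob = bz2.compress(chunk)
        out.write(blob)
        index.append((comp_off, len(blob), raw_off, len(chunk)))
        comp_off += len(blob)
        raw_off += len(chunk)
    return index


def read_chunk(f, entry):
    """Decompress a single chunk by seeking straight to its stream."""
    comp_off, comp_len, _raw_off, _raw_len = entry
    f.seek(comp_off)
    return bz2.decompress(f.read(comp_len))


def total_uncompressed_size(index):
    """The uncompressed size falls out of the index for free."""
    if not index:
        return 0
    _, _, raw_off, raw_len = index[-1]
    return raw_off + raw_len
```

The concatenated output is still readable as one file by tools that accept multi-stream bzip2 input, at the cost of a somewhat worse compression ratio when chunks are small.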

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #10 from Diederik van Liere 2011-06-03 22:04:31 UTC --- xz compression sounds good to me!

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #9 from Platonides 2011-06-03 22:00:31 UTC --- Diederik, they are not created uncompressed in memory. I think we should just move to xz (mainly for the space benefits), which would provide the uncompressed size as an added value.

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 Diederik van Liere changed: Keywords: added analytics

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #8 from Diederik van Liere 2011-06-02 22:40:04 UTC --- Or alternatively, first create the page XML elements; once that's done and you have collected metadata such as the number of articles, uncompressed size, etc., prepend the metadata
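One way the approach in comment #8 could look, sketched under several assumptions: pages are compressed as they are produced while a running count is kept, and the metadata is then prepended as its own small bz2 stream. The JSON header line, the field names `pages` and `uncompressed_size`, and the function name are hypothetical; the real dumps are XML and the thread does not settle on a metadata format.

```python
import bz2
import io
import json


def build_dump(pages):
    """Compress pages incrementally, then prepend a metadata header stream.

    Returns (dump_bytes, metadata). `uncompressed_size` counts only the page
    data, not the header itself.
    """
    body = io.BytesIO()
    compressor = bz2.BZ2Compressor()
    page_count = 0
    raw_size = 0
    for page in pages:
        data = page.encode("utf-8")
        body.write(compressor.compress(data))
        page_count += 1
        raw_size += len(data)
    body.write(compressor.flush())

    meta = {"pages": page_count, "uncompressed_size": raw_size}
    header = (json.dumps(meta, sort_keys=True) + "\n").encode("utf-8")
    # Two concatenated bz2 streams: header first, then the page data.
    return bz2.compress(header) + body.getvalue(), meta
```

Because the header is a separate stream, the page data never has to be held uncompressed or recompressed; only the compressed body is buffered until the counts are known.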

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #7 from Platonides 2011-06-02 22:35:03 UTC --- Sorry, I didn't pay enough attention to the first post; I was thinking of giving that metadata separately.

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 --- Comment #6 from Brion Vibber 2011-06-02 21:54:24 UTC --- (In reply to comment #5) > > Dump files are generated directly to their compressed form, so these exact > > things aren't really possible to put in. > You can just keep the count whe

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-06-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 Platonides changed: CC: added platoni...@gmail.com --- Comment #5 from

[Bug 26499] Include uncompressed size and other metadata in each dump file

2011-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26499 Adam Wight changed: What|Removed |Added Summary|Include size of the dump|Include uncompressed size