https://bugzilla.wikimedia.org/show_bug.cgi?id=26499
--- Comment #17 from Adam Wight 2011-10-06 18:06:02 UTC ---
What about saving several indexes of data each in their own file?
For illustration,
tlwiki-20110926-pages-meta-history.xml.bz2.index-on-revision.sqlite3
tlwiki-20110926-pages-met
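Such a per-dump index file could be sketched with Python's sqlite3 module. The table name, columns, and the idea of mapping revision ids to byte offsets in the compressed dump are illustrative assumptions here, not an existing schema:

```python
import sqlite3

def build_revision_index(db_path, entries):
    """Create a per-dump index file (e.g. *.index-on-revision.sqlite3)
    mapping revision ids to byte offsets in the compressed dump.
    Schema is hypothetical. `entries` is an iterable of
    (rev_id, stream_offset) tuples."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rev_index ("
        " rev_id INTEGER PRIMARY KEY,"
        " stream_offset INTEGER NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO rev_index VALUES (?, ?)", entries)
    conn.commit()
    return conn

def lookup(conn, rev_id):
    """Return the stored byte offset for a revision, or None."""
    row = conn.execute(
        "SELECT stream_offset FROM rev_index WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    return row[0] if row else None
```

One nice property of one-index-per-file is that a stale index can be regenerated or discarded without touching the dump itself.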
--- Comment #16 from Ariel T. Glenn 2011-08-29 22:19:33 UTC ---
See Administrators'_noticeboard/Incidents, a total of 561938 revs last time I
looked (which was over a month ago, surely even worse now).
Ángel González changed:
           What    |Removed |Added
           CC      |        |keis...@gmail.com
--- Comment #15 fro
--- Comment #14 from Ariel T. Glenn 2011-08-29 19:39:55 UTC ---
Yeah, I'm familiar with seek-bzip2, but it didn't do what I needed for my use
case. I wanted to be able to easily locate a given XML page in a dump file
without an index. The gzi
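Locating content in a bzip2 dump without a prebuilt index is possible because every bzip2 block begins with the 48-bit magic number 0x314159265359, although blocks are bit-aligned rather than byte-aligned, so the scan has to slide one bit at a time. A minimal sketch (the function name and return convention are made up for illustration):

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit bzip2 block header constant

def find_block_offsets(data, max_blocks=10):
    """Scan a bzip2 byte string for block-header magics.

    Blocks are bit-aligned, so a 48-bit window is slid one bit
    at a time. Returns the bit offsets where candidate blocks
    start. (Random data can theoretically collide with the
    magic, so real tools verify by trying to decompress.)"""
    offsets = []
    window = 0
    nbits = 0
    mask = (1 << 48) - 1
    for byte_pos, byte in enumerate(data):
        for bit in range(8):
            window = ((window << 1) | ((byte >> (7 - bit)) & 1)) & mask
            nbits += 1
            if nbits >= 48 and window == BLOCK_MAGIC:
                offsets.append(byte_pos * 8 + bit - 47)
                if len(offsets) >= max_blocks:
                    return offsets
    return offsets
```

For a fresh single-block stream the first magic sits right after the 4-byte "BZh9" stream header, i.e. at bit offset 32.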
Andrew Dunbar changed:
           What    |Removed |Added
           CC      |        |hippytr...@gmail.com
--- Comment #13 f
--- Comment #12 from Ariel T. Glenn 2011-08-29 18:07:24 UTC ---
(In reply to comment #11)
No, they aren't, but I have a C library that could be used to build such an
index for bzip2 files without a ton of work; specifically, there is a utili
--- Comment #11 from Adam Wight 2011-06-04 11:07:57 UTC ---
Make it a requirement that the compression library is able to report compressed
block boundaries as it is working, so an index can be generated. This will
open many possibilities for
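One way to get seekable boundaries even without the compression library reporting them is to emit each group of pages as its own complete bzip2 stream and record where every stream starts — essentially the layout later used for the multistream dumps. A hedged sketch, with invented helper names:

```python
import bz2

def build_multistream(pages, group=2):
    """Compress `pages` in groups of `group`, each group as its own
    complete bzip2 stream concatenated into one blob, recording
    (byte_offset, first_page_index) per stream. A reader can then
    seek straight to a stream and decompress only that group."""
    blob = bytearray()
    index = []
    for i in range(0, len(pages), group):
        chunk = "".join(pages[i:i + group]).encode()
        index.append((len(blob), i))
        blob += bz2.compress(chunk)
    return bytes(blob), index

def read_group(blob, index, n):
    """Decompress only the nth stream using its recorded offsets."""
    start = index[n][0]
    end = index[n + 1][0] if n + 1 < len(index) else len(blob)
    return bz2.decompress(blob[start:end]).decode()
```

The trade-off is slightly worse compression (each stream restarts the model) in exchange for random access with a tiny external index.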
--- Comment #10 from Diederik van Liere 2011-06-03 22:04:31 UTC ---
xz compression sounds good to me!
--- Comment #9 from Platonides 2011-06-03 22:00:31 UTC ---
Diederik, they are not created uncompressed in memory.
I think we should just move to xz (mainly for the space benefits), which would
provide the uncompressed size as an added value.
Diederik van Liere changed:
           What    |Removed |Added
           Keywords|        |analytics
--- Comment #8 from Diederik van Liere 2011-06-02 22:40:04 UTC ---
Or, alternatively: first create the page XML elements, and once that's done and
you have collected metadata like the number of articles, uncompressed size,
etc., prepend the metadata
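The two-pass approach described here — write the page elements to a scratch file while tallying totals, then write the metadata header followed by the body — might look like this sketch (element and attribute names are illustrative, not the real dump schema):

```python
import os
import tempfile

def write_dump_with_metadata(pages, out_path):
    """Write page elements to a temp file first, collecting stats,
    then prepend a metadata header once the totals are known.
    Returns (page_count, uncompressed_bytes)."""
    total = 0
    count = 0
    with tempfile.NamedTemporaryFile("w", delete=False) as body:
        for page in pages:
            xml = "<page>%s</page>\n" % page
            body.write(xml)
            total += len(xml)
            count += 1
        body_path = body.name
    with open(out_path, "w") as out:
        out.write('<dump pages="%d" uncompressed-bytes="%d">\n' % (count, total))
        with open(body_path) as body:
            for line in body:
                out.write(line)
        out.write("</dump>\n")
    os.unlink(body_path)
    return count, total
```

The cost is a second pass of I/O over the body, which matters for multi-terabyte history dumps but is trivial for the small wikis.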
--- Comment #7 from Platonides 2011-06-02 22:35:03 UTC ---
Sorry, I didn't pay enough attention to the first post; I was thinking of
providing that metadata separately.
--- Comment #6 from Brion Vibber 2011-06-02 21:54:24 UTC ---
(In reply to comment #5)
> > Dump files are generated directly to their compressed form, so these exact
> > things aren't really possible to put in.
> You can just keep the count whe
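Keeping the counts while streaming straight into the compressor is a single pass; a minimal sketch (a hypothetical helper, not the dump scripts' actual code):

```python
import bz2

def compress_with_stats(pages, out_path):
    """Stream pages through a bz2 compressor while tallying the
    page count and uncompressed byte total, so both are known by
    the time the compressed file is finished — no second pass."""
    comp = bz2.BZ2Compressor()
    count = 0
    raw_bytes = 0
    with open(out_path, "wb") as out:
        for page in pages:
            data = page.encode()
            count += 1
            raw_bytes += len(data)
            out.write(comp.compress(data))
        out.write(comp.flush())  # emit any buffered compressed data
    return count, raw_bytes
```

The remaining limitation, as noted above, is that the totals only exist once the pass ends, so they cannot appear at the top of the same compressed stream without the scratch-file trick.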
Platonides changed:
           What    |Removed |Added
           CC      |        |platoni...@gmail.com
--- Comment #5 from
Adam Wight changed:
           What    |Removed                  |Added
           Summary |Include size of the dump |Include uncompressed size