Re: [Xmldatadumps-l] [Wikitech-l] Suggested file format of new incremental dumps

Petr Onderka Wed, 03 Jul 2013 09:30:26 -0700

I'm primarily a Windows guy, so I'm trying to write the code in a portable
way and I will make sure the application works on both Linux and Windows.
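
Portability is also where the binary-format concerns raised in the quoted thread (endianness, integer sizes) come in. One way to make a binary layout identical on Linux and Windows is to fix byte order and field widths explicitly instead of inheriting them from the host. Below is a minimal sketch using Python's standard `struct` module. The record layout (a 4-byte page ID followed by an 8-byte revision ID, little-endian) is a hypothetical example for illustration, not the format proposed in this thread:

```python
import struct
from typing import Tuple

# HYPOTHETICAL record layout, for illustration only: a 4-byte unsigned
# page ID followed by an 8-byte unsigned revision ID.
# "<" fixes little-endian byte order and standard sizes with no padding,
# so "I" is exactly 4 bytes and "Q" exactly 8 bytes on every platform.
RECORD = struct.Struct("<IQ")


def pack_record(page_id: int, rev_id: int) -> bytes:
    """Serialize one record with explicit byte order and field widths."""
    return RECORD.pack(page_id, rev_id)


def unpack_record(data: bytes) -> Tuple[int, int]:
    """Parse one record; the result does not depend on the host CPU."""
    page_id, rev_id = RECORD.unpack(data)
    return page_id, rev_id


if __name__ == "__main__":
    raw = pack_record(12, 555)
    print(RECORD.size)         # 12
    print(raw.hex())           # 0c0000002b02000000000000
    print(unpack_record(raw))  # (12, 555)
```

Without the `<` prefix, `struct` uses native byte order and alignment, so the same format string could yield a different size or byte order on another platform. That is the class of bug the binary-format objection warns about.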


Petr Onderka


On Wed, Jul 3, 2013 at 4:49 PM, Erik Zachte <ezac...@wikimedia.org> wrote:

> > it will now be a command line application that outputs the data as
> uncompressed XML, in the same format as current dumps.
>
> That will help a great deal. But I assume your application will be for
> Linux only?
> So it would help to still generate the current compressed dumps as a
> post-processing step and store them online for download.
>
> One of the reasons for XML dumps is platform independence, both on the
> producer side (we had ever-evolving SQL dumps earlier) and on the consumer
> side (not everyone uses Linux).
>
> Erik Zachte
>
> -----Original Message-----
> From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org]
> On Behalf Of Petr Onderka
> Sent: Wednesday, July 03, 2013 4:04 PM
> To: Wikimedia developers; Wikipedia Xmldatadumps-l
> Subject: Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new
> incremental dumps
>
> A reply to all those who basically want to keep the current XML dumps:
>
> I have decided to change the primary way of reading the dumps: it will now
> be a command line application that outputs the data as uncompressed XML, in
> the same format as current dumps.
>
> This way, you should be able to use the new dumps with minimal changes to
> your code.
>
> Keeping the dumps in a text-based format doesn't make sense, because such
> a format can't be updated efficiently, and efficient updates are the whole
> reason for the new dumps.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen <byr...@vip.cybercity.dk>
> wrote:
>
> > Hi,
> >
> > As a regular user of dump files, I would not want a "fancy" file
> > format with indexes stored as trees etc.
> >
> > I parse all the dump files (both the SQL tables and the XML files)
> > with a one-pass parser, which inserts the data I want (sometimes
> > only a small fraction of the total data in the file) into my local
> > database. I normally never store uncompressed dump files; instead I
> > pipe the uncompressed data directly from bunzip or gunzip to my
> > parser to save disk space. Therefore it is important to me that the
> > format is simple enough for a one-pass parser.
> >
> > I cannot really imagine who would use a library with an object-oriented
> > API to read dump files. No matter what, it would be inefficient and
> > have fewer features and possibilities than using a real database.
> >
> > I could live with a binary format, but I have doubts about whether it
> > is a good idea. It will be harder to make sure that your parser is
> > working correctly, and you have to consider things like endianness,
> > size of integers, format of floats, etc., which cause no problems in
> > text formats. The binary files may be smaller uncompressed (which I
> > don't store anyway) but not necessarily when compressed, as
> > compression does better on text files.
> >
> > Regards,
> > - Byrial
> >
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
> >
> _______________________________________________
> Wikitech-l mailing list
> wikitec...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
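
The new reader is meant to print the same uncompressed XML as the current dumps. If so, a one-pass streaming parser fed from a pipe, as Byrial describes above, should keep working unchanged. Below is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`, assuming only page titles are wanted. `iter_page_titles` and `local_name` are hypothetical helper names. Tags are matched by local name because the export namespace URI changes between schema versions:

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(stream: BinaryIO) -> Iterator[str]:
    """Yield the <title> of each <page> in one pass over a dump stream.

    iterparse builds the tree incrementally; clearing the root after
    each page discards pages already handled, so memory holds roughly
    one page at a time instead of the whole dump.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text or ""
                    break
            root.clear()  # free the finished page and anything before it
```

To use it in a pipeline, pass binary standard input. For example, a script could loop over `iter_page_titles(sys.stdin.buffer)` and be run as `bunzip2 -c dump.xml.bz2 | python titles.py`; the file names are placeholders. Clearing the root after each page keeps memory use proportional to one page rather than to the whole dump.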

