I'm primarily a Windows guy, so I'm trying to write the code in a portable way, and I will make sure the application works on both Linux and Windows.
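Byrial's point about endianness and integer sizes, in the quoted thread, is the main portability trap for a binary format that has to be read on both Linux and Windows. A common remedy is to decode every field with an explicitly declared byte order and width instead of relying on the host's native layout. The Python sketch below illustrates that idea. The record layout (a 64-bit page ID and a 32-bit byte length, followed by a UTF-8 title) and the function names are hypothetical and are not the format Petr is designing.

```python
import struct
from typing import Tuple

# Hypothetical record layout (NOT the real incremental dump format):
#   Q  page ID, unsigned 64-bit
#   I  title length in bytes, unsigned 32-bit
#   followed by that many bytes of UTF-8 title text
# "<" means little-endian with standard sizes and no alignment padding,
# so the layout does not depend on the host CPU or C compiler.
HEADER = struct.Struct("<QI")


def encode_record(page_id: int, title: str) -> bytes:
    """Serialize one hypothetical record."""
    data = title.encode("utf-8")
    return HEADER.pack(page_id, len(data)) + data


def decode_record(buf: bytes, offset: int = 0) -> Tuple[int, str, int]:
    """Return (page_id, title, next_offset); identical on any host."""
    page_id, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated record")
    return page_id, buf[start:end].decode("utf-8"), end


if __name__ == "__main__":
    blob = encode_record(1, "Example") + encode_record(2, "Sandbox")
    pos = 0
    while pos < len(blob):
        page_id, title, pos = decode_record(blob, pos)
        print(page_id, title)
```

Because `struct` is told the byte order (`<`) and uses standard sizes, the same bytes decode identically on a little- or big-endian machine, regardless of how wide the platform's native `long` is.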
Petr Onderka

On Wed, Jul 3, 2013 at 4:49 PM, Erik Zachte <ezac...@wikimedia.org> wrote:

> > it will now be a command line application that outputs the data as
> > uncompressed XML, in the same format as current dumps.
>
> That will help a great deal. But I assume your application will be for
> Linux only?
> So it would help to still generate the current compressed dumps, as a
> post-processing step, and store them online for download.
>
> One of the reasons for XML dumps is platform independence, both on the
> producer side (we had ever-evolving SQL dumps earlier) and on the consumer
> side (not everyone uses Linux).
>
> Erik Zachte
>
> -----Original Message-----
> From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Petr Onderka
> Sent: Wednesday, July 03, 2013 4:04 PM
> To: Wikimedia developers; Wikipedia Xmldatadumps-l
> Subject: Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps
>
> A reply to all those who basically want to keep the current XML dumps:
>
> I have decided to change the primary way of reading the dumps: it will now
> be a command line application that outputs the data as uncompressed XML, in
> the same format as current dumps.
>
> This way, you should be able to use the new dumps with minimal changes to
> your code.
>
> Keeping the dumps in a text-based format doesn't make sense, because that
> can't be updated efficiently, which is the whole reason for the new dumps.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen <byr...@vip.cybercity.dk> wrote:
>
> > Hi,
> >
> > As a regular user of dump files I would not want a "fancy" file
> > format with indexes stored as trees etc.
> >
> > I parse all the dump files (both for SQL tables and the XML files)
> > with a one-pass parser which inserts the data I want (which sometimes
> > is only a small fraction of the total amount of data in the file) into
> > my local database. I will normally never store uncompressed dump
> > files, but pipe the uncompressed data directly from bunzip or gunzip
> > to my parser to save disk space. Therefore it is important to me that
> > the format is simple enough for a one-pass parser.
> >
> > I cannot really imagine who would use a library with an object-oriented
> > API to read dump files. No matter what, it would be inefficient and
> > have fewer features and possibilities than using a real database.
> >
> > I could live with a binary format, but I have doubts if it is a good idea.
> > It will be harder to make sure that your parser is working correctly,
> > and you have to consider things like endianness, size of integers,
> > format of floats etc., which give no problems in text formats. The
> > binary files may be smaller uncompressed (which I don't store anyway)
> > but not necessarily when compressed, as the compression will do better
> > on text files.
> >
> > Regards,
> > - Byrial
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
> _______________________________________________
> Wikitech-l mailing list
> wikitec...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
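Byrial's workflow, piping `bunzip2`/`gunzip` output straight into a one-pass parser, keeps working as long as the reader tool writes the same XML to stdout. As an illustration only (a minimal sketch, not Petr's tool or Byrial's parser), Python's `xml.etree.ElementTree.iterparse` can consume such a stream incrementally. It assumes the existing export layout, where each `<page>` holds a `<title>` and one or more `<revision>` elements, and it strips the XML namespace because the export schema version differs between dumps. The function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix; the export namespace version varies."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream) -> Iterator[Tuple[Optional[str], int]]:
    """Yield (title, revision_count) for each <page>, in a single pass."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event: start of the root <mediawiki>
    for event, elem in context:
        if event == "end" and local(elem.tag) == "page":
            title = None
            revisions = 0
            for child in elem:
                name = local(child.tag)
                if name == "title":
                    title = child.text
                elif name == "revision":
                    revisions += 1
            yield title, revisions
            # Drop finished pages so memory stays bounded on huge dumps.
            root.clear()


def main() -> None:
    """Read uncompressed dump XML from stdin; print one line per page."""
    for title, revisions in iter_pages(sys.stdin.buffer):
        print(f"{title}\t{revisions}")
```

Saved as, say, a hypothetical `parse_dump.py` that ends with a call to `main()`, it could sit at the end of a pipeline such as `bunzip2 -c dump.xml.bz2 | python parse_dump.py`, so the uncompressed XML never touches the disk.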