I hope at least a couple of you are subscribed to the xmldatadumps-l
list to keep track of developments with the dumps.  Last month I started
running a sort of poor person's incremental dump, very experimental at
this point, but perhaps it will turn out to be useful for folks who just
want to parse through the latest content.

Ariel

On 11-12-2011, Sun, at 10:45 +0100, Stefan Kühn wrote:
> On 10.12.2011 20:52, Jeremy Baron wrote:
> > Is it sufficient to receive the XML on stdin or do you need to be able to 
> > seek?
> >
> > It is trivial to give you XML on stdin e.g.
> > $ < path/to/bz2 bzip2 -d | perl script.pl
> 
> Hmm, stdin is possible, but I think this will need a lot of RAM on the 
> server, so I don't think it is an option for the future. Every language 
> grows every day and the dumps will grow as well. The next problem is 
> parallel use of a compressed file: if more users read the compressed 
> file the way you suggest, then bzip2 will crash the server IMHO.
> 
> I think it is no problem to store the uncompressed XML files for easy 
> use. We should make rules about where they are kept and for how long, 
> or we need a list where every user can say "I only need the two newest 
> dumps of enwiki, dewiki,...". If a dump is not needed, then we can 
> delete it.
> 
> Stefan (sk)
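
For what it's worth, a streaming parse does not have to hold the whole
dump in RAM: an event-based parser can handle one <page> element at a
time and discard it before moving on, so memory stays roughly constant
regardless of dump size. Below is a minimal sketch of what such a
script.pl could look like, assuming the XML::Twig module is installed;
printing just the page titles is a placeholder for whatever processing
you actually need.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    # Read the already-decompressed dump from stdin, e.g.:
    #   $ < path/to/bz2 bzip2 -d | perl script.pl
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once per <page> element as soon as it is fully parsed.
            page => sub {
                my ( $t, $page ) = @_;
                print $page->first_child_text('title'), "\n";
                # Discard everything parsed so far; this is what keeps
                # memory use flat no matter how large the dump is.
                $t->purge;
            },
        },
    );
    $twig->parse( \*STDIN );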



_______________________________________________
Toolserver-l mailing list ([email protected])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette