Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Andrew Dunbar
On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote: 2010/10/13 Paul Houle p...@ontology2.com: Don't be intimidated by working with the data dumps. If you've got an XML API that does streaming processing (I used .NET's XmlReader) and use the old unix trick of piping the output of bunzip2 into your program, it's really pretty easy.
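
The "unix trick" being quoted is simply decompressing on the fly and parsing the XML as a stream. A minimal sketch of that pattern in Python (the original poster used .NET's XmlReader; the invocation and tag handling below are my assumptions, not his code):

    # Run as:  bunzip2 -c enwiki-latest-pages-articles.xml.bz2 | python stream_pages.py
    # Requires Python 3.8+ for the {*} namespace wildcard in ElementTree paths.
    import sys
    import xml.etree.ElementTree as ET

    def stream_pages(xml_stream):
        """Yield (title, wikitext) pairs without loading the whole dump into memory."""
        for event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag.rsplit("}", 1)[-1] == "page":   # dump elements are namespaced
                title = elem.findtext("{*}title", default="")
                text = elem.findtext("{*}revision/{*}text", default="")
                yield title, text
                elem.clear()   # drop the parsed subtree so memory use stays flat

    if __name__ == "__main__":
        for title, text in stream_pages(sys.stdin.buffer):
            print(title, len(text))

Piping through bunzip2 keeps decompression in a separate process, so decompression and parsing can overlap on a multi-core machine.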

Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Alex Brollo
2010/11/7 Andrew Dunbar hippytr...@gmail.com: On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote: Hi Alex. I have been doing something similar in Perl for a few years for the English Wiktionary. I've never been sure of the best way to store all the index files I create
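
The preview cuts off before Andrew says what his index files contain or how he stores them. Purely as an illustration of one common approach (an assumption on my part, not his setup): record the byte offset of each <page> element in the decompressed dump, keyed by title, in an SQLite file, so individual pages can be re-read later without rescanning the whole dump.

    import re
    import sqlite3

    TITLE_RE = re.compile(rb"<title>(.*?)</title>")

    def build_offset_index(dump_path, db_path="page_index.sqlite"):
        """One linear pass over the decompressed dump; stores title -> byte offset."""
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT PRIMARY KEY, offset INTEGER)")
        with open(dump_path, "rb") as dump:
            offset = 0
            page_start = None
            for line in dump:
                if b"<page>" in line:
                    page_start = offset
                match = TITLE_RE.search(line)
                if match and page_start is not None:
                    # Titles are stored as-is; they may still contain XML entities like &amp;
                    db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)",
                               (match.group(1).decode("utf-8"), page_start))
                    page_start = None
                offset += len(line)
        db.commit()
        db.close()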

[Wikitech-l] API vs data dumps

2010-10-13 Thread Paul Houle
I know there's some discussion about what's appropriate for the Wikipedia API, and I'd just like to share my recent experience. I was trying to download the Wikipedia entries for people, of which I found about 800,000. I had a scanner already written that could do the download,
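
For scale: the MediaWiki query API accepts up to 50 titles per request, so 800,000 pages works out to roughly 16,000 HTTP calls. A hedged sketch of what that kind of batched fetcher typically looks like (my illustration, not Paul's actual scanner):

    import time
    import requests

    API_URL = "https://en.wikipedia.org/w/api.php"   # endpoint assumed for illustration

    def fetch_batch(titles):
        """Fetch current wikitext for up to 50 titles in a single API request."""
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": "|".join(titles),
            "format": "json",
        }
        resp = requests.get(API_URL, params=params,
                            headers={"User-Agent": "api-vs-dumps-example/0.1"})
        resp.raise_for_status()
        pages = resp.json()["query"]["pages"]
        # Missing pages come back without a "revisions" key, hence the .get defaults.
        return {p.get("title", ""): p.get("revisions", [{}])[0].get("*", "")
                for p in pages.values()}

    def fetch_all(titles, batch_size=50, delay=1.0):
        for i in range(0, len(titles), batch_size):
            yield fetch_batch(titles[i:i + batch_size])
            time.sleep(delay)   # be polite to the servers

Even at one request per second that is still several hours of polling, which helps explain why Paul ends up recommending the dumps for bulk work.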

Re: [Wikitech-l] API vs data dumps

2010-10-13 Thread Alex Brollo
2010/10/13 Paul Houle p...@ontology2.com: Don't be intimidated by working with the data dumps. If you've got an XML API that does streaming processing (I used .NET's XmlReader) and use the old unix trick of piping the output of bunzip2 into your program, it's really pretty easy. When
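
An in-process alternative to piping through an external bunzip2: Python's bz2 module can stream-decompress the dump directly. A sketch under the same assumptions as above; the pipe version keeps the advantage of running decompression in a separate process.

    import bz2
    import xml.etree.ElementTree as ET

    def iter_titles(dump_bz2_path):
        """Stream page titles straight out of the compressed dump file."""
        with bz2.open(dump_bz2_path, "rb") as stream:
            for event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag.rsplit("}", 1)[-1] == "title":
                    yield elem.text
                elem.clear()   # keep memory bounded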