2010/11/7 Andrew Dunbar hippytr...@gmail.com
On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote:
Hi Alex. I have been doing something similar in Perl for a few years for the
English Wiktionary. I've never been sure of the best way to store all the
index files I create.
I know there's some discussion about what's appropriate for the Wikipedia
API, and I'd just like to share my recent experience. I was trying to
download the Wikipedia entries for people, of which I found about 800,000.
I had a scanner already written that could do the download.
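
The scanner itself isn't shown in the thread; a minimal sketch of that kind
of batched download against the MediaWiki query API might look like the C#
below. The titles are hypothetical stand-ins, and the 50-title batch size is
the API's usual per-request cap for non-bot clients.

    using System;
    using System.Net;

    class BatchFetcher
    {
        const string Api = "https://en.wikipedia.org/w/api.php";

        static void Main()
        {
            // Hypothetical stand-ins for the ~800,000 people titles.
            string[] titles = { "Alan Turing", "Ada Lovelace", "Grace Hopper" };

            using (var client = new WebClient())
            {
                // Identify the client; Wikimedia asks scripts to send a User-Agent.
                client.Headers.Add("User-Agent", "people-scanner-sketch/0.1");

                // The query API accepts up to 50 titles per request,
                // joined with "|" in the titles parameter.
                for (int i = 0; i < titles.Length; i += 50)
                {
                    int count = Math.Min(50, titles.Length - i);
                    string batch = string.Join("|", titles, i, count);
                    string url = Api + "?action=query&prop=revisions&rvprop=content"
                               + "&format=xml&titles=" + Uri.EscapeDataString(batch);
                    Console.WriteLine(client.DownloadString(url));
                }
            }
        }
    }

Fetching 50 titles per request rather than one cuts the HTTP round trips for
800,000 pages from 800,000 down to about 16,000.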
2010/10/13 Paul Houle p...@ontology2.com
Don't be intimidated by working with the data dumps. If you've got an XML
API that does streaming processing (I used .NET's XmlReader) and use the old
unix trick of piping the output of bunzip2 into your program, it's really
pretty easy.
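
That bunzip2-pipe trick is easy to sketch. Assuming the decompressed
pages-articles dump arrives on standard input, e.g.

    bzcat enwiki-pages-articles.xml.bz2 | DumpReader.exe

(DumpReader is a hypothetical name), a minimal C# reader that streams the
dump with XmlReader and prints each page title might look like this:

    using System;
    using System.Xml;

    class DumpReader
    {
        static void Main()
        {
            // The dump arrives already decompressed on stdin via the pipe.
            var settings = new XmlReaderSettings { IgnoreWhitespace = true };
            using (var reader = XmlReader.Create(Console.OpenStandardInput(), settings))
            {
                // Each <page> element is one article; <title> is its first child.
                while (reader.ReadToFollowing("page"))
                {
                    if (reader.ReadToDescendant("title"))
                        Console.WriteLine(reader.ReadElementContentAsString());
                }
            }
        }
    }

XmlReader keeps only the current node in memory, so this handles a
multi-gigabyte dump in roughly constant space rather than loading the whole
file the way a DOM parser would.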
When