Re: [Wikitech-l] [Toolserver-l] Crawling deWP

2009-01-28 Thread Daniel Kinzler
Marco Schuster wrote: ... But by then, I do hope we have revision flags in the dumps, because that would be The Right Thing to use. Still, using the dumps would require me to get the full-history dump, because I only want flagged revisions and not current revisions without the flag.

2009-01-28 Thread Thomas Dalton
2009/1/28 Daniel Kinzler dan...@brightbyte.de: Marco Schuster wrote: ... But by then, I do hope we have revision flags in the dumps, because that would be The Right Thing to use. Still, using the dumps would require me to get the full-history dump, because I only want flagged revisions

2009-01-27 Thread Daniel Kinzler
Marco Schuster wrote: Fetch them from the toolserver (there's a tool by duesentrieb for that). It will catch almost all of them from the toolserver cluster, and make a request to Wikipedia only if needed. I highly doubt this is legal use for the toolserver, and I pretty much guess that 800k

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 1:13 AM, Daniel Kinzler wrote: Marco Schuster wrote: Fetch them from the toolserver (there's a tool by duesentrieb for that). It will catch almost all of them from the toolserver cluster, and make a request to Wikipedia