[Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
Hi all, I want to crawl around 800.000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. For this, I obviously need to spider Wikipedia. What are the limits (rate!) here, what UA should I use and w

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Rolf Lampa
Marco Schuster wrote: > I want to crawl around 800.000 flagged revisions from the German > Wikipedia, in order to make a dump containing only flagged revisions. [...] > flaggedpages where fp_reviewed=1;". Is it correct this one gives me a > list of all articles with flagged revs, Doesn't the xml

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Daniel Kinzler
Rolf Lampa wrote: > Marco Schuster wrote: >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. > [...] >> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a >> list of all articles with flag

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Platonides
Marco Schuster wrote: > Hi all, > > I want to crawl around 800.000 flagged revisions from the German > Wikipedia, in order to make a dump containing only flagged revisions. > For this, I obviously need to spider Wikipedia. > What are the limits (rate!) here, what UA should I use and what > caveats

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 12:49 AM, Rolf Lampa wrote: > Marco Schuster wrote: >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. > [...] >> flaggedpages where fp_r

Re: [Wikitech-l] Crawling deWP

2009-01-27 Thread Marco Schuster
On Wed, Jan 28, 2009 at 12:53 AM, Platonides wrote: > Marco Schuster wrote: >> Hi all, >> >> I want to crawl around 800.000 flagged revisions from the German >> Wikipedia, in order to make a dump containing only flagged revisions. >> For this, I obvio

Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Rolf Lampa
Marco Schuster wrote: > Rolf Lampa wrote: >> >> Doesn't the xml dumps contain the flag for flagged revs? > > The xml dumps are nothing for me, way too much overhead (especially, > they are old, and I want to use single files, it's easier to process > these than one huge xml file). And they do

Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Daniel Kinzler
Rolf Lampa wrote: > I'd love, however, to see the flagged rev status as an attribute in one > of the tags, for example > > Regards, Naw, it's more complex than that. You can have any number of different flags. It would probably have to be foobar -- daniel

Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Platonides
Daniel Kinzler wrote: > Rolf Lampa wrote: >> I'd love, however, to see the flagged rev status as an attribute in one >> of the tags, for example >> >> Regards, > > Naw, it's more complex than that. You can have any number of different flags. > It > would probably have to be > foobar > >

Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Thomas Dalton
2009/1/28 Platonides : > Daniel Kinzler wrote: >> Rolf Lampa wrote: >>> I'd love, however, to see the flagged rev status as an attribute in one >>> of the tags, for example >>> >>> Regards, >> >> Naw, it's more complex than that. You can have any number of different >> flags. It >> would probab