Marco Schuster wrote:
> Hi all,
>
> I want to crawl around 800.000 flagged revisions from the German
> Wikipedia, in order to make a dump containing only flagged revisions.
> For this, I obviously need to spider Wikipedia.
> What are the limits (rate!) here, what UA should I use and what
> caveats do I have to take care of?
>
> Thanks,
> Marco
>
> PS: I already have a revisions list, created with the Toolserver. I
> used the following query: "select fp_stable,fp_page_id from
> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
> list of all articles with flagged revs, fp_stable being the revid of
> the most current flagged rev for this article?

Fetch them from the Toolserver (there's a tool by duesentrieb for that). It will get almost all of them from the Toolserver cluster and make a request to Wikipedia only if needed.
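
To illustrate the local-first idea, here is a rough Python sketch. It is not duesentrieb's tool and does not use its interface; `fetch_revisions`, `local_lookup`, `remote_fetch` and the one-second pause are all hypothetical names and values picked for the example, not documented limits.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def fetch_revisions(
    rev_ids: Iterable[int],
    local_lookup: Callable[[int], Optional[str]],
    remote_fetch: Callable[[int], str],
    delay_seconds: float = 1.0,  # assumed pause, not an official rate limit
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, str]:
    """Return {rev_id: text}, preferring the local store over remote requests."""
    texts: Dict[int, str] = {}
    for rev_id in rev_ids:
        # Try the local copy first (e.g. text stored on the cluster).
        text = local_lookup(rev_id)
        if text is None:
            # Local miss: make one remote request, then pause before the next.
            text = remote_fetch(rev_id)
            sleep(delay_seconds)
        texts[rev_id] = text
    return texts


if __name__ == "__main__":
    # In-memory stand-ins for the local store and the remote site.
    local_store = {101: "text of revision 101", 102: "text of revision 102"}
    remote_calls = []

    def fake_remote(rev_id: int) -> str:
        remote_calls.append(rev_id)
        return f"remote text of revision {rev_id}"

    result = fetch_revisions(
        [101, 102, 103], local_store.get, fake_remote, delay_seconds=0.0
    )
    print(len(result), remote_calls)
```

The rev IDs would come from the fp_stable column of the query above; only the ones missing locally ever cause a request to Wikipedia.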