Marco Schuster wrote:
> Hi all,
> 
> I want to crawl around 800.000 flagged revisions from the German
> Wikipedia, in order to make a dump containing only flagged revisions.
> For this, I obviously need to spider Wikipedia.
> What are the limits (rate!) here, what UA should I use and what
> caveats do I have to take care of?
> 
> Thanks,
> Marco
> 
> PS: I already have a revisions list, created with the Toolserver. I
> used the following query: "select fp_stable,fp_page_id from
> flaggedpages where fp_reviewed=1;". Is it correct this one gives me a
> list of all articles with flagged revs, fp_stable being the revid of
> the most current flagged rev for this article?

Fetch them from the Toolserver instead (there's a tool by Duesentrieb for
that). It serves almost all of them from the Toolserver cluster and makes
a request to Wikipedia only when needed.
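
For the cases where a request to Wikipedia is still needed, here is a rough
Python sketch of a polite fetcher. It is not Duesentrieb's tool, only an
illustration: it asks the standard MediaWiki API (action=query,
prop=revisions, revids=...) for batches of revision IDs, sends a descriptive
User-Agent, and works serially with a pause between requests. The
User-Agent string, the batch size of 50, the one-second delay, and all
function names are assumptions of mine, not official limits.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://de.wikipedia.org/w/api.php"
# Hypothetical User-Agent: replace with your own tool name and contact address.
USER_AGENT = "FlaggedRevsDumper/0.1 (contact: you@example.org)"
BATCH_SIZE = 50        # assumed number of revids per request
DELAY_SECONDS = 1.0    # assumed pause between requests, not an official limit


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_url(revids):
    """Build a MediaWiki API URL requesting the content of the given revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in revids),
        "rvprop": "ids|content",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_batch(revids):
    """Fetch one batch of revisions and return the decoded JSON response."""
    request = urllib.request.Request(
        build_url(revids), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def crawl(revids):
    """Fetch all revisions serially, one batch at a time, pausing in between."""
    for group in batches(revids, BATCH_SIZE):
        yield fetch_batch(group)
        time.sleep(DELAY_SECONDS)
```

You would feed crawl() the fp_stable values from your query, e.g.
"for result in crawl(revids): ...", and write each result to disk as you go
so an interrupted run can be resumed.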


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
