Yes, matching manually is fairly simple, but when working from the dump files the worst case is that you iterate over n-1 talk pages (where n is the total number of talk pages of a Wikipedia) to find the talk page that belongs to a user page. Hence, if the dump file contained, for each article, a tag with the talk page id, it would significantly reduce the processing time.

Diederik
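For what it's worth, here is a minimal sketch of how the matching could be done without rescanning the talk pages for every page: index all pages once by (namespace, title), then look up the page with the same title and a namespace number 1 higher. The `Page` record and `match_talk_pages` function are hypothetical names, and the sketch assumes each title is stored without its namespace prefix and that the whole index fits in memory, which may not hold for the largest dumps.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Page(NamedTuple):
    """Hypothetical record for one page read from a dump."""
    page_id: int
    namespace: int  # e.g. 0 = article, 1 = article talk, 2 = user, 3 = user talk
    title: str      # assumed to be stored WITHOUT the namespace prefix


def match_talk_pages(pages: Iterable[Page]) -> Dict[int, Optional[int]]:
    """Map each subject page id to its talk page id (None if there is no talk page)."""
    pages = list(pages)

    # One pass over all pages builds a lookup keyed by (namespace, title).
    index: Dict[Tuple[int, str], int] = {
        (p.namespace, p.title): p.page_id for p in pages
    }

    result: Dict[int, Optional[int]] = {}
    for p in pages:
        # Subject pages live in even, non-negative namespaces; the matching
        # talk page has the same title and a namespace number 1 higher.
        if p.namespace >= 0 and p.namespace % 2 == 0:
            result[p.page_id] = index.get((p.namespace + 1, p.title))
    return result


if __name__ == "__main__":
    sample = [
        Page(1, 0, "Amsterdam"),
        Page(2, 1, "Amsterdam"),
        Page(3, 2, "Diederik"),
        Page(4, 3, "Diederik"),
        Page(5, 0, "Utrecht"),  # no talk page in this sample
    ]
    print(match_talk_pages(sample))
```

This trades memory for time: each lookup is a single dictionary access instead of a scan over the talk pages. A talk page id tag in the dump itself would still avoid building the index at all.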
On Sat, Jan 8, 2011 at 11:39 AM, Bryan Tong Minh <bryan.tongm...@gmail.com> wrote:
> On Sat, Jan 8, 2011 at 5:32 PM, John <phoenixoverr...@gmail.com> wrote:
>> its just a matter of matching page titles, if there is a page in namespace 0
>> and a page in namespace (article and article talk) with the same title they
>> go together. its fairly simple
>>
> To expand John's comment, the talk page is always the page with the
> same title, but with a namespace number 1 higher.
>
>
> Bryan
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

--
Check out my about.me profile: http://about.me/diederik