Felipe Ortega wrote:
> --- On Sun, 8/3/09, O. O. <olson...@yahoo.com> wrote:
>
>> I thought that the pages-articles.xml.bz2 (i.e. the XML dump) contains
>> the templates – but I did not find a way to install it separately.
>>
>
> No, it only contains a dump of the current version of each article (involving
> the page, revision and text tables in the DB).

Thanks Felipe for posting. pages-articles.xml.bz2, as listed at
http://download.wikimedia.org/enwiki/20081008/, is described there as
“Articles, templates, image descriptions, and primary meta-pages.” What does
“templates” mean if the dump does not contain the templates?
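If it helps, a rough way to check this is to stream the dump and count how
many page titles sit in the Template: namespace. A minimal sketch follows
(the dump file name is only an example, and I have not run this against the
full English dump):

    import bz2
    import xml.etree.ElementTree as ET

    DUMP = "enwiki-20081008-pages-articles.xml.bz2"   # example file name

    total_pages = 0
    template_pages = 0

    # Stream the dump; it is far too large to parse into memory at once.
    with bz2.open(DUMP, "rb") as f:
        for event, elem in ET.iterparse(f, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]   # strip the export-schema namespace
            if tag == "title":
                total_pages += 1
                if (elem.text or "").startswith("Template:"):
                    template_pages += 1
            elif tag == "page":
                elem.clear()                    # free the finished <page> subtree

    print("pages in dump:   ", total_pages)
    print("Template: pages: ", template_pages)

The total page count from the same pass can also be compared against
SELECT COUNT(*) FROM page after an import, which is the kind of integrity
check I am after below.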
>> Another thing I noticed (with the Portuguese Wiki, which is a much
>> smaller dump than the English Wiki) is that the number of pages imported
>> by importDump.php and MWDumper differ, i.e. importDump.php had many more
>> pages than MWDumper. That is why I would have preferred to do this using
>> importDump.php.
>>
>
> On download.wikimedia.org/your_lang_here you can check how many pages were
> supposed to be included in each dump.
>
> You also have other parsers you may want to check (in my experience, my
> parser was slightly faster than mwdumper):
> http://meta.wikimedia.org/wiki/WikiXRay_Python_parser

Here my concern is not about speed but about integrity. I don’t mind the
import taking a long time, as long as it completes. I used importDump.php
because it was listed as the “recommended way” of importing, but now I
realize that no one has used it on a real Wikipedia dump. Nonetheless, I
will give your tool a try sometime over the next two weeks or so.

>> Also, in a previous post you mentioned taking care of the
>> “secondary link tables”. How do I do that? Does “secondary links” refer
>> to language links, external links, template links, image links, category
>> links, page links, or something else?
>>
>
> On the same page for downloads you have a list of additional dumps in SQL
> format (then compressed with gzip). I guess you may also want to import them
> (but of course, you don't need a parser for them, they can be directly loaded
> in the DB).
>
> Best,
>
> F.

I have not tried these as yet. I will try them tomorrow (a rough sketch of
what I plan to run is at the end of this mail) and get back to you, i.e.
the newsgroup.

Thanks again,
O. O.
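P.S. For the record, this is roughly what I intend to run for the secondary
link tables. It just streams each gzipped SQL dump into the mysql client;
the file names, database name and credentials below are placeholders for my
own setup, so adjust them to whatever your dump listing and wiki use:

    import gzip
    import subprocess

    # Example file names from the 20081008 English dump listing; adjust to
    # whichever link-table dumps you actually downloaded.
    LINK_DUMPS = [
        "enwiki-20081008-pagelinks.sql.gz",
        "enwiki-20081008-templatelinks.sql.gz",
        "enwiki-20081008-categorylinks.sql.gz",
        "enwiki-20081008-imagelinks.sql.gz",
        "enwiki-20081008-externallinks.sql.gz",
        "enwiki-20081008-langlinks.sql.gz",
    ]

    for dump in LINK_DUMPS:
        # Equivalent to:  zcat <dump> | mysql -u wikiuser -p wikidb
        mysql = subprocess.Popen(
            ["mysql", "-u", "wikiuser", "--password=changeme", "wikidb"],
            stdin=subprocess.PIPE,
        )
        with gzip.open(dump, "rb") as f:
            while True:
                chunk = f.read(1 << 20)     # 1 MB at a time; the files are huge
                if not chunk:
                    break
                mysql.stdin.write(chunk)
        mysql.stdin.close()
        if mysql.wait() != 0:
            raise RuntimeError("mysql failed while loading " + dump)
        print("loaded", dump)

If a plain zcat | mysql pipeline works for you, that is obviously simpler;
the loop above just keeps track of which files have already gone in.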