Felipe Ortega wrote:
> --- On Sun, 8/3/09, O. O. <olson...@yahoo.com> wrote:
>
>> I thought that the pages-articles.xml.bz2 (i.e. the XML dump) contains
>> the templates – but I did not find a way to install it separately.
>>
>
> No, it only contains a dump of the current version of each article (involving
> the page, revision and text tables in the DB).

Thanks Felipe for posting. pages-articles.xml.bz2, as listed at
http://download.wikimedia.org/enwiki/20081008/, is described there as
“Articles, templates, image descriptions, and primary meta-pages.” What does
“templates” mean if the dump does not contain the templates?
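If it helps, a rough way to check this is to stream the dump and count how
many page titles sit in the Template: namespace. A minimal sketch follows
(the dump file name is only an example, and I have not run this against the
full English dump):

    import bz2
    import xml.etree.ElementTree as ET

    DUMP = "enwiki-20081008-pages-articles.xml.bz2"   # example file name

    total_pages = 0
    template_pages = 0

    # Stream the dump; it is far too large to parse into memory at once.
    with bz2.open(DUMP, "rb") as f:
        for event, elem in ET.iterparse(f, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]   # strip the export-schema namespace
            if tag == "title":
                total_pages += 1
                if (elem.text or "").startswith("Template:"):
                    template_pages += 1
            elif tag == "page":
                elem.clear()                    # free the finished <page> subtree

    print("pages in dump:   ", total_pages)
    print("Template: pages: ", template_pages)

The total page count from the same pass can also be compared against
SELECT COUNT(*) FROM page after an import, which is the kind of integrity
check I am after below.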
>> Another thing I noticed (with the Portuguese Wiki, which is a much
>> smaller dump than the English Wiki) is that the number of pages imported
>> by importDump.php and MWDumper differ, i.e. importDump.php had many more
>> pages than MWDumper. That is why I would have preferred to do this using
>> importDump.php.
>>
>
> On download.wikimedia.org/your_lang_here you can check how many pages were
> supposed to be included in each dump.
>
> You also have other parsers you may want to check (in my experience, my
> parser was slightly faster than mwdumper):
> http://meta.wikimedia.org/wiki/WikiXRay_Python_parser

Here my concern is not about speed but about integrity. I don’t mind the
import taking a long time, as long as it completes. I used importDump.php
because it was listed as the “recommended way” of importing, but now I
realize that no one has used it on a real Wikipedia dump. Nonetheless, I
will give your tool a try sometime over the next two weeks or so.

>> Also, in a previous post you mentioned taking care of the
>> “secondary link tables”. How do I do that? Does “secondary links” refer
>> to language links, external links, template links, image links, category
>> links, page links, or something else?
>>
>
> On the same page for downloads you have a list of additional dumps in SQL
> format (then compressed with gzip). I guess you may also want to import them
> (but of course, you don't need a parser for them, they can be directly loaded
> in the DB).
>
> Best,
>
> F.

I have not tried these as yet. I will try them tomorrow (a rough sketch of
what I plan to run is at the end of this mail) and get back to you, i.e.
the newsgroup.

Thanks again,
O. O.
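P.S. For the record, this is roughly what I intend to run for the secondary
link tables. It just streams each gzipped SQL dump into the mysql client;
the file names, database name and credentials below are placeholders for my
own setup, so adjust them to whatever your dump listing and wiki use:

    import gzip
    import subprocess

    # Example file names from the 20081008 English dump listing; adjust to
    # whichever link-table dumps you actually downloaded.
    LINK_DUMPS = [
        "enwiki-20081008-pagelinks.sql.gz",
        "enwiki-20081008-templatelinks.sql.gz",
        "enwiki-20081008-categorylinks.sql.gz",
        "enwiki-20081008-imagelinks.sql.gz",
        "enwiki-20081008-externallinks.sql.gz",
        "enwiki-20081008-langlinks.sql.gz",
    ]

    for dump in LINK_DUMPS:
        # Equivalent to:  zcat <dump> | mysql -u wikiuser -p wikidb
        mysql = subprocess.Popen(
            ["mysql", "-u", "wikiuser", "--password=changeme", "wikidb"],
            stdin=subprocess.PIPE,
        )
        with gzip.open(dump, "rb") as f:
            while True:
                chunk = f.read(1 << 20)     # 1 MB at a time; the files are huge
                if not chunk:
                    break
                mysql.stdin.write(chunk)
        mysql.stdin.close()
        if mysql.wait() != 0:
            raise RuntimeError("mysql failed while loading " + dump)
        print("loaded", dump)

If a plain zcat | mysql pipeline works for you, that is obviously simpler;
the loop above just keeps track of which files have already gone in.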