On Thu, Feb 4, 2010 at 9:12 PM, Eric Sun <e...@cs.stanford.edu> wrote:
> Hi,
>
> I saw this thread back in October where someone was having trouble
> importing the English Wikipedia XML dump:
> http://lists.wikimedia.org/pipermail/wikitech-l/2009-October/045594.html
> The thread back in October seemed to end without resolution, and the
> tools still seem to be broken, so has anyone found a solution in the
> meantime?
>
> I'm using mediawiki-1.15.1 and attempting to import
> enwiki-20100130-pages-articles.xml.bz2.
>
> None of these options seem to work:
> 1) importDump.php
> fails by spewing "Warning: xml_parse(): Unable to call handler in_()
> in ./includes/Import.php on line 437" repeatedly
>
> 2) xml2sql (http://meta.wikimedia.org/wiki/Xml2sql):
> Fails with error:
> xml2sql: parsing aborted at line 33 pos 16.
> due to the new <redirect> tag introduced in the new dumps?
>
> 3) mwdumper (http://www.mediawiki.org/wiki/MWDumper):
> Current XML is schema v0.4, but the documentation says that it's for 0.3
>
> 4) mwimport (http://meta.wikimedia.org/wiki/Data_dumps/mwimport):
> Fails immediately:
> siteinfo: untested generator 'MediaWiki 1.16alpha-wmf', expect trouble ahead
> page: expected closing tag in line 35
>
> Any tips?
> Thanks!
> Eric
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Most of these errors are caused by the new(ish) <redirect /> tag
within <page> elements. 0.4 is the correct version of the schema,
but unfortunately the schema was updated, and dumps were produced
with it, before the changes made it into a MediaWiki release.
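
One possible stopgap is to strip the <redirect /> lines out of the dump before handing it to an older importer. The Python sketch below does that as a streaming bz2-to-bz2 filter. It assumes, without having been checked against every dump, that the tag always appears as a self-closing element alone on its own line, and that <redirect /> is the only element the importer rejects. The redirect flag in the dump is simply dropped. The function names are hypothetical.

```python
import bz2
import re
import sys
from typing import Iterable, Iterator

# Assumption: <redirect /> (or <redirect/>, possibly with attributes) is
# self-closing and sits alone on its own line in the dump.
REDIRECT_LINE = re.compile(r"^\s*<redirect\b[^>]*/>\s*$")


def strip_redirect_tags(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except those consisting solely of a <redirect /> tag."""
    for line in lines:
        if not REDIRECT_LINE.match(line):
            yield line


def main(src: str, dst: str) -> None:
    # Stream bz2 -> bz2 so the multi-gigabyte dump never has to fit in memory.
    with bz2.open(src, "rt", encoding="utf-8") as fin, \
         bz2.open(dst, "wt", encoding="utf-8") as fout:
        fout.writelines(strip_redirect_tags(fin))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Usage, with an illustrative script name: `python strip_redirects.py enwiki-20100130-pages-articles.xml.bz2 enwiki-noredirect.xml.bz2`, then feed the filtered file to the importer. Page wikitext is XML-escaped in the dump, so a literal "<redirect />" typed into an article should not match the pattern.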

MediaWiki 1.15.1 cannot import pages containing <redirect />; we
should probably backport support for it. We should also rewrite the
importers so they don't barf terribly when they encounter an unknown
element.
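
A rough illustration of the "don't barf on unknown elements" idea, written in Python rather than the PHP of includes/Import.php and not modelled on that code: a hypothetical streaming reader that handles only the <page> children it knows about, records the names of any others (such as <redirect />), and carries on instead of aborting. The set of known elements and the returned dict shape are assumptions chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, Set

# Hypothetical: <page> children an importer written against schema 0.3 knows.
KNOWN_PAGE_CHILDREN = {"title", "id", "restrictions", "revision"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO, unknown: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield {'title', 'text'} for each <page>, skipping unknown children.

    Names of unrecognised <page> children are collected in `unknown`
    instead of aborting the import.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page = {"title": "", "text": ""}
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                page["title"] = child.text or ""
            elif name == "revision":
                for field in child:
                    if local_name(field.tag) == "text":
                        page["text"] = field.text or ""
            elif name not in KNOWN_PAGE_CHILDREN:
                unknown.add(name)  # e.g. the new <redirect /> tag
        yield page
        elem.clear()  # free the finished page's subtree to bound memory use


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Foo</title><id>1</id><redirect />
    <revision><id>2</id><text>#REDIRECT [[Bar]]</text></revision></page>
</mediawiki>"""
    seen_unknown: Set[str] = set()
    for p in iter_pages(io.BytesIO(sample), seen_unknown):
        print(p["title"], "->", p["text"])
    print("skipped elements:", sorted(seen_unknown))
```

Collecting unknown names, rather than silently dropping them, lets the importer warn once per new element instead of either failing or spewing a warning per page.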

-Chad
