Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Andrew Krizhanovsky wrote:
> Hi! I have tried xml2sql and importDump.php. The same error.
>
> Best regards,
> Andrew.

Thanks Bilal and Andrew.

O.O.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Hi! I have tried xml2sql and importDump.php. The same error.

Best regards,
Andrew.

On Fri, Oct 9, 2009 at 10:18 PM, O. O. olson...@yahoo.com wrote:
> Andrew Krizhanovsky wrote:
>> Hi! I have got the same redirect problem while importing the dump of Russian Wiktionary. :(
>>
>> Best regards,
>> Andrew Krizhanovsky.
>
> So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am curious to know what others are using for their imports. (This is for my personal knowledge.)
>
> It seems that the “<redirect />” tags are mostly blank while grepping through the English Wikipedia Dump. I hope someone can fix this soon.
>
> Thanks to you guys,
> O. O.
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Hi! I have got the same redirect problem while importing the dump of Russian Wiktionary. :(

Best regards,
Andrew Krizhanovsky.

On Fri, Oct 9, 2009 at 3:46 AM, O. O. olson...@yahoo.com wrote:
> Tomasz Finc wrote:
>> O. O. wrote:
>> If it's failing due to an old xsd then .. The updated xsd and copy of Import.php just got checked into our repositories so you can either pull this
>> http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472
>> and increase the version number ala
>> http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612
>> Or you can wait till the next tagged release which will likely include this.
>
> Thanks Tomasz. I don’t mind waiting for your next release if it is going to be in the next month or so.
>
>>> (Thanks for the pointer to the source of MW Dumper. The source is not mentioned in the Readme. However, I found it too complicated - or not well documented for me at this point.)
>> I'll have a peek at this and see if it can be improved.
>
> I hope someone could update MW Dumper to the new XSD – it would help a lot as far as importing Wikipedia Dumps is concerned, because importDump.php is not practical.
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Andrew Krizhanovsky wrote:
> Hi! I have got the same redirect problem while importing the dump of Russian Wiktionary. :(
>
> Best regards,
> Andrew Krizhanovsky.

So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am curious to know what others are using for their imports. (This is for my personal knowledge.)

It seems that the “<redirect />” tags are mostly blank while grepping through the English Wikipedia Dump. I hope someone can fix this soon.

Thanks to you guys,
O. O.
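The grep observation in this message can be made more precise with a short script. What follows is only a minimal sketch and not something posted in the thread: it assumes the dump has been decompressed to plain XML, it interprets a "blank" tag as a redirect element with no attributes, and the name `count_redirect_tags` is made up for illustration.

```python
import re

# Matches an opening or self-closing <redirect> element and captures
# whatever sits between the tag name and the closing bracket.
REDIRECT_TAG = re.compile(r"<redirect\b([^>]*?)/?>")


def count_redirect_tags(lines):
    """Return (blank, with_attributes) counts of <redirect> tags.

    `lines` is any iterable of text lines, for example an open dump file,
    so the dump is scanned one line at a time.
    """
    blank = 0
    with_attributes = 0
    for line in lines:
        for match in REDIRECT_TAG.finditer(line):
            if match.group(1).strip():
                with_attributes += 1
            else:
                blank += 1
    return blank, with_attributes
```

It could be run over a dump by passing an open file, for example `count_redirect_tags(open("enwiki-20090920-pages-articles.xml", encoding="utf-8"))`.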
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
I have used xml2sql, mwdumper, import.php and the python script to import. The two fastest are xml2sql and the python script (xray). The best results are from importDump.php. mwDumper is slow but it gives good results.

I have not done any import with the new redirect tag.

bilal

On Fri, Oct 9, 2009 at 2:18 PM, O. O. olson...@yahoo.com wrote:
> Andrew Krizhanovsky wrote:
>> Hi! I have got the same redirect problem while importing the dump of Russian Wiktionary. :(
>>
>> Best regards,
>> Andrew Krizhanovsky.
>
> So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am curious to know what others are using for their imports. (This is for my personal knowledge.)
>
> It seems that the “<redirect />” tags are mostly blank while grepping through the English Wikipedia Dump. I hope someone can fix this soon.
>
> Thanks to you guys,
> O. O.

--
Verily, with hardship comes ease.
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Platonides wrote:
> Seems it fails on the new redirect tag.
>
>> P.S. http://download.wikimedia.org/tools/ does not have the source of MWDumper. I thought this was open source?
>
> MWDumper source is available at http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/
> It should be noted at the readme.

Thanks Platonides. With the new redirect tag, is there any way to import the new XML files? Could I simply strip out the <redirect /> tags from the file if I wanted MWDumper to work? Or if I upgrade to MediaWiki 1.16, would import.php work without any problems?

(Thanks for the pointer to the source of MW Dumper. The source is not mentioned in the Readme. However, I found it too complicated - or not well documented for me at this point.)

Thanks again,
O.O.
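The stripping idea asked about in this message could look roughly like the sketch below. It is an illustration under stated assumptions, not a confirmed fix: it assumes the redirect element is self-closing and sits on a single line, the helper names `strip_redirect_tags` and `strip_file` are hypothetical, and nothing in the thread establishes that MWDumper or xml2sql accept the resulting file.

```python
import re

# Matches a self-closing redirect element, with or without attributes.
REDIRECT_TAG = re.compile(r"<redirect\b[^>]*/>")


def strip_redirect_tags(lines):
    """Yield the given lines with self-closing <redirect /> elements removed."""
    for line in lines:
        cleaned = REDIRECT_TAG.sub("", line)
        # Drop lines that held nothing but the tag; keep everything else.
        if cleaned != line and not cleaned.strip():
            continue
        yield cleaned


def strip_file(src_path, dst_path):
    """Copy a dump from src_path to dst_path without its redirect elements."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Both files are streamed, so the dump is never loaded in full.
        dst.writelines(strip_redirect_tags(src))
```

A regular expression is used here instead of an XML parser only because the goal is to leave every other byte of the dump untouched; a parser-based rewrite would be more robust if the tag ever spans several lines.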
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
O. O. wrote:
> Platonides wrote:
>> Seems it fails on the new redirect tag.
>>
>>> P.S. http://download.wikimedia.org/tools/ does not have the source of MWDumper. I thought this was open source?
>>
>> MWDumper source is available at http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/
>> It should be noted at the readme.
>
> Thanks Platonides. With the new redirect tag, is there any way to import the new XML files? Could I simply strip out the <redirect /> tags from the file if I wanted MWDumper to work? Or if I upgrade to MediaWiki 1.16, would import.php work without any problems?

If it's failing due to an old xsd then .. The updated xsd and copy of Import.php just got checked into our repositories so you can either pull this

http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472

and increase the version number ala

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612

Or you can wait till the next tagged release which will likely include this.

> (Thanks for the pointer to the source of MW Dumper. The source is not mentioned in the Readme. However, I found it too complicated - or not well documented for me at this point.)

I'll have a peek at this and see if it can be improved.

--tomasz
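Since the suggestion here hinges on which schema version is in play, it may help to check what a given dump declares before changing anything. The sketch below is an assumption-laden illustration: it relies on the dump's root element carrying a `version` attribute, as MediaWiki export files generally do, and the function name `dump_schema_version` is hypothetical.

```python
import xml.etree.ElementTree as ET


def dump_schema_version(source):
    """Return the root element's 'version' attribute, or None if absent.

    `source` is a file name or a file object opened on the XML dump.
    """
    # With "start" events, iterparse yields the root element as soon as its
    # opening tag has been read, so the rest of the dump is never parsed.
    for _event, element in ET.iterparse(source, events=("start",)):
        return element.get("version")
```

Passing an open file object, for example `dump_schema_version(open(path, "rb"))`, keeps control of when the file is closed.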
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Tomasz Finc wrote:
> O. O. wrote:
> If it's failing due to an old xsd then .. The updated xsd and copy of Import.php just got checked into our repositories so you can either pull this
> http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472
> and increase the version number ala
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612
> Or you can wait till the next tagged release which will likely include this.

Thanks Tomasz. I don’t mind waiting for your next release if it is going to be in the next month or so.

>> (Thanks for the pointer to the source of MW Dumper. The source is not mentioned in the Readme. However, I found it too complicated - or not well documented for me at this point.)
> I'll have a peek at this and see if it can be improved.

I hope someone could update MW Dumper to the new XSD – it would help a lot as far as importing Wikipedia Dumps is concerned, because importDump.php is not practical.
[Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
Hi,

I have been importing the English Wikipedia XML Dumps every few months (last time I did this was in June). I then used xml2sql and it always worked for me.

Now I attempted the import on the latest dump enwiki-20090920-pages-articles.xml (and on the dump from enwiki-20090810-pages-articles.xml), both of these have the error:

$ xml2sql enwiki-20090920-pages-articles.xml
unexpected element redirect
xml2sql: parsing aborted at line 33 pos 16.

So then I try mwdumper and after 1.4 M Pages, it craps out:

……
1,423,000 pages (957.283/sec), 1,423,000 revs (957.283/sec)
1,424,000 pages (957.465/sec), 1,424,000 revs (957.465/sec)
Exception in thread "main" java.lang.IllegalArgumentException: Invalid contributor
    at org.mediawiki.importer.XmlDumpReader.closeContributor(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)

I tried the importDump.php and I get errors of the kind (MediaWiki 1.14.0)

…
Warning: xml_parse(): Unable to call handler in_() in /var/www/includes/Import.php on line 437
Warning: xml_parse(): Unable to call handler in_() in /var/www/includes/Import.php on line 437
Warning: xml_parse(): Unable to call handler out_() in /var/www/includes/Import.php on line 437
….

(Sorry I don’t know where this error starts, but it processes a few thousand pages, up till I get sick of looking at it before failing.)

Any ideas whether the format of the XML files has changed? I can swear that as of June or maybe May, I had xml2sql working. I know that I might need to upgrade MediaWiki to 1.15, however importDump.php usually does not work for the English Wikipedia anyways. I would be grateful if someone has any ideas.

Thanks guys,
O. O.

P.S. http://download.wikimedia.org/tools/ does not have the source of MWDumper. I thought this was open source?
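Because xml2sql reports a position ("line 33 pos 16"), one way to see what the parser tripped over is to print the dump lines around that position. The sketch below is an illustration only: the helper name `lines_around` is hypothetical, and it assumes xml2sql counts lines the same way a newline-based reader does.

```python
from itertools import islice


def lines_around(path, line_number, context=3):
    """Return (number, text) pairs for the lines surrounding line_number."""
    first = max(1, line_number - context)
    last = line_number + context
    with open(path, encoding="utf-8", errors="replace") as dump:
        # islice skips ahead lazily, so a multi-gigabyte dump is never held
        # in memory; its indices are zero-based, line numbers are one-based.
        window = islice(dump, first - 1, last)
        return [
            (first + offset, text.rstrip("\n"))
            for offset, text in enumerate(window)
        ]
```

For the error quoted in this message, a call such as `lines_around("enwiki-20090920-pages-articles.xml", 33)` would list lines 30 through 36 of the dump.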
Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki
O. O. writes:
> (Sorry I don’t know where this error starts, but it processes a few thousand pages, up till I get sick of looking at it before failing.)
>
> Any ideas whether the format of the XML files has changed? I can swear that as of June or maybe May, I had xml2sql working. I know that I might need to upgrade MediaWiki to 1.15, however importDump.php usually does not work for the English Wikipedia anyways. I would be grateful if someone has any ideas.
>
> Thanks guys,
> O. O.

Seems it fails on the new redirect tag.

> P.S. http://download.wikimedia.org/tools/ does not have the source of MWDumper. I thought this was open source?

MWDumper source is available at http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/
It should be noted at the readme.