[Wikitech-l] mwdumper ERROR Duplicate entry

Dawson Thu, 15 Jan 2009 04:19:47 -0800

Hello,

I have used Special:Export at en.wikipedia to export  
"Diabetes_mellitus" and ticked the box "include templates" (I'm only  
really after the templates).


The resulting XML file is 40.1 MB, so I decided to go with mwdumper.jar
rather than Special:Import.

I'm working on a fresh build of MediaWiki on my local system. When
running the command:

java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml | mysql -u root -p wiki

It is returning the following error:

1 pages (0.102/sec), 1,000 revs (102.062/sec)
ERROR 1062 (23000) at line 99: Duplicate entry '45970' for key 1
Exception in thread "main" java.io.IOException: XML document structures must start and end within the same entity.
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        ... 2 more

Can anyone please advise? After some googling the only advice I  
managed to find was:

"Before you start, try clearing the tables that mwdumper works in:

DELETE FROM page; DELETE FROM revision; DELETE FROM text;"

I have done this and tried again, but the same error continues.
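
One thing that may be worth ruling out: the SAXParseException in the trace ("XML document structures must start and end within the same entity") is what an XML parser typically reports when the input ends while elements are still open, so the exported file itself might be truncated. Below is a minimal sketch in Python for checking that independently of mwdumper. It is not part of mwdumper or MediaWiki, and the function name check_well_formed is hypothetical. It only tests well-formedness by streaming the file through a SAX parser; it does not look at the Duplicate entry error at all.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(path):
    """Return None if the file is well-formed XML, else a short error string.

    The file is streamed through a SAX parser, so a large export does not
    need to fit in memory. The base ContentHandler ignores all content;
    we only care whether the parser reaches the end without an error.
    """
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    try:
        with open(path, "rb") as handle:
            parser.parse(handle)
    except xml.sax.SAXParseException as exc:
        # Line and column point at where the parser gave up, which for a
        # truncated file is normally the very end of the input.
        return "line %d, column %d: %s" % (
            exc.getLineNumber(),
            exc.getColumnNumber(),
            exc.getMessage(),
        )
    return None
```

Calling check_well_formed("Wikipedia-20090113203939.xml") would return None for a complete export, or a string giving the position where parsing failed. If the reported position is the last line of the file, re-downloading the export would be a reasonable next step.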

Many thanks, Dawson


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
