https://bugzilla.wikimedia.org/show_bug.cgi?id=21937

           Summary: mwdumper uses too much memory
           Product: mwdumper
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: general
        AssignedTo: br...@pobox.com
        ReportedBy: gti...@gmail.com


I tried to run the GUI version of the newest revision (r60229) of mwdumper
under Java 6 update 17 on an Intel Core i7 with 3.25 GB of RAM and WinXP SP3,
and it gave this error:

Exception in thread "Thread-8" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.StringCoding.safeTrim(Unknown Source)
at java.lang.StringCoding.access$300(Unknown Source)
at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:493)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:603)
at com.mysql.jdbc.ByteArrayBuffer.writeStringNoNull(ByteArrayBuffer.java:544)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1638)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2972)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2902)
at com.mysql.jdbc.Statement.execute(Statement.java:529)
at org.mediawiki.importer.SqlServerStream.writeStatement(SqlServerStream.java:25)
at org.mediawiki.importer.SqlWriter.flushInsertBuffer(SqlWriter.java:195)
at org.mediawiki.importer.SqlWriter.bufferInsertRow(SqlWriter.java:184)
at org.mediawiki.importer.SqlWriter15.writeRevision(SqlWriter15.java:68)
at org.mediawiki.importer.PageFilter.writeRevision(PageFilter.java:67)
at org.mediawiki.dumper.ProgressFilter.writeRevision(ProgressFilter.java:56)
at org.mediawiki.importer.XmlDumpReader.closeRevision(XmlDumpReader.java:346)
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:204)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)

According to the Java docs, the default maximum heap size is 1/4 of the
physical memory, that is, around 800M. Since a single revision is at most 2M,
there is no reason for mwdumper to require that much space. (The run was on the
huwiki full history dump, writing directly to the database.)
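
The heap limit actually in effect can be checked rather than assumed. The
following is a minimal sketch, not part of mwdumper: the class name HeapCheck
and its two helper methods are hypothetical, and it only assumes the standard
Runtime.maxMemory() call. It prints the maximum heap the running JVM will use
and, for comparison, one quarter of 3.25 GiB (3328 MiB / 4 = 832 MiB, which is
where a figure of "around 800M" would come from).

```java
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        return max == Long.MAX_VALUE ? Long.MAX_VALUE : max / MIB;
    }

    /** One quarter of the given physical memory (in GiB), expressed in MiB. */
    static long quarterOfPhysicalMiB(double physicalGiB) {
        return Math.round(physicalGiB * 1024.0 / 4.0);
    }

    public static void main(String[] args) {
        System.out.println("JVM max heap (MiB): " + maxHeapMiB());
        System.out.println("1/4 of 3.25 GiB (MiB): " + quarterOfPhysicalMiB(3.25));
    }
}
```

If the first number turns out to be much smaller than the second, the JVM was
not started with the limit assumed above; the standard -Xmx option (for
example -Xmx800m) sets it explicitly.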


