Yes, this is the expected behaviour. asXMLString(), as the name suggests, builds a string, which requires lots of memory for large results.
Use the outputAsXML() methods, which take an OutputStream, if you want to print results in a memory-efficient, streaming fashion.

I'll update the javadoc for the relevant methods to make the distinction clear.

Rob

On 05/08/2014 01:27, "Juan Sequeda" <[email protected]> wrote:

>Hi,
>
>I'm building a ResultSet from a QueryIterator:
>
>List<String> varNames = ...
>QueryIterator queryIterator = ...
>ResultSet sparqlResultSet = ResultSetFactory.create(queryIterator, varNames);
>String xmlResultString = ResultSetFormatter.asXMLString(sparqlResultSet);
>
>When the query returns more than 70,000 rows, I get the following OOM
>(I haven't changed the default Java heap size):
>
>java.lang.OutOfMemoryError: Java heap space
>    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
>    at java.lang.StringCoding.decode(StringCoding.java:173)
>    at java.lang.String.<init>(String.java:443)
>    at java.lang.String.<init>(String.java:515)
>    at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:35)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
>
>Line 35 of OutputBase is the following:
>
>try { return new String(arr.toByteArray(), "UTF-8") ; }
>
>So it seems that the ResultSet has been iterated through (apply() in
>ResultSetApply) and the problem is after that.
>Is it fair to assume that because arr.toByteArray() is making another
>copy, the memory is duplicated, and that is why I'm getting an OOM?
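
On the copy question: that reading looks right. Below is a minimal, JDK-only sketch (the class and method names are made up for illustration; this is not Jena code) showing that ByteArrayOutputStream.toByteArray() hands back a fresh copy rather than its internal buffer. In a line like the one quoted from OutputBase, the stream's buffer, the copied byte[] and the decoded String therefore all have to fit in the heap at the same time.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BufferCopies {

    /** True when toByteArray() returns a fresh, equal copy rather than the internal buffer. */
    static boolean toByteArrayCopies() {
        ByteArrayOutputStream arr = new ByteArrayOutputStream();
        // One made-up row; real output would be far larger.
        byte[] row = "<result/>\n".getBytes(StandardCharsets.UTF_8);
        arr.write(row, 0, row.length);

        byte[] first = arr.toByteArray();
        byte[] second = arr.toByteArray();
        // Different array objects with identical contents: each call copies the buffer.
        return first != second && Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        System.out.println("toByteArray() copies: " + toByteArrayCopies());
    }
}
```

On the JVMs of that era a String is also backed by a char[] at two bytes per character, so for mostly-ASCII XML the decoded String alone is roughly twice the size of the byte copy.
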
>
>However, if the query returns 200,000 rows, the new OOM error is the
>following:
>
>java.lang.OutOfMemoryError: Java heap space
>    at java.util.Arrays.copyOf(Arrays.java:2786)
>    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
>    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
>    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
>    at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
>    at java.io.BufferedWriter.write(BufferedWriter.java:125)
>    at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:140)
>    at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:135)
>    at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:99)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:232)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:189)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:169)
>    at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:49)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutput.format(XMLOutput.java:52)
>    at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:34)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
>
>In this case, we ran out of memory while iterating through the ResultSet.
>
>An obvious thing to do is to increase the Java heap size. But the issue
>is that the queries I'm running will return over a million rows. For such
>queries, I've increased the heap size to 4 GB, and I'm still getting an OOM
>error.
>
>Is there something I'm doing wrong? Or is the solution to increase the
>heap space even more?
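
Raising the heap only moves the limit, since the whole result still has to fit in memory as one string. Here is a rough sketch of the streaming pattern using only the JDK. CountingSink and writeRows are hypothetical stand-ins, and the row markup is invented rather than real SPARQL XML results output. With Jena, the corresponding step would be the outputAsXML() variant that takes an OutputStream, shown in the comment.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamingSketch {

    /** Hypothetical sink that counts bytes and discards them; stands in for a file or socket. */
    static final class CountingSink extends OutputStream {
        long bytes = 0;
        @Override public void write(int b) { bytes++; }
        @Override public void write(byte[] buf, int off, int len) { bytes += len; }
    }

    /** Writes made-up rows straight to the stream; only the writer's small buffer is held. */
    static void writeRows(OutputStream out, int rows) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (int i = 0; i < rows; i++) {
            w.write("<result><binding name=\"x\"><literal>" + i
                    + "</literal></binding></result>\n");
        }
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        CountingSink sink = new CountingSink();
        writeRows(sink, 1000000);
        System.out.println(sink.bytes + " bytes written without accumulating them");

        // With Jena, assuming `out` is e.g. a FileOutputStream:
        //   ResultSetFormatter.outputAsXML(out, sparqlResultSet);
    }
}
```

Memory use here stays flat no matter how many rows are written, because each row is handed to the stream and then forgotten.
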
>
>Thanks for your pointers!
>
>
>Juan Sequeda
>+1-575-SEQ-UEDA
>www.juansequeda.com
