Re: OOM with ResultSetFormatter.asXMLString

Rob Vesse Tue, 05 Aug 2014 01:08:29 -0700

Yes, this is the expected behaviour.

asXMLString(), as the name suggests, builds the entire output as a single
String, which requires a lot of memory for large results.

Use the outputAsXML() methods, which take an OutputStream, if you want to
print results in a memory-efficient streaming fashion. I'll update the
javadoc for the relevant methods to make the distinction clear.
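
To illustrate the difference, here is a minimal sketch that uses only the
standard library. The class StreamingVsString, its methods, and CountingSink
are hypothetical stand-ins invented for this example and are not Jena API;
the row format is simplified and does no XML escaping. With Jena itself the
streaming variant would be roughly
ResultSetFormatter.outputAsXML(out, sparqlResultSet).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class StreamingVsString {

    // Hypothetical sink that discards bytes and only counts them,
    // standing in for a file or socket OutputStream.
    static class CountingSink extends OutputStream {
        long bytes = 0;
        @Override public void write(int b) { bytes++; }
        @Override public void write(byte[] b, int off, int len) { bytes += len; }
    }

    // Streaming style: each row is written as it is consumed, so memory use
    // is bounded by the writer's buffer rather than by the number of rows.
    static long writeRows(Iterator<String> rows, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        long count = 0;
        while (rows.hasNext()) {
            w.write("<result>" + rows.next() + "</result>\n");
            count++;
        }
        w.flush();
        return count;
    }

    // asXMLString style: everything is buffered in a byte array, then
    // toByteArray() copies it, then new String(...) decodes it into yet
    // another buffer. Peak memory grows with the size of the whole result.
    static String rowsAsString(Iterator<String> rows) throws IOException {
        ByteArrayOutputStream arr = new ByteArrayOutputStream();
        writeRows(rows, arr);
        return new String(arr.toByteArray(), StandardCharsets.UTF_8);
    }

    // Lazily generates n dummy rows without holding them all in memory.
    static Iterator<String> generate(final int n) {
        return new Iterator<String>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < n; }
            @Override public String next() {
                if (i >= n) throw new NoSuchElementException();
                return "row" + i++;
            }
            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    public static void main(String[] args) throws IOException {
        CountingSink sink = new CountingSink();
        long rows = writeRows(generate(1000000), sink);
        System.out.println(rows + " rows streamed, " + sink.bytes + " bytes written");
    }
}
```

In the streaming variant nothing holds on to the rows once they are written,
which is why it scales to result sets that would not fit in the heap as a
single String.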

Rob

On 05/08/2014 01:27, "Juan Sequeda" <[email protected]> wrote:

>Hi,
>
>I'm building a ResultSet from a QueryIterator:
>
>List<String> varNames = ...
>QueryIterator queryIterator = ...
>ResultSet sparqlResultSet = ResultSetFactory.create(queryIterator,
>varNames);
>String xmlResultString = ResultSetFormatter.asXMLString(sparqlResultSet);
>
>When the query returns more than 70,000 rows, I get the following OOM
>(I haven't changed the default Java heap size):
>
>java.lang.OutOfMemoryError: Java heap space
>    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
>    at java.lang.StringCoding.decode(StringCoding.java:173)
>    at java.lang.String.<init>(String.java:443)
>    at java.lang.String.<init>(String.java:515)
>    at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:35)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
>
>Line 35 of OutputBase is the following:
>
>try { return new String(arr.toByteArray(), "UTF-8") ; }
>
>So it seems that the ResultSet has been fully iterated (apply() in
>ResultSetApply) and the problem occurs after that.
>Is it fair to assume that because arr.toByteArray() makes another copy,
>the memory is duplicated, and that is why I'm getting an OOM?
>
>However, if the query returns 200,000 rows, the new OOM error is the
>following:
>
>java.lang.OutOfMemoryError: Java heap space
>    at java.util.Arrays.copyOf(Arrays.java:2786)
>    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
>    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
>    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
>    at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
>    at java.io.BufferedWriter.write(BufferedWriter.java:125)
>    at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:140)
>    at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:135)
>    at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:99)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:232)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:189)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:169)
>    at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:49)
>    at com.hp.hpl.jena.sparql.resultset.XMLOutput.format(XMLOutput.java:52)
>    at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:34)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
>    at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
>
>In this case, we ran out of memory while iterating through the ResultSet.
>
>An obvious thing to do is to increase the Java heap size. But the issue
>is that the queries I'm running will return over a million rows. For such
>queries, I've increased the heap size to 4 GB, and I'm still getting an OOM
>error.
>
>Is there something I'm doing wrong? Or is the solution to increase the
>heap space even more?
>
>Thanks for your pointers!
>
>
>Juan Sequeda
>+1-575-SEQ-UEDA
>www.juansequeda.com
