Hi Andy,

I was using Jena 2.6.4 and I have just upgraded to 2.10.1. The logs are:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Unknown Source)
	at java.util.Arrays.copyOf(Unknown Source)
	at java.util.Vector.ensureCapacityHelper(Unknown Source)
	at java.util.Vector.addElement(Unknown Source)
	at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.startElement(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TrAXFilter.parse(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
	at com.pcoinnovation.genericbrowser.json.FiltreXSL.transformer(FiltreXSL.java:47)
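Since the trace now bottoms out in the XSLTC transformer (SAX2DTM2 buffering the whole document), one option I'm considering is skipping the XML + XSLT step entirely and letting Jena write JSON itself. A rough sketch, assuming the 2.10.x ResultSetFormatter API (the class name, file path, and the ontoIn model variable are just from my own code, not anything Jena-specific) — note the output is standard SPARQL-results JSON, not the Exhibit layout my stylesheet produces, so some restructuring step would still be needed:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.query.Syntax;
import com.hp.hpl.jena.rdf.model.Model;

// Sketch: stream SELECT results straight to SPARQL-results JSON,
// with no intermediate XML file and no stylesheet pass.
public class JsonExport {
    public static void export(Model ontoIn, String query, String outFile)
            throws Exception {
        Query q = QueryFactory.create(query, Syntax.syntaxARQ);
        QueryExecution qexec = QueryExecutionFactory.create(q, ontoIn);
        try (OutputStream out =
                new BufferedOutputStream(new FileOutputStream(outFile))) {
            ResultSet result = qexec.execSelect();
            // Writes row by row as the iterator is consumed, so the
            // serializer itself does not hold the result set in memory.
            ResultSetFormatter.outputAsJSON(out, result);
        } finally {
            qexec.close();
        }
    }
}
```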
So it turns out it's not the ResultSetFormatter but the XSL transformation with SAX. Thanks, Andy, for pointing this out. There are no parallel requests: I execute the queries one by one and close each query execution every time.

2013/6/6 Andy Seaborne <[email protected]>

> On 06/06/13 13:52, Brice Sommacal wrote:
>> The XML processing is inside the class ResultSetFormatter available from
>> the Jena API. I'm not sure whether it parses with XML DOM or SAX.
>>
>> Logs are here:
>>   at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
>>   at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
>>   at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
>>   at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
>>   at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
>>   at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)
>>
>> The Jena API provides a way to add the stylesheet inside the XML
>> (xsl:reference) but not to run the XML directly through the XSL.
>> That's why I first write the XML file (a result set serialization), and
>> then run a SAX processor with a stylesheet. The output is a JSON file.
>
> (Version? It's not the current one.)
>
> The ResultSet writing is streaming and not RAM limited. It does not use
> SAX or DOM; it just writes direct output. The query may be consuming
> space — some queries do, especially if inferencing is involved (ontoIn
> suggests it might be) — and this just happens to be where the heap limit
> is hit.
>
> Processing the XML output may well be memory consuming, but that's not Jena.
>
> Are there parallel requests going on? They all compete for RAM.
>
> 	Andy
>
>> Brice
>>
>> 2013/6/6 Claude Warren <[email protected]>
>>
>>> I have not followed this discussion very closely, so please excuse any
>>> items that have already been discussed.
>>>
>>> You state you are serializing the result set to XML, applying a
>>> stylesheet, and outputting it as JSON.
>>>
>>> Does your XML processing use the XML DOM or a SAX processor? (DOM
>>> results in a memory footprint of approx 3x the document size.) You can
>>> run the stylesheet processing directly against the SAX processor and
>>> have a minimal footprint.
>>>
>>> Does your stylesheet output the JSON, or do you use an XML-to-JSON
>>> converter? If the latter, does it use, or can it use, streaming like
>>> the SAX parser does?
>>>
>>> Claude
>>>
>>> On Thu, Jun 6, 2013 at 1:28 PM, Brice Sommacal <[email protected]> wrote:
>>>
>>>> Hi Olivier,
>>>>
>>>> Thanks for the tips for using your library. It may be useful one day.
>>>> Can I have a look at it? I'm wondering how the n3 graph is read (from
>>>> a file?). Is it possible to use another data source, like an RDF store?
>>>>
>>>> In my case, my code is inside a Java servlet and I don't manage to set
>>>> up the application with data from a UI. So there is no way to use a
>>>> JavaScript library (not yet ;-)).
>>>>
>>>> Thanks anyway,
>>>>
>>>> Brice
>>>>
>>>> 2013/6/5 Olivier Rossel <[email protected]>
>>>>
>>>>> I have a small JavaScript that converts an n3 graph into a JavaScript
>>>>> graph of objects.
>>>>> If your problem is related to XML stuff and such a lib could help,
>>>>> let me know.
>>>>> (It might be interesting to contribute it directly to Exhibit, btw.)
>>>>>
>>>>> On Wed, Jun 5, 2013 at 6:13 PM, Brice Sommacal <[email protected]> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I'm facing a "java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>>>>> error, and I would like advice about how I could optimize my code.
>>>>>>
>>>>>> The aim of this method is to run a SPARQL query, convert the result
>>>>>> to an XML format, and then apply an XSL stylesheet [1] to write a
>>>>>> JSON format (readable by Exhibit - Scripted [2]).
>>>>>>
>>>>>> My piece of code was working well until today. (I have been trying
>>>>>> to query a big model and the query returns too many results.)
>>>>>> This makes my program break.
>>>>>>
>>>>>> <quote>
>>>>>> Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
>>>>>> QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
>>>>>> ResultSet result = null;
>>>>>> BufferedOutputStream buf;
>>>>>> try {
>>>>>>     result = qexec.execSelect();
>>>>>>     buf = new BufferedOutputStream(new FileOutputStream(new File(root +
>>>>>>         "XML/JSON_XML/" + qNameClass + ".xml")));
>>>>>>     // Serialization of the result set
>>>>>>     ResultSetFormatter.outputAsXML(buf, result);
>>>>>>     buf.close();
>>>>>> } catch (Exception e) {
>>>>>>     e.printStackTrace();
>>>>>> } finally {
>>>>>>     qexec.close();
>>>>>> }
>>>>>> </quote>
>>>>>>
>>>>>> I know that writing the XML file uses loads of memory...
>>>>>>
>>>>>> I was thinking of:
>>>>>> - creating several XML files by tracking the ResultSetFormatter
>>>>>>   memory usage (is that possible?)
>>>>>> - avoiding the XML intermediate format and writing directly to one
>>>>>>   or several JSON files...
>>>>>> - ...
>>>>>>
>>>>>> Has someone found a way to avoid this kind of error (without
>>>>>> increasing Xms/Xmx)?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Brice
>>>>>>
>>>>>> [1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
>>>>>> [2] http://www.simile-widgets.org/exhibit3/
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
>>> Identity: https://www.identify.nu/[email protected]
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
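PS: For completeness, my transform step boils down to the stock JAXP calls below (a simplified, JDK-only sketch of what FiltreXSL.transformer does; the class name, method, and sample stylesheet here are illustrative, not my actual code). Worth noting: even with stream sources and no application-level DOM, the built-in XSLTC processor materialises the whole input document as an internal tree (the SAX2DTM2 allocation in the trace), so this alone does not cap memory for large inputs.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Apply an XSL stylesheet to XML input using stream sources.
// The application builds no DOM, but the JDK's XSLTC still builds an
// internal DTM of the input, so heap use grows with the input size.
public class StreamXslt {
    public static void transform(Reader xml, Reader xsl, Writer out)
            throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xsl));
        t.transform(new StreamSource(xml), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<result><item>42</item></result>";
        // Toy stylesheet: extract the item text as a tiny JSON object.
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "{\"value\": \"<xsl:value-of select='//item'/>\"}"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
        Writer out = new StringWriter();
        transform(new StringReader(xml), new StringReader(xsl), out);
        System.out.println(out);  // {"value": "42"}
    }
}
```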
