Re: GC limit using ResultSet.outputAsXML - way to optimize my code?

Brice Sommacal Fri, 07 Jun 2013 00:54:01 -0700

Hello,

The preceding error (XSL transformation) was occurring in my Eclipse
environment (with Xmx and Xms both set to 1024M).
When I moved my code to a web server environment (with Xmx and Xms both set
to 6000M), the XSL transformation runs fine, but I still hit a Java heap
space error:


java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(ArrayList.java:112)
    at java.util.ArrayList.<init>(ArrayList.java:119)
    at org.apache.jena.atlas.lib.DS.list(DS.java:54)
    at org.apache.jena.atlas.iterator.IteratorConcat.<init>(IteratorConcat.java:34)
    at org.apache.jena.atlas.iterator.IteratorConcat.concat(IteratorConcat.java:45)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:199)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:185)
    at java.util.HashMap.put(HashMap.java:372)
    at java.util.HashSet.add(HashSet.java:200)
    at org.apache.jena.atlas.data.SortedDataBag.add(SortedDataBag.java:114)
    at org.apache.jena.atlas.data.DistinctDataNet.netAdd(DistinctDataNet.java:58)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinct.isFreshSighting(QueryIterDistinct.java:66)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:61)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)

Definitely, the XML serialization is not good enough for my use case.

What would be the best solution?
<quote>
 - Read data from an RDF store (Jena TDB, Sesame) and return it through a
   SPARQL endpoint (applying the XSL on the fly [streaming])
 - Convert data from OWL files into an Exhibit table (Staged mode), i.e.
   configure the Exhibit storage mode directly.
   (By the way, I have not managed to set up Exhibit 3 Staged in a
   Windows environment yet.)
 - Read data from an RDF store and create a specific connector with the
   Exhibit API?
</quote>
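
A minimal sketch of the streaming idea behind the first option, under stated
assumptions. It does not use XSLT: the quoted trace below suggests the JDK's
XSLT processor buffers the input document (SAX2DTM2), so the sketch swaps in a
StAX pull parser from the JDK (javax.xml.stream) that reads SPARQL XML results
one element at a time and writes JSON as it goes. The class name
SparqlXmlToJson, the methods convert and quote, and the {"items": [...]}
output shape (one flat object per result row) are assumptions for
illustration, not the output of the Sparqlxml2exhibitjson.xsl stylesheet.
Datatypes and language tags are dropped, and XML literals with nested markup
are not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams SPARQL XML results into {"items":[{var:value,...},...]}.
class SparqlXmlToJson {

    static void convert(Reader in, Writer out) throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for SPARQL results; disabling them avoids entity tricks.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(in);

        out.write("{\"items\":[");
        boolean firstResult = true;   // controls the comma between result objects
        boolean firstBinding = true;  // controls the comma between properties
        String var = null;            // variable name of the <binding> being read

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("result")) {
                    out.write(firstResult ? "{" : ",{");
                    firstResult = false;
                    firstBinding = true;
                } else if (name.equals("binding")) {
                    var = xml.getAttributeValue(null, "name");
                } else if (var != null
                        && (name.equals("uri") || name.equals("literal") || name.equals("bnode"))) {
                    // getElementText() reads plain text only; nested markup would throw.
                    String value = xml.getElementText();
                    out.write((firstBinding ? "" : ",") + quote(var) + ":" + quote(value));
                    firstBinding = false;
                    var = null;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("result")) {
                out.write("}");
            }
        }
        out.write("]}");
        out.flush();
        xml.close();
    }

    // Wraps a string in double quotes and escapes it as a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Only the current element is held in memory, so the footprint should not grow
with the size of the result file. In a servlet, `in` could wrap the XML file
written by ResultSetFormatter.outputAsXML and `out` the response writer. This
only takes the XML/XSL step out of the memory budget: the trace above fails
inside query evaluation (QueryIterDistinct), which this sketch does not
address.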

Regards,


Brice


2013/6/6 Brice Sommacal <[email protected]>

> Hi Andy,
>
> I was using Jena 2.6.4 and I have just upgraded to 2.10.1.
> The logs are:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Unknown Source)
>     at java.util.Arrays.copyOf(Unknown Source)
>     at java.util.Vector.ensureCapacityHelper(Unknown Source)
>     at java.util.Vector.addElement(Unknown Source)
>     at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.startElement(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
>     at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.trax.TrAXFilter.parse(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
>     at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
>     at com.pcoinnovation.genericbrowser.json.FiltreXSL.transformer(FiltreXSL.java:47)
>
> So the error no longer comes from the ResultSetFormatter but from the XSL
> transformation with SAX.
> Thanks Andy for pointing this out.
>
> There are no parallel requests because I'm executing them one by one, and
> closing the query every time.
>
>
>
> 2013/6/6 Andy Seaborne <[email protected]>
>
>> On 06/06/13 13:52, Brice Sommacal wrote:
>>
>>> The XML processing is inside the class ResultSetFormatter available from
>>> the Jena API. I'm not sure whether it parses with XML DOM or SAX.
>>>
>>> Logs are here:
>>>     at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
>>>     at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
>>>     at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
>>>     at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
>>>     at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
>>>     at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)
>>>
>>> The Jena API provides a way to add the stylesheet inside the XML
>>> (xsl:reference) but not to directly run the XML through the XSL.
>>> That's why I first write the XML file (a result set serialization), and
>>> then run a SAX processor with a stylesheet. The output is a JSON file.
>>>
>>
>> (version? it's not the current one)
>>
>> The ResultSet writing is streaming and not RAM limited.  It does not use
>> SAX or DOM, it just writes direct output.  The query may be consuming
>> space, some queries do, especially if inferencing is involved (ontoIn
>> suggests it might be) and this just happens to be where the heap limit is
>> hit.
>>
>> Processing the XML output may well be memory consuming but that's not
>> Jena.
>>
>> Are there parallel requests going on?  They all compete for RAM.
>>
>>         Andy
>>
>>
>>
>>>
>>> Brice
>>>
>>>
>>> 2013/6/6 Claude Warren <[email protected]>
>>>
>>>> I have not followed this discussion very closely, so please excuse any
>>>> items that have already been discussed.
>>>>
>>>> You state you are serializing the result set to XML, applying a style
>>>> sheet, and outputting JSON.
>>>>
>>>> Does your XML processing use the XML DOM or SAX processor?  (DOM results
>>>> in a memory footprint of approx. 3x document size.)  You can run the
>>>> style sheet processing directly against the SAX processor and have a
>>>> minimal footprint.
>>>>
>>>> Does your stylesheet output the JSON or do you use an XML to JSON
>>>> converter?  If the latter, does it use, or can it use, streaming like
>>>> the SAX parser does?
>>>>
>>>> Claude
>>>>
>>>>
>>>> On Thu, Jun 6, 2013 at 1:28 PM, Brice Sommacal <[email protected]> wrote:
>>>>
>>>>> Hi Olivier,
>>>>>
>>>>> Thanks for the tip about your library.
>>>>> It may be useful one day.
>>>>> Can I have a look at it? I'm wondering how the N3 graph is read (from a
>>>>> file?).
>>>>> Is it possible to use another data source, such as an RDF store?
>>>>>
>>>>> In my case, my code is inside a Java servlet and I have not managed to
>>>>> set up the application with data from a user interface (IHM). So there
>>>>> is no way to use a JavaScript library (not yet ;-))
>>>>>
>>>>> Thanks anyway,
>>>>>
>>>>>
>>>>> Brice
>>>>>
>>>>>
>>>>> 2013/6/5 Olivier Rossel <[email protected]>
>>>>>
>>>>>> I have a small JavaScript library that converts an N3 graph into a
>>>>>> JavaScript graph of objects.
>>>>>> If your problem is related to XML stuff and such a lib could help, let
>>>>>> me know.
>>>>>> (It might be interesting to contribute it directly to Exhibit, btw.)
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 5, 2013 at 6:13 PM, Brice Sommacal <[email protected]> wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I'm facing a "java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>>>>>> error and I would like some advice on how I could optimize my code.
>>>>>>>
>>>>>>> The aim of this method is to run a SPARQL query, convert the result to
>>>>>>> an XML format, and then apply an XSL stylesheet [1] to write a JSON
>>>>>>> format (readable by Exhibit - Scripted [2]).
>>>>>>>
>>>>>>> My piece of code was working well until today. (I have been trying to
>>>>>>> query a big model and the query returns too many results.)
>>>>>>> This makes my program break.
>>>>>>>
>>>>>>> <quote>
>>>>>>> Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
>>>>>>> QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
>>>>>>> ResultSet result = null;
>>>>>>> BufferedOutputStream buf;
>>>>>>> try {
>>>>>>>   result = qexec.execSelect();
>>>>>>>   buf = new BufferedOutputStream(new FileOutputStream(new File(root +
>>>>>>>       "XML/JSON_XML/" + qNameClass + ".xml")));
>>>>>>>   // Serialization of the resultSet
>>>>>>>   ResultSetFormatter.outputAsXML(buf, result);
>>>>>>>   buf.close();
>>>>>>> }
>>>>>>> catch (Exception e) {
>>>>>>>   e.printStackTrace();
>>>>>>> }
>>>>>>> finally {
>>>>>>>   qexec.close();
>>>>>>> }
>>>>>>> </quote>
>>>>>>>
>>>>>>> I know that writing an XML file uses loads of memory....
>>>>>>>
>>>>>>> I was thinking of:
>>>>>>>   - creating several XML files by tracking the ResultSetFormatter
>>>>>>>     memory usage (is that possible?)
>>>>>>>   - avoiding the XML intermediate format and writing directly to one
>>>>>>>     or several JSON files...
>>>>>>>   - ...
>>>>>>>
>>>>>>> Has anyone found a way to avoid this kind of error (without
>>>>>>> increasing Xms/Xmx)?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>>
>>>>>>> Brice
>>>>>>>
>>>>>>> [1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
>>>>>>> [2] http://www.simile-widgets.org/exhibit3/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> I like: Like Like - The likeliest place on the web
>>>> <http://like-like.xenei.com>
>>>> Identity: 
>>>> https://www.identify.nu/user.**[email protected]<https://www.identify.nu/[email protected]>
>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>
>>>>
>>>
>>
>
