Re: GC limit using ResultSet.outputAsXML - way to optimize my code?

Andy Seaborne Fri, 07 Jun 2013 08:39:59 -0700

Brice,

What's the query?


        Andy

On 07/06/13 08:52, Brice Sommacal wrote:

Hello,

The preceding error (XSL transformation) was occurring in my Eclipse
environment (Xmx and Xms both set to 1024M).
When I moved my code to a web server environment (Xmx and Xms both set to
6000M), the XSL transformation ran fine, but I still hit a Java heap
space error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(ArrayList.java:112)
    at java.util.ArrayList.<init>(ArrayList.java:119)
    at org.apache.jena.atlas.lib.DS.list(DS.java:54)
    at org.apache.jena.atlas.iterator.IteratorConcat.<init>(IteratorConcat.java:34)
    at org.apache.jena.atlas.iterator.IteratorConcat.concat(IteratorConcat.java:45)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
    at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:199)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:185)
    at java.util.HashMap.put(HashMap.java:372)
    at java.util.HashSet.add(HashSet.java:200)
    at org.apache.jena.atlas.data.SortedDataBag.add(SortedDataBag.java:114)
    at org.apache.jena.atlas.data.DistinctDataNet.netAdd(DistinctDataNet.java:58)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinct.isFreshSighting(QueryIterDistinct.java:66)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:61)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)

Definitely, the XML serialization is not good enough for my use case.

What would be the best solution?
<quote>
  - Read data from an RDF store (Jena TDB, Sesame) and return data through a
    SPARQL endpoint (and apply the XSL on the fly [streaming])
  - Convert data from OWL files into an Exhibit table (staged mode), i.e.
    configure the Exhibit storage mode directly.
    (By the way, I have not yet managed to set up Exhibit 3 Staged in
    a Windows environment.)
  - Read data from an RDF store and create a specific connector with the
    Exhibit API?
</quote>

Regards,


Brice


2013/6/6 Brice Sommacal <[email protected]>

Hi Andy,

I was using Jena 2.6.4 and I have just upgraded to 2.10.1.
The logs are:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.util.Arrays.copyOf(Unknown Source)
    at java.util.Vector.ensureCapacityHelper(Unknown Source)
    at java.util.Vector.addElement(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.startElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TrAXFilter.parse(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
    at com.pcoinnovation.genericbrowser.json.FiltreXSL.transformer(FiltreXSL.java:47)

So the error no longer comes from the ResultSetFormatter but from the XSL
transformation with SAX.
Thanks Andy for pointing this out.

There are no parallel requests: I execute the queries one by one and
close each query when it is done.



2013/6/6 Andy Seaborne <[email protected]>

On 06/06/13 13:52, Brice Sommacal wrote:

The XML processing is inside the class ResultSetFormatter, available from the
Jena API. I'm not sure whether it is parsed with XML DOM or SAX.

Logs are here:
    at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
    at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
    at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)

The Jena API provides a way to add the stylesheet inside the XML (xsl:reference)
but not to apply the XSL to the XML directly.
That's why I first write the XML file (a result set serialization), and
then run a SAX processor with a stylesheet. The output is a JSON file.


(version? it's not the current one)

The ResultSet writing is streaming and not RAM limited.  It does not use
SAX or DOM, it just writes direct output.  The query may be consuming
space, some queries do, especially if inferencing is involved (ontoIn
suggests it might be) and this just happens to be where the heap limit is
hit.

Processing the XML output may well be memory consuming but that's not
Jena.
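
A pull parser is one way to keep that XML post-processing step bounded in memory. The following is only a sketch under assumptions, not a replacement for the stylesheet: it reads a SPARQL XML results document with the JDK's StAX API and writes one JSON object per result as it goes. The class name SparqlXmlToJson, the flat {"items": [...]} layout and the treatment of every value as a plain string are illustrative assumptions; the exact item layout Exhibit expects is not reproduced here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams SPARQL XML results into a flat JSON item list.
final class SparqlXmlToJson {

    static void convert(Reader in, Writer out) throws XMLStreamException, IOException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        boolean firstItem = true;
        boolean firstField = true;
        String var = null; // name of the binding currently being read
        out.write("{\"items\":[");
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("result")) {
                    out.write(firstItem ? "{" : ",{");
                    firstItem = false;
                    firstField = true;
                } else if (name.equals("binding")) {
                    var = xml.getAttributeValue(null, "name");
                } else if (var != null
                        && (name.equals("uri") || name.equals("literal") || name.equals("bnode"))) {
                    // getElementText() only accepts text content; literals that
                    // contain nested markup would need extra handling.
                    String value = xml.getElementText();
                    out.write((firstField ? "" : ",") + quote(var) + ":" + quote(value));
                    firstField = false;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("result")) {
                    out.write("}");
                } else if (name.equals("binding")) {
                    var = null;
                }
            }
        }
        out.write("]}");
        xml.close();
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Only the current binding is held in memory at any point, so the footprint of this step should not grow with the number of results. Passing a buffered file reader and writer to convert would process the XML file written by ResultSetFormatter.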

Are there parallel requests going on?  They all compete for RAM.

         Andy


Brice


2013/6/6 Claude Warren <[email protected]>

I have not followed this discussion very closely, so please excuse any items
that have already been discussed.

You state you are serializing the result set to XML, applying a style sheet,
and outputting JSON.

Does your XML processing use the XML DOM or the SAX processor? (DOM results in
a memory footprint of approx. 3x document size.) You can run the style sheet
processing directly against the SAX processor and have a minimal footprint.

Does your stylesheet output the JSON or do you use an XML to JSON
converter? If the latter, does it use, or can it use, streaming like the
SAX parser does?
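
As a rough illustration of feeding a stylesheet from a SAX source, here is a minimal sketch using only the JDK's JAXP classes. The class name SaxXslt is made up for the example, and the stylesheet and input are supplied by the caller. Whether the footprint really stays small depends on the XSLT processor: many XSLT 1.0 processors still build an internal model of the whole input, which is what the SAX2DTM2 frames in the trace posted in this thread suggest for the JDK's built-in one.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical helper: applies a stylesheet to XML read through SAX,
// writing the result straight to the given writer.
final class SaxXslt {

    static void transform(Reader xml, Reader xsl, Writer out) throws TransformerException {
        // Compile the stylesheet once; the Transformer is then fed by a SAX parser.
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new StreamSource(xsl));
        // No DOM Document is created by this code; what the processor
        // buffers internally is implementation dependent.
        transformer.transform(new SAXSource(new InputSource(xml)), new StreamResult(out));
    }
}
```

With buffered file readers and a file writer, this covers the "XML file in, JSON file out" step without the calling code holding either document in memory.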

Claude


On Thu, Jun 6, 2013 at 1:28 PM, Brice Sommacal <[email protected]> wrote:


Hi Olivier,

Thanks for the tip about your library.
It may be useful one day.
Can I have a look at it? I'm wondering how the n3 graph is read (from a
file?)
Is it possible to use another data source, like an RDF store?

In my case, my code is inside a Java servlet and I cannot set up
the application with data from a user interface (IHM). So there is no way
to use a JavaScript library (not yet ;-))

Thanks anyway,


Brice


2013/6/5 Olivier Rossel <[email protected]>

I have a small piece of JavaScript that converts an n3 graph into a
JavaScript graph of objects.
If your problem is related to XML stuff and such a lib could help, let me
know.

(it might be interesting to contribute it directly to exhibit, btw)


On Wed, Jun 5, 2013 at 6:13 PM, Brice Sommacal <[email protected]> wrote:


Hello everyone,

I'm facing a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error
and I would like some advice on how I could optimize my code.

The aim of this method is to run a SPARQL query, convert the result to an XML
format, and then apply an XSL stylesheet [1] to write a JSON format (readable
by Exhibit - Scripted [2]).

My code was working well until today. (I have been trying to
query a big model and the query returns too many results.)
This makes my program break.

<quote>
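import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import com.hp.hpl.jena.query.*;

// query, ontoIn, root and qNameClass are defined elsewhere in the class.
Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
OutputStream buf = null;
try {
    ResultSet result = qexec.execSelect();
    buf = new BufferedOutputStream(new FileOutputStream(
            new File(root + "XML/JSON_XML/" + qNameClass + ".xml")));
    // Serialization of the ResultSet
    ResultSetFormatter.outputAsXML(buf, result);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Close the stream even if serialization failed, then release the query.
    if (buf != null) {
        try { buf.close(); } catch (IOException ignored) { }
    }
    qexec.close();
}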
</quote>

I know that writing an XML file uses a lot of memory...

I was thinking of:
   - creating several XML files by tracking the ResultSetFormatter memory
     usage (is that possible?)
   - avoiding the intermediate XML format and writing directly to one or
     several JSON files...
   - ...

Has anyone found a way to avoid this kind of error (without
increasing Xms/Xmx)?
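
The "several JSON files" idea can be sketched without any XML step. This is a minimal illustration under assumptions, not Jena API: ChunkedItemsWriter, the items-N.json file names and the {"items": [...]} wrapper are made up for the example, each element of the iterator is assumed to be an already serialized JSON object (for instance built from one query solution), and the consumer is assumed to accept several data files. Jena's ResultSetFormatter also has an outputAsJSON method, but it writes the SPARQL JSON results format rather than an Exhibit-specific layout.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical helper: spreads JSON items over several files of bounded size.
final class ChunkedItemsWriter {

    /**
     * Writes items into dir/items-0.json, dir/items-1.json, ... with at most
     * chunkSize items per file, and returns the number of files written.
     */
    static int write(Iterator<String> items, int chunkSize, Path dir) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int files = 0;
        while (items.hasNext()) {
            Path file = dir.resolve("items-" + files + ".json");
            try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                out.write("{\"items\":[");
                for (int n = 0; n < chunkSize && items.hasNext(); n++) {
                    if (n > 0) out.write(",");
                    out.write(items.next()); // one item at a time, nothing is accumulated
                }
                out.write("]}");
            }
            files++;
        }
        return files;
    }
}
```

Because items are pulled from the iterator one at a time, memory use in this step is independent of the total number of results; any memory the query itself needs is a separate matter.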


Thanks in advance,


Brice

[1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
[2] http://www.simile-widgets.org/exhibit3/



--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
Identity: https://www.identify.nu/[email protected]
LinkedIn: http://www.linkedin.com/in/claudewarren
