The optimizer handles the query well and will execute it in a streaming fashion
(which means the memory footprint is fixed and depends only on the query
size, not the data size), except for the DISTINCT, which has to remember
the rows it has already seen.
The query may have high fan-out, leading to many partial duplicates,
e.g. if you have
OPTIONAL{ ?s :p ?v1 }
OPTIONAL{ ?s :q ?v2 }
and there are 5 :p per ?s and 2 :q per ?s, then that pair of OPTIONALs
generates 10 different rows per ?s (the cross product of the ?v1 and ?v2 matches).
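To make the arithmetic concrete, here is a small sketch in plain Java. It does not use Jena at all; the class and method names (OptionalFanOut, orUnbound, rows) are made up for illustration, and it only models how two independent OPTIONALs on the same ?s multiply rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

class OptionalFanOut {

    // An OPTIONAL with no match keeps the row and leaves the variable unbound (null here).
    static List<String> orUnbound(List<String> matches) {
        return matches.isEmpty() ? Collections.<String>singletonList(null) : matches;
    }

    // Rows produced for a single ?s by two independent OPTIONALs:
    // every ?v1 match is paired with every ?v2 match.
    static List<List<String>> rows(List<String> v1Matches, List<String> v2Matches) {
        List<List<String>> out = new ArrayList<>();
        for (String v1 : orUnbound(v1Matches)) {
            for (String v2 : orUnbound(v2Matches)) {
                out.add(Arrays.asList(v1, v2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> p = Arrays.asList("p1", "p2", "p3", "p4", "p5");
        List<String> q = Arrays.asList("q1", "q2");
        List<List<String>> result = rows(p, q);
        System.out.println(result.size());                // 10
        // No two rows are identical, so a DISTINCT set has to keep all of them.
        System.out.println(new HashSet<>(result).size()); // 10
    }
}
```

Each further multi-valued OPTIONAL multiplies the row count again, and because the rows differ in at least one column, DISTINCT cannot collapse them.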
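You say you read the model into memory. Together with the fact that the
out-of-memory condition is happening in different places, that suggests
the heap is simply nearly full and the error surfaces wherever the next
allocation happens to be.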
It will depend on what 'ontoIn' is - what sort of model is it? An
OntModel? And what's the base data stored in? TDB? Memory?
Could you try replacing the SELECT clause with
SELECT (count(*) AS ?c)
and say what the value of ?c is?
Andy
On 11/06/13 10:05, Brice Sommacal wrote:
Hello Andy,
The query is generated from an XML file.
Once the model is read and available in memory, we create an XML file
for each OWL class with all its properties (see attached for an example).
Then, from this XML file, we generate a SPARQL SELECT query like the one below
and save the results in XML format.
Finally, we apply an XSL transformation to convert the XML file to JSON
format.
I hope that is clear enough.
For the time being, I am going to look into populating the Exhibit 3 Staged
storage mode directly, without going through the XML file
(which is currently converted to JSON for the Exhibit 3 Scripted storage mode).
Regards,
Brice
Here is the query:
PREFIX : <http://seamless.pco-innovation.com/energy/common/software/tcua#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
SELECT DISTINCT ?label ?type ?id ?uri ?tctype__tctypecstattach
?tctype__tcproperty ?tcstdtype__tcclass ?parentTypeName
?inverse_of_primaryTypeName ?typeName ?isAbstract
?tctype__constantattach ?childTypeName ?tctype__tcdisplayrule
?description ?inverse_of_secondaryTypeName ?tctype__tcgrmrule
?sourceTypeName__tccomprule ?tctyper__tcdeepcprule
?destTypeName__tccomprule ?tctype__tcdeepcprule ?tctypeo__tcdeepcprule
?noteTypeName
WHERE{
?instance rdf:type
<http://seamless.pco-innovation.com/energy/common/software/tcu83#TCSTANDARDTYPE>.
?instance rdf:type ?typeTemp.
LET(?type := afn:localname(?typeTemp)).
?instance :inferredLabel ?label.
LET(?id := afn:localname(?instance)).
LET(?uri := fn:concat('../mf/MF.html?graphName=TCSTANDARDTYPE&QName=' ,
afn:localname(?instance))).
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tctypecstattach>
?tctype__tctypecstattachNode.
?tctype__tctypecstattachNode :inferredLabel
?labeltctype__tctypecstattachNode.
LET(?tctype__tctypecstattach := str(?labeltctype__tctypecstattachNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcproperty>
?tctype__tcpropertyNode.
?tctype__tcpropertyNode :inferredLabel ?labeltctype__tcpropertyNode.
LET(?tctype__tcproperty := str(?labeltctype__tcpropertyNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tcstdtype__tcclass>
?tcstdtype__tcclassNode.
?tcstdtype__tcclassNode :inferredLabel ?labeltcstdtype__tcclassNode.
LET(?tcstdtype__tcclass := str(?labeltcstdtype__tcclassNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#parentTypeName>
?parentTypeNameNode.
?parentTypeNameNode :inferredLabel ?labelparentTypeNameNode.
LET(?parentTypeName := str(?labelparentTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#inverse_of_primaryTypeName>
?inverse_of_primaryTypeNameNode.
?inverse_of_primaryTypeNameNode :inferredLabel
?labelinverse_of_primaryTypeNameNode.
LET(?inverse_of_primaryTypeName :=
str(?labelinverse_of_primaryTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#typeName>
?typeNameTemp.
LET(?typeName := str(?typeNameTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#isAbstract>
?isAbstractTemp.
LET(?isAbstract := str(?isAbstractTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__constantattach>
?tctype__constantattachNode.
?tctype__constantattachNode :inferredLabel ?labeltctype__constantattachNode.
LET(?tctype__constantattach := str(?labeltctype__constantattachNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#childTypeName>
?childTypeNameNode.
?childTypeNameNode :inferredLabel ?labelchildTypeNameNode.
LET(?childTypeName := str(?labelchildTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcdisplayrule>
?tctype__tcdisplayruleNode.
?tctype__tcdisplayruleNode :inferredLabel ?labeltctype__tcdisplayruleNode.
LET(?tctype__tcdisplayrule := str(?labeltctype__tcdisplayruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#description>
?descriptionTemp.
LET(?description := str(?descriptionTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#inverse_of_secondaryTypeName>
?inverse_of_secondaryTypeNameNode.
?inverse_of_secondaryTypeNameNode :inferredLabel
?labelinverse_of_secondaryTypeNameNode.
LET(?inverse_of_secondaryTypeName :=
str(?labelinverse_of_secondaryTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcgrmrule>
?tctype__tcgrmruleNode.
?tctype__tcgrmruleNode :inferredLabel ?labeltctype__tcgrmruleNode.
LET(?tctype__tcgrmrule := str(?labeltctype__tcgrmruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#sourceTypeName__tccomprule>
?sourceTypeName__tccompruleNode.
?sourceTypeName__tccompruleNode :inferredLabel
?labelsourceTypeName__tccompruleNode.
LET(?sourceTypeName__tccomprule :=
str(?labelsourceTypeName__tccompruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctyper__tcdeepcprule>
?tctyper__tcdeepcpruleNode.
?tctyper__tcdeepcpruleNode :inferredLabel ?labeltctyper__tcdeepcpruleNode.
LET(?tctyper__tcdeepcprule := str(?labeltctyper__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#destTypeName__tccomprule>
?destTypeName__tccompruleNode.
?destTypeName__tccompruleNode :inferredLabel
?labeldestTypeName__tccompruleNode.
LET(?destTypeName__tccomprule := str(?labeldestTypeName__tccompruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcdeepcprule>
?tctype__tcdeepcpruleNode.
?tctype__tcdeepcpruleNode :inferredLabel ?labeltctype__tcdeepcpruleNode.
LET(?tctype__tcdeepcprule := str(?labeltctype__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctypeo__tcdeepcprule>
?tctypeo__tcdeepcpruleNode.
?tctypeo__tcdeepcpruleNode :inferredLabel ?labeltctypeo__tcdeepcpruleNode.
LET(?tctypeo__tcdeepcprule := str(?labeltctypeo__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#noteTypeName>
?noteTypeNameTemp.
LET(?noteTypeName := str(?noteTypeNameTemp)).
}
}
2013/6/7 Andy Seaborne <[email protected]>
Brice,
What's the query?
Andy
On 07/06/13 08:52, Brice Sommacal wrote:
Hello,
The preceding error (XSL transformation) was occurring in my Eclipse
environment (with Xmx and Xms set to 1024M).
When I move my code to a web server environment (with Xmx and Xms set to
6000M), the XSL transformation goes well, but I still get a Java heap
space error:
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:112)
at java.util.ArrayList.<init>(ArrayList.java:119)
at org.apache.jena.atlas.lib.DS.list(DS.java:54)
at org.apache.jena.atlas.iterator.IteratorConcat.<init>(IteratorConcat.java:34)
at org.apache.jena.atlas.iterator.IteratorConcat.concat(IteratorConcat.java:45)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:199)
at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:185)
at java.util.HashMap.put(HashMap.java:372)
at java.util.HashSet.add(HashSet.java:200)
at org.apache.jena.atlas.data.SortedDataBag.add(SortedDataBag.java:114)
at org.apache.jena.atlas.data.DistinctDataNet.netAdd(DistinctDataNet.java:58)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinct.isFreshSighting(QueryIterDistinct.java:66)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:61)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
Definitely, the XML serialization is not good enough for my use
case.
What would be the best solution?
<quote>
- Read data from an RDF store (Jena TDB, Sesame) and return the data through a
SPARQL endpoint (and apply the XSL on the fly [streaming])
- Convert data from OWL files into an Exhibit table (staged mode), i.e.
configure the Exhibit storage mode directly.
(By the way, I have not yet managed to set up Exhibit 3 Staged in
a Windows environment.)
- Read data from an RDF store and create a specific connector with the Exhibit
API?
</quote>
Regards,
Brice
2013/6/6 Brice Sommacal <[email protected]>
Hi Andy,
I was using Jena 2.6.4 and I have just upgraded to 2.10.1.
The logs are:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.util.Arrays.copyOf(Unknown Source)
at java.util.Vector.ensureCapacityHelper(Unknown Source)
at java.util.Vector.addElement(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.startElement(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TrAXFilter.parse(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at com.pcoinnovation.genericbrowser.json.FiltreXSL.transformer(FiltreXSL.java:47)
So now the error does not come from the ResultSetFormatter but from the XSL
transformation with SAX.
Thanks Andy for pointing this out.
There are no parallel requests, because I am executing the queries one by one
and closing each query every time.
2013/6/6 Andy Seaborne <[email protected]>
On 06/06/13 13:52, Brice Sommacal wrote:
The XML processing is inside the ResultSetFormatter class available in the
Jena API. I'm not sure whether it works with XML DOM or SAX.
The logs are here:
at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)
The Jena API provides a way to add a stylesheet reference inside
the XML (xsl:reference) but not to run the XSL directly against the XML.
That's why I first write the XML file (a result set serialization), and
then run a SAX processor with a stylesheet. The output is a JSON file.
(version? it's not the current one)
The ResultSet writing is streaming and not RAM limited. It does not use
SAX or DOM, it just writes direct output. The query may be consuming
space, some queries do, especially if inferencing is involved (ontoIn
suggests it might be) and this just happens to be where the heap limit is
hit.
Processing the XML output may well be memory consuming but that's not
Jena.
Are there parallel requests going on? They all compete for RAM.
Andy
Brice
2013/6/6 Claude Warren <[email protected]>
I have not followed this discussion very closely, so please excuse any
items that have already been discussed.
You state you are serializing the result set to XML, applying a style sheet,
and outputting JSON.
Does your XML processing use the XML DOM or a SAX processor? (DOM
results in a memory footprint of approx. 3x the document size.) You can run
the style sheet processing directly against the SAX processor and have a
minimal footprint.
Does your stylesheet output the JSON, or do you use an XML-to-JSON
converter? If the latter, does it use, or can it use, streaming like
the SAX parser does?
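As an illustration of the streaming idea, here is a rough sketch that skips XSLT and reads a SPARQL XML result file with the JDK's StAX pull parser (javax.xml.stream), writing one JSON object per <result> as it goes, so memory use does not grow with the number of rows. The class name StreamingResultsToJson, the helper quote, and the {"items":[...]} output shape are assumptions made for this example; it also flattens every binding to a plain string and ignores datatypes and language tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingResultsToJson {

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Pulls XML events one at a time and writes JSON immediately; no tree is built.
    static void convert(Reader in, Writer out) throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader xml = factory.createXMLStreamReader(in);
        boolean firstItem = true;
        boolean firstField = true;
        String bindingName = null; // name of the <binding> currently open, if any
        out.write("{\"items\":[");
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("result")) {
                    out.write(firstItem ? "{" : ",{");
                    firstItem = false;
                    firstField = true;
                } else if (name.equals("binding")) {
                    bindingName = xml.getAttributeValue(null, "name");
                } else if (bindingName != null
                        && (name.equals("literal") || name.equals("uri") || name.equals("bnode"))) {
                    String value = xml.getElementText(); // reads up to the matching end tag
                    out.write((firstField ? "" : ",") + quote(bindingName) + ":" + quote(value));
                    firstField = false;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("result")) out.write("}");
                else if (name.equals("binding")) bindingName = null;
            }
        }
        out.write("]}");
        xml.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
                + "<head><variable name=\"label\"/></head><results>"
                + "<result><binding name=\"label\"><literal>A</literal></binding></result>"
                + "</results></sparql>";
        StringWriter out = new StringWriter();
        convert(new StringReader(xml), out);
        System.out.println(out); // {"items":[{"label":"A"}]}
    }
}
```

In real use the Reader and Writer would wrap buffered file streams rather than strings. Whether this output is directly usable by Exhibit depends on how multi-valued properties should be represented, which this sketch does not address.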
Claude
On Thu, Jun 6, 2013 at 1:28 PM, Brice Sommacal <[email protected]> wrote:
Hi Olivier,
Thanks for the tip about your library.
It may be useful one day.
Can I have a look at it? I'm wondering how the n3 graph is read (from a
file?).
Is it possible to use another data source, like an RDF store?
In my case, my code is inside a Java servlet and I cannot set up
the application with data from a user interface. So there is no way to use a
JavaScript library (not yet ;-))
Thanks anyway,
Brice
2013/6/5 Olivier Rossel <[email protected]>
I have a small JavaScript library that converts an n3 graph into a JavaScript
graph of objects.
If your problem is related to XML stuff and such a lib could help, let
me know.
(It might be interesting to contribute it directly to Exhibit, btw.)
On Wed, Jun 5, 2013 at 6:13 PM, Brice Sommacal <[email protected]> wrote:
Hello everyone,
I'm facing a "java.lang.OutOfMemoryError: GC overhead limit exceeded"
error and I would like some advice about how I could optimize my code.
The aim of this method is to run a SPARQL query, convert the results to XML
format, and then apply an XSL stylesheet [1] to write a JSON format (readable
by Exhibit - Scripted [2]).
My piece of code was working well until today. (I have been trying to
query a big model and the query returns too many results.)
This makes my program break.
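<quote>
// Jena 2.x classes, all from the com.hp.hpl.jena.query package:
// Query, QueryFactory, QueryExecution, QueryExecutionFactory,
// ResultSet, ResultSetFormatter, Syntax.
// File, OutputStream, BufferedOutputStream, FileOutputStream are from java.io.
Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
File xmlFile = new File(root + "XML/JSON_XML/" + qNameClass + ".xml");
// try-with-resources closes the stream even if the serialization fails
try (OutputStream buf = new BufferedOutputStream(new FileOutputStream(xmlFile))) {
    ResultSet result = qexec.execSelect();
    // Serialization of the result set
    ResultSetFormatter.outputAsXML(buf, result);
}
catch (Exception e) {
    e.printStackTrace();
}
finally {
    qexec.close();
}
</quote>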
I know that writing the XML file uses a lot of memory...
I was thinking of:
- creating several XML files by tracking the ResultSetFormatter memory
usage (is that possible?)
- avoiding the intermediate XML format and writing directly to one or
several JSON files...
- ...
Has anyone found a way to avoid this kind of error (without
increasing Xms and Xmx)?
Thanks in advance,
Brice
[1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
[2] http://www.simile-widgets.org/exhibit3/
--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
Identity: https://www.identify.nu/[email protected]
LinkedIn: http://www.linkedin.com/in/claudewarren