Thank you Rupert, this looks like a good idea indeed!
There are some policies to be decided first, e.g. if we discover that a
graph with that name is already stored, we have to decide whether to
replace, add, merge etc. (see STANBOL-426). This also depends on
what artifact has the "ownership" of that graph (a scope/space, a
session, or nobody).
This will be much easier to understand once owl:versionIRI support is
complete in STANBOL-524.
But this is for later. I will create a new ticket and link it with the
above and STANBOL-518, then post your code sample there.
For now, it should be enough to solve the problem on the refactor engine
level.
Best,
Alessandro
On 3/16/12 11:50 AM, Rupert Westenthaler wrote:
Hi Alessandro
Something like this could work:
The idea is to
* provide an MGraph wrapper that skips all triples other than the ones needed
to determine the OntologyID
* use a BufferedInputStream and mark the beginning
* parse into your MGraph wrapper until you can determine the OntologyID
* throw some exception to stop the parsing
* reset the stream
* process the OntologyID
* if you need to import the parsed ontology, you can reuse the reset stream
Here is how the code might look.
class MyMGraph extends SimpleMGraph {
    private String ontologyId;

    @Override
    protected boolean performAdd(Triple triple) {
        boolean added = false;
        // filter the interesting triples
        // (isInteresting() is a placeholder, still to be written)
        if (isInteresting(triple)) {
            added = super.performAdd(triple);
        }
        // check the currently available triples for the ontology ID
        // (checkOntologyId() is a placeholder; it sets ontologyId once known)
        checkOntologyId();
        if (ontologyId != null) {
            throw new RuntimeException(); // stop importing
        }
        // TODO: add a limit to the triples you read
        return added;
    }

    public String getOntologyId() {
        return ontologyId;
    }
}
If you use a BufferedInputStream you could do the following:
BufferedInputStream bIn = new BufferedInputStream(in);
bIn.mark(Integer.MAX_VALUE); // set an appropriate limit
MyMGraph graph = new MyMGraph();
try {
    // parse the buffered stream, otherwise the reset below has no effect
    parser.parse(graph, bIn, rdfFormat);
} catch (RuntimeException e) {
    // expected when MyMGraph stops the import
}
if (graph.getOntologyId() != null) {
    bIn.reset(); // reset the stream to the start
    // now do the logic you need to do
} else { // no OntologyID found
    // do some error handling
}
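The mark/reset part of this approach can be tried with the JDK alone. The following is only a minimal sketch, not Stanbol or Clerezza code: the class PeekAndReset, its methods, and the "id:" line convention are hypothetical stand-ins for the parser and the ontology ID check.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PeekAndReset {

    /** Demo helper: wraps a string in a BufferedInputStream. */
    static BufferedInputStream streamOf(String text) {
        return new BufferedInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    /**
     * Reads at most readLimit bytes looking for a line that starts with "id:"
     * (a made-up marker standing in for the ontology ID), then rewinds the
     * stream to where it was when this method was called.
     * Assumes the peeked part is plain ASCII.
     */
    static String peekId(BufferedInputStream in, int readLimit) throws IOException {
        in.mark(readLimit); // remember the current position
        StringBuilder line = new StringBuilder();
        String id = null;
        try {
            for (int read = 0; read < readLimit && id == null; read++) {
                int b = in.read();
                if (b == -1 || b == '\n') {
                    String s = line.toString().trim();
                    if (s.startsWith("id:")) {
                        id = s.substring(3).trim(); // found: loop stops early
                    }
                    line.setLength(0);
                    if (b == -1) {
                        break; // end of stream
                    }
                } else {
                    line.append((char) b);
                }
            }
        } finally {
            in.reset(); // rewind; valid because we read <= readLimit bytes
        }
        return id;
    }

    /** Reads the remaining content of the stream as a UTF-8 string. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String content = "prefix: x\nid: http://example.org/onto\nbody: 1\n";
        BufferedInputStream in = streamOf(content);
        System.out.println("id = " + peekId(in, 1024));
        // After the reset, the full content is still available for the real import.
        System.out.println("full content still readable: " + readAll(in).equals(content));
    }
}
```

The early exit from the loop plays the role of the exception thrown in MyMGraph. In both cases the limit passed to mark() bounds how much data can be read before the reset stops working, so Integer.MAX_VALUE is best replaced by a realistic value.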
WDYT
Rupert
On 16.03.2012, at 11:12, Alessandro Adamou wrote:
One thing that it would be great to do is to detect the ontology ID *before*
creating the TripleCollection in Clerezza, so any mappings could be done before
storing.
But I don't know how this can be done without too much code.
Perhaps creating an IndexedGraph, exploring its content, then creating the
Graph in the TcManager with the same content and the right graph name, then
finally clearing the IndexedGraph could work.
But it still means having twice the resource usage (disk+memory) for a period.
Alessandro
On 3/16/12 10:56 AM, Alessandro Adamou wrote:
Hi David,
well, I guess that depends pretty much on how heavy the usage of OntoNet is in
your Stanbol installation.
Those are graphs created when OntoNet has to load an ontology from its content
rather than from a Web URI, so it cannot know the ontology ID earlier.
This happens e.g. by POSTing the ontology as the payload or by passing a
GraphContentInputSource to the Java API.
Now I do not know why these graphs are created (perhaps the refactor engine
could be loading some), but I do know that a Clerezza graph in Jena TDB
occupies a LOT of disk space.
Suffice it to say that my bundle had stored nine graphs of <100 triples each.
Their disk space was about 1.8 GB, but when I tried to make a zipfile out of it,
it came out as about 2 MB!
Alessandro
On 3/16/12 10:30 AM, David Riccitelli wrote:
Dears,
As I ran into disk issues, I found that this folder:
sling/felix/bundleXXX/data/tdb-data/mgraph
where XXX is the bundle number of:
Clerezza - SCB Jena TDB Storage Provider
org.apache.clerezza.rdf.jena.tdb.storage
took almost 70 GB of disk space (at which point the disk space was
exhausted).
These are some of the files I found inside:
193M ./ontonet%3A%3Ainputstream%3Aontology889
193M ./ontonet%3A%3Ainputstream%3Aontology1041
193M ./ontonet%3A%3Ainputstream%3Aontology395
193M ./ontonet%3A%3Ainputstream%3Aontology363
193M ./ontonet%3A%3Ainputstream%3Aontology661
193M ./ontonet%3A%3Ainputstream%3Aontology786
193M ./ontonet%3A%3Ainputstream%3Aontology608
193M ./ontonet%3A%3Ainputstream%3Aontology213
193M ./ontonet%3A%3Ainputstream%3Aontology188
193M ./ontonet%3A%3Ainputstream%3Aontology602
Any clues?
Thanks,
David Riccitelli
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network: http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
********************************************************************************
--
M.Sc. Alessandro Adamou
Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy
"I will give you everything, so long as you do not demand anything."
(Ettore Petrolini, 1930)
Not sent from my iSnobTechDevice