Hi Alessandro
Something like this could work. The idea is to:
* provide an MGraph wrapper that skips all triples other than the ones needed to
determine the OntologyID
* use a BufferedInputStream and mark the beginning
* parse into your MGraph wrapper until you can determine the OntologyID
* throw an exception to stop the parsing
* reset the stream
* process the OntologyID
* if you need to import the parsed ontology, reuse the reset stream
Here is how the code might look:
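class MyMGraph extends SimpleMGraph {
    private String ontologyId;

    @Override
    protected boolean performAdd(Triple triple) {
        boolean added = false;
        // filter: keep only the triples needed to determine the Ontology ID
        if (isInteresting(triple)) { // placeholder for your own check
            added = super.performAdd(triple);
        }
        // check the currently available triples for the Ontology ID;
        // this is expected to set ontologyId once it can be determined
        checkOntologyId();
        if (ontologyId != null) {
            throw new RuntimeException("Ontology ID found"); // stop importing
        }
        // TODO: add a limit to the number of triples you read
        return added;
    }

    public String getOntologyId() {
        return ontologyId;
    }
}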
If you use a BufferedInputStream you could do the following:
BufferedInputStream bIn = new BufferedInputStream(in);
bIn.mark(Integer.MAX_VALUE); // set an appropriate limit
MyMGraph graph = new MyMGraph();
try {
    parser.parse(graph, bIn, rdfFormat); // parse from the marked stream
} catch (RuntimeException e) {
    // expected when MyMGraph aborts the parsing
}
if (graph.getOntologyId() != null) {
    bIn.reset(); // rewind the stream to the marked start
    // now do the logic you need to do
} else { // no OntologyID found
    // do some error handling
}
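
To make the mark/abort/reset idea concrete without pulling in Clerezza, here is a minimal, self-contained sketch using only the JDK. Everything in it is an assumption made for illustration: the class PeekOntologyId, the StopParsing exception, the Sink interface and the toy line-based parse() stand in for MyMGraph and the real parser, and the input is assumed to be simple N-Triples-like lines. It also uses a dedicated exception type instead of a bare RuntimeException, so that genuine parser errors are not swallowed, and it covers the TODO about limiting the number of triples read.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class PeekOntologyId {

    public static final String RDF_TYPE =
            "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    public static final String OWL_ONTOLOGY =
            "<http://www.w3.org/2002/07/owl#Ontology>";

    /** Hypothetical marker exception, used only to abort the parsing early. */
    static class StopParsing extends RuntimeException {
        final String id; // null if the statement limit was hit first
        StopParsing(String id) {
            super("stop parsing");
            this.id = id;
        }
    }

    /** Stand-in for the MGraph wrapper: receives one statement at a time. */
    interface Sink {
        void add(String statement);
    }

    /** Toy parser: hands every non-blank line to the sink as one statement. */
    static void parse(InputStream in, Sink sink) throws IOException {
        // The reader is deliberately not closed, so the stream stays usable.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                sink.add(line);
            }
        }
    }

    /**
     * Marks the stream, parses until an owl:Ontology declaration is seen or
     * maxStatements is exceeded, then rewinds the stream to the mark.
     * reset() throws an IOException if more than readLimit bytes were
     * buffered, so readLimit has to cover everything the parser reads ahead.
     */
    public static String peekOntologyId(BufferedInputStream bIn, int readLimit,
                                        int maxStatements) throws IOException {
        bIn.mark(readLimit);
        String id = null;
        int[] seen = {0};
        try {
            parse(bIn, statement -> {
                String[] p = statement.trim().split("\\s+");
                if (p.length >= 3 && RDF_TYPE.equals(p[1])
                        && OWL_ONTOLOGY.equals(p[2])) {
                    // strip the surrounding angle brackets from the subject
                    throw new StopParsing(p[0].substring(1, p[0].length() - 1));
                }
                if (++seen[0] >= maxStatements) {
                    throw new StopParsing(null); // give up, no ID found in time
                }
            });
        } catch (StopParsing stop) {
            id = stop.id;
        }
        bIn.reset(); // the stream can now be parsed again from the start
        return id;
    }

    public static void main(String[] args) throws IOException {
        String data =
                "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
                + "<http://example.org/onto> " + RDF_TYPE + " " + OWL_ONTOLOGY + " .\n"
                + "<http://example.org/c> <http://example.org/p> <http://example.org/d> .\n";
        BufferedInputStream bIn = new BufferedInputStream(
                new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Ontology ID: " + peekOntologyId(bIn, 1 << 16, 100));
    }
}
```

Note that the read limit passed to mark() matters: a parser usually reads ahead in blocks, so the limit has to be larger than the number of bytes actually consumed, not just the size of the triples inspected.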
WDYT
Rupert
On 16.03.2012, at 11:12, Alessandro Adamou wrote:
> One thing that it would be great to do is to detect the ontology ID *before*
> creating the TripleCollection in Clerezza, so any mappings could be done
> before storing.
>
> But I don't know how this can be done without much code.
>
> Perhaps creating an IndexedGraph, exploring its content, then creating the
> Graph in the TcManager with the same content and the right graph name, then
> finally clearing the IndexedGraph could work.
>
> But it still means having twice the resource usage (disk+memory) for a period.
>
> Alessandro
>
>
> On 3/16/12 10:56 AM, Alessandro Adamou wrote:
>> Hi David,
>>
>> well, I guess that depends pretty much on how heavy the usage of OntoNet is
>> in your Stanbol installation.
>>
>> Those are graphs created when OntoNet has to load an ontology from its
>> content rather than from a Web URI, so it cannot know the ontology ID
>> earlier.
>>
>> This happens e.g. by POSTing the ontology as the payload or by passing a
>> GraphContentInputSource to the Java API.
>>
>> Now I do not know why these graphs are created (perhaps the refactor engine
>> could be loading some), but I do know that a Clerezza graph in Jena TDB
>> occupies a LOT of disk space.
>>
>> Suffice it to say that my bundle had stored nine graphs of <100 triples
>> each. Their disk space was about 1.8 GB, but when I tried to make a zipfile
>> out of it, it came out as about 2 MB!
>>
>> Alessandro
>>
>>
>> On 3/16/12 10:30 AM, David Riccitelli wrote:
>>> Dears,
>>>
>>> As I ran into disk issues, I found that this folder:
>>> sling/felix/bundleXXX/data/tdb-data/mgraph
>>>
>>> where XXX is the bundle of:
>>> Clerezza - SCB Jena TDB Storage Provider
>>> org.apache.clerezza.rdf.jena.tdb.storage
>>>
>>> took almost 70 GB of disk space (at which point the disk space was
>>> exhausted).
>>>
>>> These are some of the files I found inside:
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>
>>>
>>> Any clues?
>>>
>>> Thanks,
>>> David Riccitelli
>>>
>>
>>
>
>