Hi Alessandro

Something like this could work:

The idea is to
* provide an MGraph wrapper that skips all triples other than the ones needed 
to determine the OntologyID
* use a BufferedInputStream and mark the beginning
* parse into your MGraph wrapper until you can determine the OntologyID
* throw some exception to stop the parsing
* reset the stream
* process the OntologyID
* if you need to import the parsed ontology, you can reuse the reset stream

Here is how the code might look.

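class MyMGraph extends SimpleMGraph {

    private String ontologyId;

    @Override
    protected boolean performAdd(Triple triple) {
        boolean added = false;
        // filter the interesting triples
        // (isInteresting(..) is a placeholder you need to implement)
        if (isInteresting(triple)) {
            added = super.performAdd(triple);
        }
        // check the currently available triples for the ontology ID
        // (checkOntologyId() sets the ontologyId field once it is known)
        checkOntologyId();

        if (ontologyId != null) {
            throw new RuntimeException("ontology ID found"); // stop importing
        }
        // TODO: add a limit to the number of triples you read
        return added;
    }

    public String getOntologyId() {
        return ontologyId;
    }

}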
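
One thing I would consider on top of that: throwing a plain RuntimeException makes it hard to tell "I stopped on purpose" from a real parser failure. Below is a minimal, JDK-only sketch of a dedicated signal exception together with the triple limit from the TODO. All names here (StopParsingException, FilteringCollector, maxStatements) are made up for illustration and are not Clerezza API; statements are plain strings, and it assumes the ontology ID is the subject of the "rdf:type owl:Ontology" triple, which may not be everything OntoNet looks at.

```java
/** Hypothetical signal, thrown on purpose to end parsing early. */
class StopParsingException extends RuntimeException {
    /** true if the ontology ID was found, false if the limit was hit. */
    final boolean found;

    StopParsingException(boolean found) {
        // no suppression, no stack trace: this is control flow, not an error
        super(found ? "ontology ID found" : "statement limit reached",
                null, false, false);
        this.found = found;
    }
}

/** Stand-in for the filtering graph (an assumption, not Clerezza code). */
class FilteringCollector {
    static final String RDF_TYPE =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_ONTOLOGY =
            "http://www.w3.org/2002/07/owl#Ontology";

    private final int maxStatements;
    private int seen;
    private String ontologyId;

    FilteringCollector(int maxStatements) {
        this.maxStatements = maxStatements;
    }

    /** Called once per parsed statement; nothing else is stored. */
    void add(String subject, String predicate, String object) {
        seen++;
        if (RDF_TYPE.equals(predicate) && OWL_ONTOLOGY.equals(object)) {
            ontologyId = subject;
            throw new StopParsingException(true);   // found it, stop
        }
        if (seen >= maxStatements) {
            throw new StopParsingException(false);  // give up, stop
        }
    }

    String getOntologyId() {
        return ontologyId;
    }
}
```

The caller can then catch StopParsingException only, so any other RuntimeException coming out of the parser still surfaces as an error.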


If you use a BufferedInputStream you could do the following:

BufferedInputStream bIn = new BufferedInputStream(in);
bIn.mark(Integer.MAX_VALUE); // set an appropriate limit
MyMGraph graph = new MyMGraph();
try {
    parser.parse(graph, bIn, rdfFormat); // parse from the marked stream
} catch (RuntimeException e) {
    // expected when MyMGraph stops the import
}
if (graph.getOntologyId() != null) {
    bIn.reset(); // rewind the stream to the mark
    // now do the logic you need to do
} else { // no OntologyID found
    // do some error handling
}
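
To show the mark / abort / reset part in isolation, here is a small sketch that uses only the JDK. It is not Clerezza code: scanLines(..) merely stands in for parser.parse(..), and MarkResetDemo, Found, sniff and the marker string are hypothetical names chosen for the example. Keep in mind that reset() is only guaranteed to work if no more than the mark's read limit was consumed since mark().

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

class MarkResetDemo {

    /** Hypothetical signal used to abort the first pass. */
    static class Found extends RuntimeException {
        final String line;

        Found(String line) {
            this.line = line;
        }
    }

    /**
     * Stand-in for parser.parse(..): reads the stream byte by byte and
     * hands every line to the sink (ASCII only, good enough for a demo).
     */
    static void scanLines(InputStream in, Consumer<String> sink)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                sink.accept(sb.toString());
                sb.setLength(0);
            } else {
                sb.append((char) b);
            }
        }
        if (sb.length() > 0) {
            sink.accept(sb.toString());
        }
    }

    /**
     * First pass: returns the first line containing the marker, or null.
     * The stream is rewound to the mark in both cases, so the caller can
     * read it again from the start.
     */
    static String sniff(BufferedInputStream bIn, String marker, int readLimit)
            throws IOException {
        bIn.mark(readLimit);
        String hit = null;
        try {
            scanLines(bIn, line -> {
                if (line.contains(marker)) {
                    throw new Found(line); // stop scanning
                }
            });
        } catch (Found f) {
            hit = f.line;
        }
        bIn.reset(); // fails with an IOException if the mark was invalidated
        return hit;
    }
}
```

After sniff(..) returns, the same BufferedInputStream can be passed to the real import, which then reads the data again from the beginning.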


WDYT
Rupert

On 16.03.2012, at 11:12, Alessandro Adamou wrote:

> One thing that it would be great to do is to detect the ontology ID *before* 
> creating the TripleCollection in Clerezza, so any mappings could be done 
> before storing.
> 
> But I don't know how this can be done with not so much code.
> 
> Perhaps creating an IndexedGraph, exploring its content, then creating the 
> Graph in the TcManager with the same content and the right graph name, then 
> finally clearing the IndexedGraph could work.
> 
> But it still means having twice the resource usage (disk+memory) for a period.
> 
> Alessandro
> 
> 
> On 3/16/12 10:56 AM, Alessandro Adamou wrote:
>> Hi David,
>> 
>> well, I guess that depends pretty much on how heavy the usage of OntoNet is 
>> in your Stanbol installation.
>> 
>> Those are graphs created when OntoNet has to load an ontology from its 
>> content rather than from a Web URI, so it cannot know the ontology ID 
>> earlier.
>> 
>> This happens e.g. by POSTing the ontology as the payload or by passing a 
>> GraphContentInputSource to the Java API.
>> 
>> Now I do not know why these graphs are created (perhaps the refactor engine 
>> could be loading some), but I do know that a Clerezza graph in Jena TDB 
>> occupies a LOT of disk space.
>> 
>> Suffice it to say that my bundle had stored nine graphs of <100 triples 
>> each. Their disk space was about 1.8 GB, but when I tried to make a zipfile 
>> out of it, it came out as about 2MB!
>> 
>> Alessandro
>> 
>> 
>> On 3/16/12 10:30 AM, David Riccitelli wrote:
>>> Dears,
>>> 
>>> As I ran into disk issues, I found that this folder:
>>>  sling/felix/bundleXXX/data/tdb-data/mgraph
>>> 
>>> where XXX is the number of the bundle:
>>>  Clerezza - SCB Jena TDB Storage Provider
>>> org.apache.clerezza.rdf.jena.tdb.storage
>>> 
>>> took almost 70 GB of disk space (at which point the disk was
>>> full).
>>> 
>>> These are some of the files I found inside:
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>> 
>>> 
>>> Any clues?
>>> 
>>> Thanks,
>>> David Riccitelli
>>> 
>>> ********************************************************************************
>>>  
>>> InsideOut10 s.r.l.
>>> P.IVA: IT-11381771002
>>> Fax: +39 0110708239
>>> ---
>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>> Twitter: ziodave
>>> ---
>>> Layar Partner 
>>> Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>> ********************************************************************************
>>>  
>>> 
>> 
>> 
> 
> 
> -- 
> M.Sc. Alessandro Adamou
> 
> Alma Mater Studiorum - Università di Bologna
> Department of Computer Science
> Mura Anteo Zamboni 7, 40127 Bologna - Italy
> 
> Semantic Technology Laboratory (STLab)
> Institute for Cognitive Science and Technology (ISTC)
> National Research Council (CNR)
> Via Nomentana 56, 00161 Rome - Italy
> 
> 
> "I will give you everything, so long as you do not demand anything."
> (Ettore Petrolini, 1930)
> 
> Not sent from my iSnobTechDevice
> 
