Hi Rupert,
I've been trying to implement your proposed solution for the Ontology ID
lookahead with the MGraph wrapper.
I'm trying to keep it simple for now; later I will need to detect the
[ontologyIRI, versionIRI] pair.
However, BufferedInputStream.mark(int) does not seem to set the read
limit for me. No matter what value I set (even -1), Parser.parse()
always goes through the whole graph, and when I try to reset() the stream
after finding the ontologyID I always get an IOException("Stream closed").
I tried values much greater and much smaller than the file size in
bytes, and tried moving the triple earlier and later in the file, no dice.
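To narrow it down, here is a minimal sketch in plain java.io that gives me the same message without Clerezza. The fakeParse() method is a made-up stand-in for Parser.parse(), and the assumption that the real parser closes its input is only my guess, not something I have confirmed. The nonClosing() wrapper is likewise hypothetical: it just swallows close() so that the buffered stream underneath stays open.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {

    /** Hypothetical stand-in for the parser: reads a few bytes, then closes the stream. */
    static void fakeParse(InputStream in) throws IOException {
        in.read(new byte[4]);
        in.close(); // assumption: the real parser closes its input as well
    }

    /** Wrapper whose close() does nothing, so the wrapped stream stays usable. */
    static InputStream nonClosing(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public void close() {
                // intentionally empty: the caller closes the real stream later
            }
        };
    }

    /** Returns the exception message from reset(), or null if reset() succeeded. */
    static String resetAfterParse(boolean shieldClose) throws IOException {
        BufferedInputStream bIn = new BufferedInputStream(
                new ByteArrayInputStream("0123456789".getBytes(StandardCharsets.US_ASCII)));
        bIn.mark(1024); // bounds how far we may read and still reset()
        fakeParse(shieldClose ? nonClosing(bIn) : bIn);
        try {
            bIn.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resetAfterParse(false)); // stream was closed by fakeParse
        System.out.println(resetAfterParse(true));  // close() was swallowed by the wrapper
    }
}
```

If the parser really does close the stream, that would explain why the value passed to mark() makes no difference. As far as I can tell from the Javadoc, the read limit only bounds how far one may read and still reset(); it does not stop the reader from consuming the whole stream.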
Perhaps I should just set a limit on the triples instead, but I wouldn't
want to read through a 100MiB file just to use the first 100 triples for
guessing the ID. However, this could be inevitable, since most formats
require reading the last chunk of a file in order to "close" the RDF
code (such as a </rdf:RDF> tag or so), but perhaps a SAX parser could
work anyway?
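To make the triple-limit idea concrete, here is a rough sketch in plain Java with no Clerezza types. Everything in it (LimitedSink, StopParsing, the line-based parse loop) is made up for illustration. It assumes a line-oriented format such as N-Triples, where each statement is complete on its own line, so stopping early does not depend on reaching a closing tag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TripleLimitDemo {

    /** Thrown by the sink to abort parsing; a dedicated type avoids hiding real errors. */
    static class StopParsing extends RuntimeException {
        StopParsing(String reason) { super(reason); }
    }

    /** Collects statements and aborts as soon as 'limit' of them have been added. */
    static class LimitedSink {
        final int limit;
        final List<String> seen = new ArrayList<>();

        LimitedSink(int limit) { this.limit = limit; }

        void add(String statement) {
            seen.add(statement);
            if (seen.size() >= limit) {
                throw new StopParsing("limit of " + limit + " statements reached");
            }
        }
    }

    /** Hypothetical line-based parser: one statement per non-empty line. Returns lines read. */
    static int parse(BufferedReader reader, LimitedSink sink) throws IOException {
        int linesRead = 0;
        String line;
        try {
            while ((line = reader.readLine()) != null) {
                linesRead++;
                if (!line.trim().isEmpty()) {
                    sink.add(line);
                }
            }
        } catch (StopParsing expected) {
            // aborted on purpose once the limit was hit
        }
        return linesRead;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            doc.append("<urn:s").append(i).append("> <urn:p> <urn:o> .\n");
        }
        LimitedSink sink = new LimitedSink(100);
        int linesRead = parse(new BufferedReader(new StringReader(doc.toString())), sink);
        System.out.println(linesRead + " lines read, " + sink.seen.size() + " statements kept");
    }
}
```

With a streaming parser the rest of the file is never touched once the limit is hit. Whether the RDF/XML parser behaves the same way is exactly what I am unsure about.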
Any clue?
Alessandro
On 3/16/12 11:50 AM, Rupert Westenthaler wrote:
Hi Alessandro
Something like this could work:
This suggests the following approach:
* provide an MGraph wrapper that skips all triples other than the ones needed to
determine the OntologyID
* Use a BufferedInputStream and mark the beginning
* Parse to your MGraphWrapper until you can determine the OntologyID
* throw some exception to stop the parsing
* reset the stream
* process the OntologyID
* If you need to import the parsed ontology you can reuse the reset stream
Here is how the code might look.
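class MyMGraph extends SimpleMGraph {
    private String ontologyId;

    @Override
    protected boolean performAdd(Triple triple) {
        boolean added = false;
        // filter the interesting triples (isInteresting is a placeholder
        // for whatever test identifies the ontology ID statements)
        if (isInteresting(triple)) {
            added = super.performAdd(triple);
        }
        // check the currently available triples for the Ontology ID
        // (placeholder: sets ontologyId once it can be determined)
        checkOntologyId();
        if (ontologyId != null) {
            throw new RuntimeException(); // stop importing
        }
        // TODO: add a limit to the triples you read
        return added;
    }

    public String getOntologyId() {
        return ontologyId;
    }
}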
If you use a BufferedInputStream you could do the following:
BufferedInputStream bIn = new BufferedInputStream(in);
bIn.mark(Integer.MAX_VALUE); // set an appropriate limit
MyMGraph graph = new MyMGraph();
try {
    parser.parse(graph, bIn, rdfFormat); // parse from the marked stream
} catch (RuntimeException e) {
    // expected: thrown by MyMGraph to stop the parsing
}
if (graph.getOntologyId() != null) {
    bIn.reset(); // reset the stream to the start
    // now do the logic you need to do
} else { // no OntologyID found
    // do some error handling
}
WDYT
Rupert
--
M.Sc. Alessandro Adamou
Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy
"I will give you everything, so long as you do not demand anything."
(Ettore Petrolini, 1917)
Not sent from my iSnobTechDevice