On 08/07/14 16:56, Pearson, Stephen (HP Cloud, Bristol) wrote:
I'm working with a medium-sized dataset of around 8 million triples, and
I'm using Fuseki to query it via an inference model (either RDFS or OWL
Micro). This works, but I'm looking to boost performance by pre-computing
the inferences, storing them in a named graph, and using
tdb:unionDefaultGraph to merge them at run time. I'll then have the
option of recomputing the inferences from scratch whenever the schema
changes. The code below takes under two minutes to run, which is fine for
my use case provided I don't have to do it every time I restart the
server. I'm therefore looking for a way to take a reasoner and extract
just the new inferences from the resulting InfModel.
Code:
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// Assume tdbModel loaded from TDB
Model schema = ModelFactory.createDefaultModel();
schema.read("schema.ttl", "TURTLE");

// Union the data with the schema, then wrap the result in an OWL Micro
// inference model.
Model unionModel = ModelFactory.createUnion(tdbModel, schema);
OntModel ont = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
ont.add(unionModel);

// ont.write(System.out, "TURTLE");   // writes only the base model
ont.writeAll(System.out, "TURTLE");   // writes base + inferred triples
System.out.println("ont triples: " + ont.size());
I suppose I could write out the entire model plus inferences, but that can
take a while. The Jena API must know which triples are inferred in order
for ont.write() to behave differently from ont.writeAll(), but I can't see
from the Javadocs how to filter them out.
Just write out the whole model; there will be a lot more inferred triples
than base triples, so you won't save much by omitting the base ones.
The issue is that the reasoners in general, and OWL_Micro specifically,
use a mix of forward and backward deductions.
The forward deductions are indeed stored separately and can be obtained
via getDeductionsModel().
However, the backward deductions are only computed on demand in response
to queries. Some of those are indirectly cached in the backward
reasoner's tabled predicates, but others are never cached. So the only
way to obtain all deductions is to ask the most general query.
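For example, a minimal sketch using the ont model from the code above
(note that this captures only the forward deductions, not anything the
backward engine would derive on demand):

// Forward-chained results only; triples derived by the backward
// engine will not appear here.
Model forwardDeductions = ont.getDeductionsModel();
System.out.println("forward deductions: " + forwardDeductions.size());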
There are a few things you can do which might help performance.
First, you could materialize all the triples before you write them out.
The writer makes a lot of separate calls, so anything that isn't
cached may be recomputed. So try something like:
Model myMaterializedModel = ModelFactory.createDefaultModel();
myMaterializedModel.add( ont );
Then you can write out myMaterializedModel, or if you really want to you
could remove the base models before doing so:
myMaterializedModel.remove( schema );
myMaterializedModel.remove( tdbModel );
Second, given that the reasoning is being done in memory, you may find it
more efficient to copy tdbModel into a memory model first and then wrap
the reasoner round that.
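A minimal sketch of that approach, reusing the names from the code above:

// Copy the TDB-backed model into memory so the rule engine's many
// lookups hit an in-memory graph rather than the disk-backed store.
Model memModel = ModelFactory.createDefaultModel();
memModel.add( tdbModel );

Model unionModel = ModelFactory.createUnion(memModel, schema);
OntModel ont = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
ont.add(unionModel);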
Third, if for your purposes there are only certain types of queries you
need to run, you may choose to materialize only some of the inferences.
For example, if you only care about inferred types you could perform a
more restricted materialization such as:
myMaterializedModel.add( ont.listStatements(null, RDF.type, (RDFNode) null) );
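Either way, once materialized the model can be written out and loaded
into your named graph. A minimal sketch (the file name is just for
illustration, and exception handling is omitted):

// Serialize the materialized inferences once; the resulting file can
// then be loaded into a named graph in TDB.
try (java.io.OutputStream out = new java.io.FileOutputStream("inferences.ttl")) {
    myMaterializedModel.write(out, "TURTLE");
}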
Dave