Re: Precomputing OWL inferences

Dave Reynolds Fri, 09 Jul 2021 06:37:52 -0700

On 08/07/2021 10:05, Simon Gray wrote:

So I have a follow-up question...


What I really want is an updatable graph that persists on disk as TDB and then 
an expanded view that contains all of the inferred triples too - this may very 
well be an in-memory graph. Basically, I want to be able to add to the 
underlying TDB graph and then rely on inference to create additional triples, 
keeping a separation. I am not interested in persisting any inferred triples to 
a new TDB like some of the replies here assume. To me, the advantage of 
inference is having the flexibility of expanding the dataset on-demand while 
having a separation between man-made, curated triples and some varying set of 
inferred triples.

Is this at all possible to do with Jena?


Possible but not necessarily performant or convenient.

A major limitation of the Jena inference support is that it is in-memory only. There's no mechanism to persist/reload the internal state of the inference engines, you can only query for the resulting materialized triples and persist those as discussed on this thread. And the inference scaling is limited to memory, whether or not the base data is held on scalable persistent storage.

So you *can* create an inference model over a TDB model and updates made through the inference model will be persisted by the TDB base model, and also result in new inferences. However, the inference engine will be querying TDB for every query made by the rules. The performance of that will be much worse than performance of a purely in-memory configuration. When you first start up your service the first query (or any explicit initial prepare() call) will be very slow. After that, once the forward inferences have been completed, performance should be better but still significantly slower than a purely in-memory solution.

Depending on your application structure and scale of data you may be able to run with a dual in-memory-with-reasoning and copied-to-TDB-for-persistence architecture, where on start up you copy the TDB data to the in-memory InfModel once and updates are written to both copies. That would still have high start-up latency, but not as high.
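
To make the write-to-both-copies idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Jena: a tab-separated file stands in for the TDB store, in-memory sets stand in for the InfModel, and a single hand-written inverse rule stands in for the OWL reasoner. The class name DualStoreSketch, its methods, and the inverseOf map are hypothetical names invented for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy write-through store (not Jena): a flat file plays TDB, in-memory sets play the InfModel. */
final class DualStoreSketch {
    record Triple(String s, String p, String o) {}

    private final Path file;                              // persistent copy: curated triples only
    private final Set<Triple> base = new HashSet<>();     // in-memory copy of the curated triples
    private final Set<Triple> inferred = new HashSet<>(); // derived triples, never written to disk
    private final Map<String, String> inverseOf;          // hypothetical rule set: property -> inverse

    DualStoreSketch(Path file, Map<String, String> inverseOf) throws IOException {
        this.file = file;
        this.inverseOf = inverseOf;
        // Start-up cost: read the persistent copy once and run inference over it in memory.
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] f = line.split("\t");
                if (f.length == 3) index(new Triple(f[0], f[1], f[2]));
            }
        }
    }

    /** Update both copies: append the curated triple to disk, then update the in-memory view. */
    void add(Triple t) throws IOException {
        if (base.contains(t)) return;
        Files.write(file, List.of(t.s() + "\t" + t.p() + "\t" + t.o()), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        index(t);
    }

    /** Adds a curated triple in memory and applies the inverse rule to it. */
    private void index(Triple t) {
        base.add(t);
        String inv = inverseOf.get(t.p());
        if (inv != null) inferred.add(new Triple(t.o(), inv, t.s()));
    }

    /** Expanded view: curated plus inferred triples. */
    Set<Triple> expanded() {
        Set<Triple> all = new HashSet<>(base);
        all.addAll(inferred);
        return all;
    }

    /** Curated triples only, i.e. exactly what is on disk. */
    Set<Triple> curated() {
        return Set.copyOf(base);
    }
}
```

Only curated triples ever reach the file, so the inferred set can be thrown away and rebuilt with a different rule set at the next start. The toy file format does not escape tabs or newlines, so it is not a usable serialization.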

What you want to do is entirely reasonable but not well supported by Jena inference as it stands.


Dave

On 5 Jul 2021 at 10:38, Dave Reynolds <[email protected]> wrote:

On 05/07/2021 08:03, Simon Gray wrote:

Thank you for that answer, Dave! I think this provides the missing link in my 
understanding of the matter.
Is there a single method call to use when copying the inference model to a 
plain model or do I need to make copies of every triple myself and add them to 
a new model?


Model.add does it for you, so you should just need something like:

    plain.add( infModel );

and it will enumerate all triples and add them to the new model. Potentially 
taking some time!

Dave

On 3 Jul 2021 at 18:34, Dave Reynolds <[email protected]> wrote:


On 02/07/2021 13:29, Simon Gray wrote:

Hmm… I am not sure how my rules are modeled. I just use the built-in 
OWL_MEM_MICRO_RULE_INF OntModelSpec.
Anyway, my question is still this: how do I get all of those inferences 
computed *before* I start querying the Model. It’s great if I can just store 
them later, but I still need to *compute* them before I can think about 
persisting anything. Running a single query doesn’t seem to compute them all, 
just relevant ones to that specific query… I think?


Short answer is there's no built in way to precompute everything that's 
precomputable for the OWL reasoners other than that which the others have 
pointed out - copy the inferred model.

The OWL rules use a mix of forward and backward reasoning. The forward 
reasoning can all be invoked in one go via prepare() but the backward reasoning 
is mostly done on demand. Some of the backward rules are tabled/memoized so 
once they've been run once future runs are supposed to be quicker. Others are 
always run on demand.

If you have a few particular query patterns then to warm up the relevant 
memoization run those queries.

The most comprehensive way to ensure everything has been computed is to copy 
the model to a plain model (in memory or persistent). That copy is essentially 
running the query (?s ?p ?o) and will compute everything the rules can reach. 
After that the inference model is as warm as it's going to get. But since at 
that point you've already materialized everything, you might as well keep the 
materialized copy, as the others have said.

There'd be nothing to stop you doing the general query (e.g. via an unbounded 
listStatements() call) and throwing the results away. That *could* be 
beneficial if the materialized model is too big but the tabling/memoization is 
proving useful and smaller - but no guarantees.
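
The difference between the eager forward pass, the on-demand backward rules, and the tabling that a full listing warms up can be illustrated with a toy in plain Java. This is not Jena's rule engine and makes no claim about its internals: the class HybridReasonerSketch, the partOf/hasPart rules, and the backwardRuns counter are all assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy hybrid reasoner (not Jena): one forward rule (inverse), one tabled backward rule (transitivity). */
final class HybridReasonerSketch {
    record Triple(String s, String p, String o) {}

    private final Set<Triple> base = new HashSet<>();
    private final Set<Triple> forward = new HashSet<>();            // filled by prepare()
    private final Map<String, Set<String>> table = new HashMap<>(); // memoized backward results
    private boolean prepared = false;
    int backwardRuns = 0;                                            // counts non-cached evaluations

    void add(String s, String p, String o) {
        base.add(new Triple(s, p, o));
        prepared = false;   // any update invalidates earlier results
        forward.clear();
        table.clear();
    }

    /** Forward pass, run eagerly in one go. Rule: (a partOf b) -> (b hasPart a). */
    void prepare() {
        if (prepared) return;
        for (Triple t : base)
            if (t.p().equals("partOf")) forward.add(new Triple(t.o(), "hasPart", t.s()));
        prepared = true;
    }

    /** Backward rule, evaluated on demand and tabled: everything reachable from s via partOf. */
    Set<String> partOfTransitive(String s) {
        prepare();
        Set<String> cached = table.get(s);
        if (cached != null) return cached;   // warm: answered from the table
        backwardRuns++;
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(s);
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            for (Triple t : base)
                if (t.p().equals("partOf") && t.s().equals(cur) && seen.add(t.o())) todo.push(t.o());
        }
        table.put(s, seen);
        return seen;
    }

    /** Analogue of an unbounded (?s ?p ?o) listing: touches every rule, so it warms the table. */
    Set<Triple> listAll() {
        prepare();
        Set<Triple> all = new HashSet<>(base);
        all.addAll(forward);
        Set<String> subjects = new HashSet<>();
        for (Triple t : base) subjects.add(t.s());
        for (String s : subjects)
            for (String o : partOfTransitive(s)) all.add(new Triple(s, "partOf", o));
        return all;
    }
}
```

Calling listAll() once and discarding the result leaves the table populated, so later partOfTransitive() calls are answered without re-running the backward rule. Keeping the returned set instead corresponds to keeping the materialized copy.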

Dave

On 2 Jul 2021 at 14:06, Lorenz Buehmann 
<[email protected]> wrote:

But can't you do this inference just once and then somewhere store those 
inferences? Next time you can simply load the inferred model instead of the raw 
dataset. It is not specific to TDB, you can load dataset A, compute the 
inferred model in a slow process once, materialize it as dataset B, and later 
on always work on dataset B - this is standard forward chaining with writing 
the data back to disk or database. Can you try this procedure, maybe it works 
for you?

Indeed, this won't work if your rules are currently modeled as backward-chaining 
rules, as those are always computed at query time.
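
As a rough illustration of the "compute once, store as dataset B, then query B" procedure, the following plain-Java toy runs naive forward chaining to a fixpoint for a single transitive property. It is only a sketch under stated assumptions: it does not use Jena or TDB, in-memory sets stand in for the two datasets, and the names MaterializeOnceSketch, materialize, and inferredOnly are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy materializer (not Jena): forward chaining to a fixpoint for one transitive property. */
final class MaterializeOnceSketch {
    record Triple(String s, String p, String o) {}

    /** Returns dataset B = dataset A plus every triple derivable by transitivity. A is left untouched. */
    static Set<Triple> materialize(Set<Triple> datasetA, String transitiveProperty) {
        Set<Triple> datasetB = new HashSet<>(datasetA);
        boolean changed = true;
        while (changed) {
            // One round: join (x p y) with (y p z) to derive (x p z).
            Set<Triple> derived = new HashSet<>();
            for (Triple x : datasetB)
                for (Triple y : datasetB)
                    if (x.p().equals(transitiveProperty) && y.p().equals(transitiveProperty)
                            && x.o().equals(y.s()))
                        derived.add(new Triple(x.s(), transitiveProperty, y.o()));
            changed = datasetB.addAll(derived);   // stop once a round adds nothing new
        }
        return datasetB;
    }

    /** The inferred triples alone, in case they should be stored separately from the curated ones. */
    static Set<Triple> inferredOnly(Set<Triple> datasetA, Set<Triple> datasetB) {
        Set<Triple> inferred = new HashSet<>(datasetB);
        inferred.removeAll(datasetA);
        return inferred;
    }
}
```

After materialize() has run once, a lookup against dataset B is an ordinary contains() call with no reasoning involved. The cost is that B has to be recomputed, or at least patched, whenever A changes.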


On 02.07.21 13:37, Simon Gray wrote:

Thank you Lorenz, although this seems to be a reply to my side comment about 
TDB rather than the question I had, right?

The main issue right now is that I would like to use inferencing to get e.g. 
inverse relations, but doing this is very slow the first time a query is run, 
likely due to some preprocessing step that needs to run first. I would like to 
run the preprocessing step in advance rather than running it implicitly.

On 2 Jul 2021 at 13:30, Lorenz Buehmann 
<[email protected]> wrote:

You can just add the inferred model to the dataset, i.e. add all triples to your 
TDB. Then you can disable the reasoner afterwards or just omit the rules that 
you do not need anymore.

On 02.07.21 13:13, Simon Gray wrote:

Hi there,

I’m using Apache Jena from Clojure to create a new home for the Danish WordNet. I 
use the Arachne Aristotle library + some additional Java interop code of my own.

I would like to use OWL inferencing to query e.g. transitive or inverse 
relations. This does seem to work fine although I’ve only tried using the 
supplied in-memory model for now (and it looks like I will have to create my 
own instance of a ModelMaker to integrate with TDB 1 or 2).

However, the first query always seems to run really, really slow. Is there any 
way to precompute inferred relations so that I don’t have to wait? I’ve tried 
calling `rebind` and `prepare`, but they don’t seem to do anything.

Kind regards,

Simon Gray
Research Officer
Centre for Language Technology, University of Copenhagen
