On 08/07/2021 10:05, Simon Gray wrote:
So I have a follow-up question...
What I really want is an updatable graph that persists on disk as TDB and then
an expanded view that contains all of the inferred triples too - this may very
well be an in-memory graph. Basically, I want to be able to add to the
underlying TDB graph and then rely on inference to create additional triples,
keeping a separation. I am not interested in persisting any inferred triples to
a new TDB like some of the replies here assume. To me, the advantage of
inference is having the flexibility of expanding the dataset on-demand while
having a separation between man-made, curated triples and some varying set of
inferred triples.
Is this at all possible to do with Jena?
Possible but not necessarily performant or convenient.
A major limitation of the Jena inference support is that it is in-memory
only. There's no mechanism to persist/reload the internal state of the
inference engines, you can only query for the resulting materialized
triples and persist those as discussed on this thread. And the inference
scaling is limited to memory, whether or not the base data is held on
scalable persistent storage.
So you *can* create an inference model over a TDB model and updates made
through the inference model will be persisted by the TDB base model, and
also result in new inferences. However, the inference engine will be
querying TDB for every query made by the rules. The performance of that
will be much worse than performance of a purely in-memory configuration.
When you first start up your service the first query (or any explicit
initial prepare() call) will be very slow. After that, once the forward
inferences have been completed, performance should be better but still
significantly slower than a purely in-memory solution.
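A minimal Java sketch of the inference-over-TDB configuration described above, assuming Jena with TDB2 on the classpath. The namespace, class and method names are illustrative and not from this thread, and an in-memory TDB2 dataset stands in for a real on-disk location (which would be opened with TDB2Factory.connectDataset and a directory path).

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.vocabulary.OWL;

class InfOverTdbSketch {
    // Hypothetical namespace, used only for this illustration.
    static final String NS = "http://example.org/wordnet#";

    /** Adds triples through the inference view and checks that the
     *  inverse triple is visible there but was not written to the base. */
    static boolean addAndInfer(Dataset dataset) {
        return Txn.calculateWrite(dataset, () -> {
            Model base = dataset.getDefaultModel();
            // The reasoner keeps its own state in memory; only 'base' is TDB-backed.
            InfModel inf = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLMicroReasoner(), base);

            Property hypernym = inf.createProperty(NS, "hypernym");
            Property hyponym = inf.createProperty(NS, "hyponym");
            Resource dog = inf.createResource(NS + "dog");
            Resource animal = inf.createResource(NS + "animal");

            // Updates made through the inference model land in the TDB base model.
            inf.add(hypernym, OWL.inverseOf, hyponym);
            inf.add(dog, hypernym, animal);

            // Run the forward rules now rather than on the first query.
            inf.prepare();

            boolean inferred = inf.contains(animal, hyponym, dog);
            boolean storedInBase = base.contains(animal, hyponym, dog);
            return inferred && !storedInBase;
        });
    }

    public static void main(String[] args) {
        // In-memory TDB2 dataset for demonstration only.
        Dataset dataset = TDB2Factory.createDataset();
        System.out.println(addAndInfer(dataset));
    }
}
```

Everything here runs inside one write transaction for simplicity; a long-running service would need to decide how the InfModel's lifetime relates to TDB transactions, which this sketch does not address.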
Depending on your application structure and scale of data you may be
able to run with a dual in-memory-with-reasoning and
copied-to-TDB-for-persistence architecture, where on start up you copy
the TDB data to the memory InfModel once and updates are written to both
copies. That would still have high start up latency but not as high as
running the reasoner directly over TDB.
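One way the dual architecture could look in Java, as a rough sketch only. The DualStoreSketch class, its method names and the example namespace are hypothetical; the demo uses an in-memory TDB2 dataset where a real deployment would connect to an on-disk one. It is not thread-safe and does nothing to keep the two copies consistent if one of the writes fails.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.vocabulary.OWL;

class DualStoreSketch {
    static final String NS = "http://example.org/wordnet#"; // hypothetical

    private final Dataset persistent;  // curated triples only
    private final InfModel reasoning;  // in-memory copy plus inferences

    DualStoreSketch(Dataset persistent) {
        this.persistent = persistent;
        Model memory = ModelFactory.createDefaultModel();
        // One-off copy of the persisted triples at start-up.
        Txn.executeRead(persistent, () -> memory.add(persistent.getDefaultModel()));
        this.reasoning = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), memory);
        this.reasoning.prepare(); // pay the forward-inference cost up front
    }

    /** Writes one curated triple to both copies. */
    void add(Resource s, Property p, RDFNode o) {
        Txn.executeWrite(persistent, () -> persistent.getDefaultModel().add(s, p, o));
        reasoning.add(s, p, o);
    }

    /** The expanded view, including inferred triples. */
    InfModel view() {
        return reasoning;
    }

    long persistedSize() {
        return Txn.calculateRead(persistent, () -> persistent.getDefaultModel().size());
    }

    static boolean demo() {
        Model scratch = ModelFactory.createDefaultModel();
        Property hypernym = scratch.createProperty(NS, "hypernym");
        Property hyponym = scratch.createProperty(NS, "hyponym");
        Resource dog = scratch.createResource(NS + "dog");
        Resource animal = scratch.createResource(NS + "animal");

        // Seed the persistent store with a schema axiom before start-up.
        Dataset dataset = TDB2Factory.createDataset();
        Txn.executeWrite(dataset,
                () -> dataset.getDefaultModel().add(hypernym, OWL.inverseOf, hyponym));

        DualStoreSketch store = new DualStoreSketch(dataset);
        store.add(dog, hypernym, animal);

        // The inferred inverse is only in the memory view; TDB holds two triples.
        return store.view().contains(animal, hyponym, dog)
                && store.persistedSize() == 2;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```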
What you want to do is entirely reasonable but not well supported by
Jena inference as it stands.
Dave
On 5 Jul 2021, at 10:38, Dave Reynolds <[email protected]> wrote:
On 05/07/2021 08:03, Simon Gray wrote:
Thank you for that answer, Dave! I think this provides the missing link in my
understanding of the matter.
Is there a single method call to use when copying the inference model to a
plain model or do I need to make copies of every triple myself and add them to
a new model?
Model.add does it for you, so you should just need something like:
plain.add( infModel );
and it will enumerate all triples and add them to the new model. Potentially
taking some time!
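For context, a self-contained Java sketch around that one-liner. The OWL Micro reasoner is used because it corresponds to the OWL_MEM_MICRO_RULE_INF spec mentioned elsewhere in the thread; the class name, namespace and sample triples are made up for illustration.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;

class MaterializeSketch {
    static final String NS = "http://example.org/wordnet#"; // hypothetical

    /** Copies asserted and inferred triples into a plain in-memory model. */
    static Model materialize(Model base) {
        InfModel infModel = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), base);
        Model plain = ModelFactory.createDefaultModel();
        // Enumerates every triple the reasoner can produce and stores it.
        plain.add(infModel);
        return plain;
    }

    static boolean demo() {
        Model base = ModelFactory.createDefaultModel();
        Property hypernym = base.createProperty(NS, "hypernym");
        Property hyponym = base.createProperty(NS, "hyponym");
        Resource dog = base.createResource(NS + "dog");
        Resource animal = base.createResource(NS + "animal");
        base.add(hypernym, OWL.inverseOf, hyponym);
        base.add(dog, hypernym, animal);

        Model plain = materialize(base);
        // The plain model answers the inverse query without any reasoner attached.
        return plain.contains(animal, hyponym, dog) && plain.size() > base.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The plain model could just as well be a TDB-backed model if the materialized triples are meant to be persisted.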
Dave
On 3 Jul 2021, at 18:34, Dave Reynolds <[email protected]> wrote:
On 02/07/2021 13:29, Simon Gray wrote:
Hmm… I am not sure how my rules are modeled. I just use the built-in
OWL_MEM_MICRO_RULE_INF OntModelSpec.
Anyway, my question is still this: how do I get all of those inferences
computed *before* I start querying the Model. It’s great if I can just store
them later, but I still need to *compute* them before I can think about
persisting anything. Running a single query doesn’t seem to compute them all,
just relevant ones to that specific query… I think?
Short answer is there's no built in way to precompute everything that's
precomputable for the OWL reasoners other than that which the others have
pointed out - copy the inferred model.
The OWL rules use a mix of forward and backward reasoning. The forward
reasoning can all be invoked in one go via prepare() but the backward reasoning
is mostly done on demand. Some of the backward rules are tabled/memoized so
once they've been run once future runs are supposed to be quicker. Others are
always run on demand.
If you have a few particular query patterns then to warm up the relevant
memoization run those queries.
The most comprehensive way to ensure everything has been computed is to copy
the model to a plain model (in memory or persistent). That copy is essentially
running the query (?s ?p ?o) and will compute everything the rules can reach.
After that the inference model is as warm as it's going to get. But since at
that point you've already materialized everything, you might as well keep the
materialized copy as the others have said.
There'd be nothing to stop you doing the general query (e.g. via an unbounded
listStatements() call) and throwing the results away. That *could* be
beneficial if the materialized model is too big but the tabling/memoization is
proving useful and smaller - but no guarantees.
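A small Java sketch of that warm-up idea: call prepare() for the forward rules, then run an unbounded listStatements() and discard the results so the tabled backward rules get exercised. The WarmUpSketch class and the sample data are illustrative assumptions, and how much this helps depends on the rule set.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;

class WarmUpSketch {
    static final String NS = "http://example.org/wordnet#"; // hypothetical

    /** Forces forward inference, then walks every statement and discards it.
     *  Returns the number of statements seen, purely for diagnostics. */
    static long warmUp(InfModel inf) {
        inf.prepare(); // forward rules
        long count = 0;
        StmtIterator it = inf.listStatements(); // equivalent to (?s ?p ?o)
        try {
            while (it.hasNext()) {
                it.nextStatement(); // result is thrown away
                count++;
            }
        } finally {
            it.close();
        }
        return count;
    }

    static boolean demo() {
        Model base = ModelFactory.createDefaultModel();
        Property hypernym = base.createProperty(NS, "hypernym");
        Property hyponym = base.createProperty(NS, "hyponym");
        Resource dog = base.createResource(NS + "dog");
        Resource animal = base.createResource(NS + "animal");
        base.add(hypernym, OWL.inverseOf, hyponym);
        base.add(dog, hypernym, animal);

        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), base);
        long seen = warmUp(inf);
        // The inference view exposes more statements than were asserted.
        return seen > base.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```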
Dave
On 2 Jul 2021, at 14:06, Lorenz Buehmann
<[email protected]> wrote:
But can't you do this inference just once and then somewhere store those
inferences? Next time you can simply load the inferred model instead of the raw
dataset. It is not specific to TDB, you can load dataset A, compute the
inferred model in a slow process once, materialize it as dataset B, and later
on always work on dataset B - this is standard forward chaining with writing
the data back to disk or database. Can you try this procedure, maybe it works
for you?
Indeed this won't work if your rules are currently modeled as backward chaining
rules as those are computed at query time always.
On 02.07.21 13:37, Simon Gray wrote:
Thank you Lorenz, although this seems to be a reply to my side comment about
TDB rather than the question I had, right?
The main issue right now is that I would like to use inferencing to get e.g.
inverse relations, but doing this is very slow the first time a query is run,
likely due to some preprocessing step that needs to run first. I would like to
run the preprocessing step in advance rather than running it implicitly.
On 2 Jul 2021, at 13:30, Lorenz Buehmann
<[email protected]> wrote:
You can just add the inferred model to the dataset, i.e. add all triples to your
TDB. Then you can disable the reasoner afterwards or just omit the rules that
you do not need anymore
On 02.07.21 13:13, Simon Gray wrote:
Hi there,
I’m using Apache Jena from Clojure to create a new home for the Danish WordNet. I
use the Arachne Aristotle library + some additional Java interop code of my own.
I would like to use OWL inferencing to query e.g. transitive or inverse
relations. This does seem to work fine although I’ve only tried using the
supplied in-memory model for now (and it looks like I will have to create my
own instance of a ModelMaker to integrate with TDB 1 or 2).
However, the first query always seems to run really, really slow. Is there any
way to precompute inferred relations so that I don’t have to wait? I’ve tried
calling `rebind` and `prepare`, but they don’t seem to do anything.
Kind regards,
Simon Gray
Research Officer
Centre for Language Technology, University of Copenhagen