On 04/02/2019 12:31, Pierre Grenon wrote:
Hi,

Following up after going through my attempts again, more systematically this time. I'm
trying to be as specific and clear as I can. Any feedback most appreciated.

Many thanks,
Pierre

1. It is possible to have a configuration file in which data is loaded into a
TDB dataset and inferences are run over this data. In this case:

1.a Data in named graphs created using SPARQL Update into a TDB dataset 
persists upon restart.

Data must be loaded through the inference graph for the inferencer to notice the change.

So the SPARQL updates can't create a new graph. Assemblers have a fixed configuration.

(You could have one graph per database and upload new assemblers while Fuseki is running.)
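Concretely, with the config below, that means giving the reasoning service an update endpoint and sending every INSERT DATA (or CLEAR) there, rather than to the plain TDB service. A sketch, reusing the service and dataset names from the config:

:reasoningService a fuseki:Service ;
   fuseki:dataset                 :infDataset ;
   fuseki:name                    "reasoningEndpointTDBB" ;
   fuseki:serviceQuery            "query", "sparql" ;
   fuseki:serviceUpdate           "update" ;    # updates now pass through :infModel
   fuseki:serviceReadGraphStore   "get" ;
.

On its own this is not enough while the base model is wired to the union graph (see point 4).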


1.b Assertional data in these named graphs is immediately available to the 
reasoning endpoint without server restart.

1.c Inference on data loaded using SPARQL Update requires restart of the server 
after upload.

1.d CLEAR ALL in the TDB dataset endpoint requires server restart to have the 
inference dataset emptied. (Queries to the reasoning endpoint for either 
assertional or inferred data both return the same results as prior to clearing 
the TDB dataset.)

Same general point - if you manipulate the database directly, the inference code doesn't know a change has happened or what has changed.

2. TDB2 does not allow this (or does it just not at the moment?). As per the OP in this
thread, the configuration adapted to TDB2 breaks. Based on Andy's response,
this may be caused by bug JENA-1633. Would fixing the bug be enough to allow
the configuration using TDB2?

JENA-1663.
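For reference, the TDB2 adaptation under discussion would look roughly like this (a sketch only: :tdbDataset2 and its location are placeholders, and I am assuming tdb2:GraphTDB / tdb2:graphName are the accepted vocabulary terms):

@prefix tdb2: <http://jena.apache.org/2016/tdb#> .

:tdbDataset2 rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "run/databases/tdb2B" ;
    tdb2:unionDefaultGraph true ;
.

:g2 rdf:type tdb2:GraphTDB ;
    tdb2:dataset :tdbDataset2 ;
    tdb2:graphName <urn:x-arq:UnionGraph> ;
.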

3. Inference datasets do not stay in sync with the underlying TDB(2) datasets (1.b and
1.c, by virtue of the in-memory nature of inference models and the way
configuration files are handled, as per Andy's and ajs6f's responses).

In view of this, however, 1.b is really weird.

4. Adding a service update method to the reasoning service does not seem to
allow updating the inference dataset. Sending SPARQL Update to the inference
endpoint results in neither additional assertional nor inferred data.
(Although, per 1.b, asserted data is returned when the SPARQL Update is sent to
the TDB endpoint.)

The base graph of the inference model is updated.

But you have that set to <urn:x-arq:UnionGraph>.

That applies to SPARQL query - the updates will have gone to the real default graph but that is hidden by your setup.



Question:

What is the prescribed way of keeping the on-disk data and the inference dataset in sync?

Update via the inference model.
Don't wire it to the union graph.
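A sketch of that rewiring against the config below; the concrete graph name <urn:example:data> is only a placeholder:

# Base the inference model on one concrete graph, not the union graph.
:g rdf:type tdb:GraphTDB ;
    tdb:dataset :tdbDataset ;
    tdb:graphName <urn:example:data> ;
.

Combined with fuseki:serviceUpdate "update" on :reasoningService (as sketched under point 1), every write then goes through :infModel and the reasoner sees the change.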


Is it:

P1 - upon SPARQL Update to the on-disk data, restart the server (and reinitialise the
inference dataset)?
This makes it difficult to manage successive updates, especially when there may
be dependencies between them, e.g., if in order to make update 2 I need to have
done update 1, then I need to restart after update 1.

Given that only TDB works at the moment, what is the 'transactional' meaning of
having to do this?

P2 - upon SPARQL Update to the on-disk data, SPARQL Update the inference dataset. Is it
possible to update the inference dataset? In that case, is it possible to
guarantee that the two datasets are in sync? Does TDB versus TDB2 matter?

5. Note to self: property chains are not supported by the OWLFBRuleReasoner.


##### TDB Configuration
##### From:

https://stackoverflow.com/questions/47568703/named-graphs-v-default-graph-behaviour-in-apache-jena-fuseki
@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# TDB
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .


# Service 1: Dataset endpoint (no reasoning)
:dataService a fuseki:Service ;
   fuseki:name           "tdbEnpointTDBB" ;
   fuseki:serviceQuery   "sparql", "query" ;
   fuseki:serviceUpdate  "update" ;
   fuseki:dataset        :tdbDataset ;
.

# Service 2: Reasoning endpoint
:reasoningService a fuseki:Service ;
   fuseki:dataset                 :infDataset ;
   fuseki:name                    "reasoningEndpointTDBB" ;
   fuseki:serviceQuery            "query", "sparql" ;
   fuseki:serviceReadGraphStore   "get" ;
.

# Inference dataset
:infDataset rdf:type ja:RDFDataset ;
             ja:defaultGraph :infModel ;
.

# Inference model
:infModel a ja:InfModel ;
            ja:baseModel :g ;

            ja:reasoner [
               ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
            ] ;
.

# Intermediate graph referencing the default union graph
:g rdf:type tdb:GraphTDB ;
    tdb:dataset :tdbDataset ;
    tdb:graphName <urn:x-arq:UnionGraph> ;
.

# The location of the TDB dataset
:tdbDataset rdf:type tdb:DatasetTDB ;
    tdb:location  "C:\\dev\\apache-jena-fuseki-3.8.0\\run\\databases\\tdbB" ;
    tdb:unionDefaultGraph true ;
.

From: Pierre Grenon
Sent: 01 February 2019 15:07
To: 'users@jena.apache.org'
Subject: RE: Fuseki2 configuration, TDB2 data, Inferencing with ontologies,
Persisting named graphs upon server restart


I'll address you two, fine gentlemen, at once if that's OK.

On 31/01/2019 17:57, ajs6f wrote:
2/ It is not possible, in an assembler/Fuseki configuration file, to create a
new named graph and have another inference graph put around that new graph at
runtime.

Just to pull on one of these threads, my understanding is that this is essentially because
the assembler system works only by names. IOW, there's no such thing as a
"variable", and a blank node doesn't function as a slot (as it might in a
SPARQL query), just as a nameless node. So you have to know the specific name of any
named graph to which you want to refer. A named graph that doesn't yet exist, and
may have any name at all when it does, obviously doesn't fit into that.


I find this difficult to follow. By name, do you mean a value of ja:graphName, so
something like <urn:my:beautiful:graph>?

I have tried a configuration in which I was defining graphs.

<#graph_umb> rdf:type tdb2:GraphTDB ;
   tdb2:dataset :datasetTDB2 ;
   ja:graphName <urn:mad:bro> .
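(Rereading this, possibly the last property should come from the TDB2 vocabulary rather than ja: - I am not certain which one the TDB2 assembler expects:

<#graph_umb> rdf:type tdb2:GraphTDB ;
   tdb2:dataset :datasetTDB2 ;
   tdb2:graphName <urn:mad:bro> .
)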

Then I'd load into that graph.

Again, I haven't found a configuration that allowed me to also define an 
inference engine and keep the content of these graphs.

I will retry and try to post files for comments, unless you can come up with a
minimal example that would both save time and help preserve sanity.

Andy and other more knowledgeable people: is that correct?

The issue is that the assembler runs once at the start, builds some Java
structures based on that and does not get invoked when the new graph is
created later.

To some extent, it would be possible to live with predefined graphs in the 
config file. This would work for ontologies and reference data that doesn't 
change.

For data, in particular the kind with lots of numbers that corresponds to
daily operational data, it might be infeasible to predefine graph names unless you can
declare some sort of template graph names (e.g.,
<urn:data:icecream:[FLAVOUR]:[YYYYMMDD]>), which sounds like a stretch.
Alternatively, we could use a rolling predefined graph, save it under a specific name
as an archive, then clear and load new data on a daily basis (sketched below). I think
this is a different issue though.
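For the record, the sort of thing I have in mind for the rolling graph; all names here are made up:

# Predeclared, fixed-name graph receiving the daily load
<#graph_daily> rdf:type tdb:GraphTDB ;
    tdb:dataset :tdbDataset ;
    tdb:graphName <urn:data:icecream:current> .

# Daily routine, sent as SPARQL Update to the endpoint:
#   COPY <urn:data:icecream:current> TO <urn:data:icecream:vanilla:20190204> ;
#   CLEAR GRAPH <urn:data:icecream:current>

The dated archive graph cannot itself be predeclared, which is the same limitation again.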

The issue is also that the union graph is partitioned - if a single
concrete graph were used, it might well work.

I'm not sure I follow this. Can you show an example of a config file that makes 
that partitioning?

I haven't worked out the other details, like why persistence isn't
happening. Might be related to the union graph. Might be updates
going around the inference graph.

Hope the previous message helped clarify the issue.

As a follow-up too, I'm asked whether it is possible to save to disk any named graph
created in memory before shutting down the server, and whether that would be a
workaround.

with many thanks and kind regards,
Pierre
