Hi Ryan,
On 17/09/2021 16:22, Ryan Stokes wrote:
Hi Andy,
By way of introduction, I've recently been exploring ontology solutions
with Brandon using Jena and Fuseki, and have come to
appreciate your capable stewardship and responsive
engagement with this community. Thank you.
I was able to replicate Brandon's problem loading the ICD-10
dataset with each of the built-in OWL reasoners, even without the
text index. However, it did load successfully and respond quickly to
queries using the RDFSRuleReasoner, as well as the TransitiveReasoner
and GenericRuleReasoner.
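Below is a sketch of that variant. It assumes the rest of Brandon's config below stays the same, so only the reasoner URL in `<#kgInferenceModel>` changes. The URI shown is assumed to be the one under which Jena registers RDFSRuleReasoner. The corresponding URIs for the other two would be `http://jena.hpl.hp.com/2003/TransitiveReasoner` and `http://jena.hpl.hp.com/2003/GenericRuleReasoner`, and the generic one also needs a rule set to be supplied.

```turtle
PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>

# Same inference model as in the config below, with the RDFS rule
# reasoner in place of OWLMicroFBRuleReasoner (reasoner URI assumed).
<#kgInferenceModel> a ja:InfModel ;
    ja:baseModel <#kgTdbGraph> ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>
    ] ;
    .
```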
OK - we're getting closer.
That "pump" loop could well be the cause if it comes from a rule with
(?x ?p ?y) in it, such as rule 'rdf1and4'. I think the default RDFS
reasoner omits that rule. This dataset is only 800K triples.
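A rule whose body is just (?x ?p ?y) is expensive because it matches every triple in the graph. The example below assumes that 'rdf1and4' implements the W3C entailment rules rdf1 and rdfs4a/rdfs4b. Under that assumption, a single base triple would produce the extra triples shown. The `ex:` IRIs are hypothetical.

```turtle
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

# One base triple (hypothetical IRIs):
ex:A00 rdfs:subClassOf ex:A00-A09 .

# Triples a (?x ?p ?y) rule would add for it, assuming it implements
# rdf1 (predicate is a property) and rdfs4a/4b (subject/object are resources):
rdfs:subClassOf rdf:type rdf:Property .
ex:A00          rdf:type rdfs:Resource .
ex:A00-A09      rdf:type rdfs:Resource .
```

Each new triple matches (?x ?p ?y) again. For example, the first one leads to `rdf:type rdf:type rdf:Property`. The work therefore scales with the whole graph, not just with the schema.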
The rules engine copes with the schema and data changing at runtime
by minimising re-computation, at the expense of a lot more initial
work and, crucially, tracking in-memory state. I guess it is doing
all the setup work on first touch.
[Later: It is not specific to TDB - seems to happen with any base
storage including both in-memory kinds.]
Brandon is better able to say whether we need OWL for other
reasons, but we do want to use ICD-10-CM with data for inference.
Would *Data with RDFS Inferencing* have advantages over using the
built-in RDFSRuleReasoner for that?
Maybe :-)
Data+RDFS is different - it's not trying to be a replacement for the
rules engine for RDFS. We have the rules engine for complete adherence
to RDFS.
Data+RDFS:
1/ It is a fixed RDFS (subclass/subproperty/domain/range).
No axioms. No x:directSubClassOf.
2/ Applies to every graph in the dataset.
3/ Assumes the schema is fixed - no update to the schema at runtime.
4/ The schema is invisible - the app sees data and inferred triples.
But it should scale and work with persistent databases.
[ The "no update to the schema" could be changed. Programming needed
though. ]
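The worked example below illustrates point 1 (subclass/subproperty/domain/range). It applies standard RDFS semantics to a schema, some data, and the triples an application would see in addition to the data. The vocabulary is hypothetical and not taken from ICD-10-CM. Following point 4, the schema triples themselves would not be visible to queries.

```turtle
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

# --- Schema (fixed; hypothetical vocabulary) ---
ex:Diagnosis   rdfs:subClassOf    ex:ClinicalFinding .
ex:primaryCode rdfs:subPropertyOf ex:code ;
               rdfs:domain        ex:Encounter ;
               rdfs:range         ex:Diagnosis .

# --- Data ---
ex:enc1 ex:primaryCode ex:dx1 .

# --- Inferred triples the application also sees ---
ex:enc1 ex:code  ex:dx1 .              # subproperty
ex:enc1 rdf:type ex:Encounter .        # domain
ex:dx1  rdf:type ex:Diagnosis .        # range
ex:dx1  rdf:type ex:ClinicalFinding .  # range, then subclass
```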
So - Ryan, Brandon - what inference does your usage need? Is the
schema/ontology updated during runtime?
Andy
Thanks again in advance for any help,
Ryan
*JFYI, the TransitiveReasoner and RDFSRuleReasoner inferred 570k
:subClassOf and an additional 192k :type triples, respectively, over
the base 96k of each relation.*
*Profiling the OWL reasoner with VisualVM, I was able to see that it
seems to cycle without end through Generator.pump() ->
LPInterpreter.next() -> LPInterpreter.run() -> Node.sameValueAs().
I have yet to try this on a reduced dataset to see if I can find the
minimum necessary to replicate the spin.*
On Fri, Sep 17, 2021 at 7:04 AM Andy Seaborne <a...@apache.org> wrote:
Hi Brandon,
The configuration is quite complex. The problem is likely due to the
inference layer, but it would be worth trying without the text index
to confirm that, especially for the loading.
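One way to run that check is sketched below. It points the service directly at `<#kgInferredDataset>` from the config below, which bypasses the Lucene index. The `<#kgIndexedDataset>`, `<#kgIndex>` and `<#kgEntityMap>` descriptions can then be left out. Everything else is assumed unchanged from Brandon's config.

```turtle
PREFIX fuseki: <http://jena.apache.org/fuseki#>

# Same service as before, but without the text:TextDataset wrapper.
<#kgService> a fuseki:Service ;
    fuseki:name "kg" ;
    fuseki:dataset <#kgInferredDataset> ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update ; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp_r ; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp_rw ; fuseki:name "data" ; ] ;
    .
```

Pointing `fuseki:dataset` at `<#kgTdbDataset>` instead would remove inference as well. That separates the cost of loading into TDB2 from the cost of the reasoner.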
Do you need everything that
<http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner>
offers, or is RDFS subclass all you want?
I ask because there is
https://jena.apache.org/documentation/rdfs/
(give ICD10CM both as the data and, in a separate file, as the schema).
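Below is a minimal sketch of that setup for this case. It follows the assembler vocabulary described on that page (`ja:DatasetRDFS`, `ja:rdfsSchema`, `ja:dataset`), but those names should be checked against the Jena version in use. The schema file path is hypothetical. The sketch reuses the TDB2 location from Brandon's config, with ICD10CM also loaded there as data, and drops the InfModel layer entirely.

```turtle
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>

<#kgService> a fuseki:Service ;
    fuseki:name "kg" ;
    fuseki:dataset <#kgRdfsDataset> ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; ] ;
    .

# Fixed RDFS over every graph in the base dataset.
<#kgRdfsDataset> rdf:type ja:DatasetRDFS ;
    ja:rdfsSchema <file:/fuseki/schema/icd10cm.ttl> ;  # hypothetical path
    ja:dataset <#kgTdbDataset> ;
    .

<#kgTdbDataset> a tdb2:DatasetTDB2 ;
    tdb2:location "/fuseki/databases/kg" ;
    .
```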
The schema is assumed to be fixed which might not work for you long term
but it is another data point to understand the situation.
About ICD10CM itself: do you want to navigate its structure, or use
it with data for inference? If it is to navigate its structure, do you
even want inference?
Andy
On 14/09/2021 00:42, Brandon Sara wrote:
I have been able to create an easily reproducible scenario that others
can use to replicate and test the issues that I’m seeing:
1. Start Fuseki using the config that I’ve listed below.
2. Attempt to load the latest version of ICD-10 CM as provided freely by
BioPortal: https://bioportal.bioontology.org/ontologies/ICD10CM
If inference is enabled, I can’t even get the Turtle file to load
in its entirety. If I load the Turtle file without inference, the load
completes, but after I restart the server and submit a request, the
service doesn’t finish processing it in any reasonable amount of
time, no matter how simple the query is (at least for any query that
actually reads data from the dataset).
Config:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX tdb2: <http://jena.apache.org/2016/tdb#>
PREFIX text: <http://jena.apache.org/text#>
[] rdf:type fuseki:Server ;
    fuseki:pingEP true ;
    fuseki:statsEP true ;
    fuseki:metricsEP true ;
    fuseki:compactEP true ;
    ja:context [
        ja:cxtName "arq:queryTimeout" ;
        ja:cxtValue "10000,60000" ;
    ] ;
    .

<#kgService> a fuseki:Service ;
    fuseki:name "kg" ;
    fuseki:dataset <#kgIndexedDataset> ;
    fuseki:endpoint [ fuseki:operation fuseki:query; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp_r; ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp_rw; fuseki:name "data"; ] ;
    .

<#kgIndexedDataset> rdf:type text:TextDataset ;
    text:dataset <#kgInferredDataset> ;
    text:index <#kgIndex> ;
    .

<#kgIndex> a text:TextIndexLucene ;
    text:directory <file:/fuseki/databases/kg.index> ;
    text:entityMap <#kgEntityMap> ;
    text:storeValues true ;
    text:queryParser [ a text:ComplexPhraseQueryParser ]
    .

<#kgEntityMap> a text:EntityMap ;
    text:defaultField "label" ;
    text:entityField "uri" ;
    text:uidField "uid" ;
    text:langField "lang" ;
    text:graphField "graph" ;
    text:map (
        [ text:field "id" ;
          text:predicate dcterms:identifier ]
        [ text:field "label" ;
          text:predicate rdfs:label ]
    ) ;
    .

<#kgInferredDataset> a ja:RDFDataset ;
    ja:defaultGraph <#kgInferenceModel> ;
    .

<#kgInferenceModel> a ja:InfModel ;
    ja:baseModel <#kgTdbGraph> ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner>
    ] ;
    .

<#kgTdbGraph> a tdb2:GraphTDB2 ;
    tdb2:dataset <#kgTdbDataset> ;
    .

<#kgTdbDataset> a tdb2:DatasetTDB2 ;
    tdb2:location "/fuseki/databases/kg" ;
    .