Hi Andy,

By way of introduction, I've recently been exploring ontology
solutions with Brandon using Jena and Fuseki, and have come to
appreciate your capable stewardship and responsive
engagement with this community. Thank you.

I was able to replicate Brandon's problem loading the ICD-10
dataset with any of the built-in OWL reasoners, even without search
indexing. However, it did load successfully and respond quickly to
queries using RDFSRuleReasoner, as well as Transitive and Generic.
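
For reference, here is a sketch of the kind of variant I mean. It
assumes only two changes to Brandon's config: the service points
straight at the inference dataset (no text index), and the reasoner
URL is the one Jena registers for the RDFS rule reasoner. Prefixes
and the TDB2 resources are unchanged from his config, and the exact
file I ran may differ in detail.

```turtle
# Sketch only: Brandon's config with two assumed changes.
# Prefixes, <#kgTdbGraph> and <#kgTdbDataset> are as in his original.

# 1. The service uses the inference dataset directly, bypassing the
#    text:TextDataset wrapper.
<#kgService> a fuseki:Service ;
   fuseki:name "kg" ;
   fuseki:dataset <#kgInferredDataset> ;
   fuseki:endpoint [ fuseki:operation fuseki:query ] ;
   fuseki:endpoint [ fuseki:operation fuseki:update ] ;
.

<#kgInferredDataset> a ja:RDFDataset ;
   ja:defaultGraph <#kgInferenceModel> ;
.

# 2. The OWL Micro reasoner is swapped for the RDFS rule reasoner.
<#kgInferenceModel> a ja:InfModel ;
   ja:baseModel <#kgTdbGraph> ;
   ja:reasoner [
     ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>
   ] ;
.
```

The Transitive and Generic runs would differ only in the reasoner
URL (http://jena.hpl.hp.com/2003/TransitiveReasoner and
http://jena.hpl.hp.com/2003/GenericRuleReasoner).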

Brandon is better able to say whether we need OWL for other
reasons, but we do want to use ICD-10-CM with data for inference.
Would *Data with RDFS Inferencing* have advantages over using the
built-in RDFSRuleReasoner for that?
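
To make the question concrete, this is how I read the setup on that
documentation page, adapted to Brandon's dataset names. It is an
untested sketch: the schema file path is hypothetical, and it assumes
ICD10CM is supplied both as the data in TDB2 and as a separate file
acting as the fixed schema, as you suggested.

```turtle
# Untested sketch of "Data with RDFS Inferencing" for this case.
# Prefixes and <#kgTdbDataset> are as in Brandon's config.

<#kgService> a fuseki:Service ;
   fuseki:name "kg" ;
   fuseki:dataset <#kgRdfsDataset> ;
   fuseki:endpoint [ fuseki:operation fuseki:query ] ;
.

# RDFS wrapper: the schema is read once from a file and treated as fixed.
<#kgRdfsDataset> a ja:DatasetRDFS ;
   # Hypothetical path: a copy of the ICD10CM Turtle file used as schema.
   ja:rdfsSchema <file:/fuseki/config/icd10cm-schema.ttl> ;
   ja:dataset <#kgTdbDataset> ;
.
```

Compared with the ja:InfModel route, this wraps the whole dataset
rather than a single graph, and no ja:reasonerURL is involved.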

Thanks in advance for any help,

Ryan

*JFYI, the Transitive- and RDFSRuleReasoners inferred 570k :subClassOf
and an additional 192k :type triples over the base 96k of each
relation, respectively.*

*Profiling the OWL reasoner with VisualVM, I was able to see that it
seems to cycle without end through Generator.pump() ->
LPInterpreter.next() -> LPInterpreter.run() -> Node.sameValueAs().
I have yet to try this on a reduced dataset to see if I can find the
minimum necessary to replicate the spin.*

On Fri, Sep 17, 2021 at 7:04 AM Andy Seaborne <a...@apache.org> wrote:

> Hi Brandon,
>
> The configuration is quite complex - it's likely due to the inference
> layer but it would be worth trying without the text index to confirm
> that especially for the loading.
>
> Do you need all that
> <http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner>
> offers or is all you want RDFS subclass?
>
> Because there is
>    https://jena.apache.org/documentation/rdfs/
> (give ICD10CM as both data and also in a file to be the schema).
>
> The schema is assumed to be fixed which might not work for you long term
> but it is another data point to understand the situation.
>
> About ICD10CM itself - are you wanting to navigate its structure or use
> it with data for inference? If it is to navigate its structure do you
> even want inference?
>
>      Andy
>
> On 14/09/2021 00:42, Brandon Sara wrote:
> > I have been able to create an easily reproducible scenario that others
> > can use to replicate and test the issues that I’m seeing:
> >
> > 1. Start Fuseki using the config that I’ve listed below.
> > 2. Attempt to load the latest version of ICD-10 CM as provided freely by
> >    BioPortal: https://bioportal.bioontology.org/ontologies/ICD10CM
> >
> > If inference is enabled, then I can’t even get the turtle file to load
> > in its entirety. If I load the turtle file without inference, then the load
> > completes, but upon restarting the server and submitting a request, the
> > service doesn’t finish processing the request in any reasonable amount of
> > time, no matter how simple the query of the request is (one that actually
> > queries data from the dataset at least).
> >
> > Config:
> >
> > PREFIX dcterms: <http://purl.org/dc/terms/>
> > PREFIX fuseki: <http://jena.apache.org/fuseki#>
> > PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>
> > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> > PREFIX tdb2: <http://jena.apache.org/2016/tdb#>
> > PREFIX text: <http://jena.apache.org/text#>
> >
> > [] rdf:type fuseki:Server ;
> >    fuseki:pingEP true ;
> >    fuseki:statsEP true ;
> >    fuseki:metricsEP true ;
> >    fuseki:compactEP true ;
> >
> >    ja:context [
> >      ja:cxtName "arq:queryTimeout" ;
> >      ja:cxtValue "10000,60000" ;
> >    ] ;
> > .
> >
> > <#kgService> a fuseki:Service ;
> >    fuseki:name "kg" ;
> >    fuseki:dataset <#kgIndexedDataset> ;
> >    fuseki:endpoint [ fuseki:operation fuseki:query; ] ;
> >    fuseki:endpoint [ fuseki:operation fuseki:update; ] ;
> >    fuseki:endpoint [ fuseki:operation fuseki:gsp_r; ] ;
> >    fuseki:endpoint [ fuseki:operation fuseki:gsp_rw; fuseki:name "data"; ] ;
> > .
> >
> > <#kgIndexedDataset> rdf:type text:TextDataset ;
> >    text:dataset <#kgInferredDataset> ;
> >    text:index <#kgIndex> ;
> > .
> >
> > <#kgIndex> a text:TextIndexLucene ;
> >    text:directory <file:/fuseki/databases/kg.index> ;
> >    text:entityMap <#kgEntityMap> ;
> >    text:storeValues true ;
> >    text:queryParser [ a text:ComplexPhraseQueryParser ]
> > .
> >
> > <#kgEntityMap> a text:EntityMap ;
> >    text:defaultField "label" ;
> >    text:entityField "uri" ;
> >    text:uidField "uid" ;
> >    text:langField "lang" ;
> >    text:graphField "graph" ;
> >    text:map (
> >      [ text:field "id" ;
> >        text:predicate dcterms:identifier ]
> >
> >      [ text:field "label" ;
> >        text:predicate rdfs:label ]
> >    ) ;
> > .
> >
> > <#kgInferredDataset> a ja:RDFDataset ;
> >    ja:defaultGraph <#kgInferenceModel> ;
> > .
> >
> > <#kgInferenceModel> a ja:InfModel ;
> >    ja:baseModel <#kgTdbGraph> ;
> >    ja:reasoner [
> >      ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner>
> >    ] ;
> > .
> >
> > <#kgTdbGraph> a tdb2:GraphTDB2 ;
> >    tdb2:dataset <#kgTdbDataset> ;
> > .
> >
> > <#kgTdbDataset> a tdb2:DatasetTDB2 ;
> >    tdb2:location "/fuseki/databases/kg" ;
> > .
> >
>
