Re: Subclass caching has some problems on Fuseki startup

Lorenz Buehmann Mon, 30 Aug 2021 00:30:36 -0700


On 27.08.21 22:09, Brandon Sara wrote:

I’ve finally tracked down the problem (at least at a high level). When using 
the TransitiveReasoner, there is a block of code which caches all subclass 
triples 
(https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326).
Part of this code searches for all subproperties of `subClassOf` and begins 
caching triples for those subproperties. In my situation, I’ve added 
`owl:equivalentClass` manually (since only `TransitiveReasoner` is being used) 
and manually made it a subproperty of `subClassOf`.
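The caching behaviour described above can be modelled in a few lines. This is a simplified, library-free sketch, not Jena's `TransitiveEngine`: triples are plain Java records (Java 16+) with prefixed names instead of Jena `Triple`/`Node` objects, and the names `SubClassCacheModel`, `subClassLikeProperties` and `subClassEdges` are hypothetical. It assumes, as the message describes, that triples of every property declared (here, directly or transitively) as `rdfs:subPropertyOf rdfs:subClassOf` are treated as subclass edges.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubClassCacheModel {

    // Minimal stand-in for an RDF triple; prefixed names replace full IRIs.
    record Triple(String s, String p, String o) {}

    static final String SUB_CLASS_OF = "rdfs:subClassOf";
    static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

    // Returns rdfs:subClassOf plus every property declared, directly or
    // transitively, to be rdfs:subPropertyOf it (assumption: transitive lookup).
    static Set<String> subClassLikeProperties(List<Triple> data) {
        Set<String> props = new HashSet<>(Set.of(SUB_CLASS_OF));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Triple t : data) {
                if (t.p().equals(SUB_PROPERTY_OF) && props.contains(t.o())) {
                    changed |= props.add(t.s());
                }
            }
        }
        return props;
    }

    // Collects the (sub, super) edges this model would feed into a subclass
    // cache: every triple whose predicate is one of the properties above,
    // rewritten with rdfs:subClassOf as its predicate.
    static List<Triple> subClassEdges(List<Triple> data) {
        Set<String> props = subClassLikeProperties(data);
        List<Triple> edges = new ArrayList<>();
        for (Triple t : data) {
            if (props.contains(t.p())) {
                edges.add(new Triple(t.s(), SUB_CLASS_OF, t.o()));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("owl:equivalentClass", SUB_PROPERTY_OF, SUB_CLASS_OF),
            new Triple("ex:A", "owl:equivalentClass", "ex:B"));
        // Prints only the ex:A -> ex:B edge; the reverse direction is not derived.
        subClassEdges(data).forEach(System.out::println);
    }
}
```

In this model, declaring `owl:equivalentClass` a subproperty of `rdfs:subClassOf` turns `ex:A owl:equivalentClass ex:B` into the single edge `ex:A rdfs:subClassOf ex:B`; the edge `ex:B rdfs:subClassOf ex:A` never enters the cache.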

In that case you're losing the inferences in one direction, aren't you? Wouldn't it be "cleaner" to resolve the owl:equivalentClass axioms beforehand by creating subClassOf axioms in both directions, either by means of a SPARQL query or by wrapping a GenericRuleReasoner with two rules?
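The preprocessing suggested here can be illustrated without Jena. The sketch below is hypothetical (`EquivalentClassExpansion` and `expand` are invented names, and triples are plain Java records rather than Jena objects); it rewrites each `A owl:equivalentClass B` into `A rdfs:subClassOf B` and `B rdfs:subClassOf A`, the kind of output a SPARQL update over `?a owl:equivalentClass ?b` or two forward rules would produce. The rule text in the comment is an assumed example in Jena's rule syntax, not taken from the thread.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EquivalentClassExpansion {

    // Minimal stand-in for an RDF triple; prefixed names replace full IRIs.
    record Triple(String s, String p, String o) {}

    static final String SUB_CLASS_OF = "rdfs:subClassOf";
    static final String EQUIVALENT_CLASS = "owl:equivalentClass";

    // Materialises both subclass directions for every equivalence, roughly what
    // two rules such as these (assumed syntax) would derive:
    //   [eq1: (?a owl:equivalentClass ?b) -> (?a rdfs:subClassOf ?b)]
    //   [eq2: (?a owl:equivalentClass ?b) -> (?b rdfs:subClassOf ?a)]
    // The original triples are kept; the set drops duplicates.
    static Set<Triple> expand(List<Triple> input) {
        Set<Triple> out = new LinkedHashSet<>();
        for (Triple t : input) {
            out.add(t);
            if (t.p().equals(EQUIVALENT_CLASS)) {
                out.add(new Triple(t.s(), SUB_CLASS_OF, t.o()));
                out.add(new Triple(t.o(), SUB_CLASS_OF, t.s()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("ex:A", EQUIVALENT_CLASS, "ex:B"),
            new Triple("ex:B", SUB_CLASS_OF, "ex:C"));
        expand(data).forEach(System.out::println);
    }
}
```

With both directions materialised up front, the TransitiveReasoner sees the full hierarchy through plain `rdfs:subClassOf` triples, so `owl:equivalentClass` would not need to be declared a subproperty of `rdfs:subClassOf`.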

The data that I’m uploading right now has a lot of equivalent class triples 
(more than ~300k). If I’m understanding the code correctly as I’ve been 
debugging it, it seems that not only is the triple cached, but a traversal of many other 
triples occurs when caching even a single triple. Is that correct? 
This would explain why (1) it never seems to finish what it is doing and (2) the 
memory grows very, very large while doing it. I ran a single query last night and, 
after more than 6 hours on 8 CPUs with 20 GB of RAM, it still hadn't finished loading 
the cache. It seems as though the runtime of this could be exponential in 
nature. My dataset is likely well over 20 million records (I still 
haven’t gotten a full count, but I know for a fact that it is well over 10 
million and believe it to be well over 20 million). As I’ve mentioned 
before, there are basically no individuals in the dataset; it’s all ontology, 
because it consists of health care industry coding systems and classifications.
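One way to see how a materialised subclass closure can outgrow the asserted data is to count pairs. The sketch below is a hypothetical, library-free illustration (`ClosureSize`, `closurePairs` and `chain` are invented names) and does not reproduce how Jena's transitive cache actually computes or stores its closure. For a chain of n classes, each a subclass of the next, there are n - 1 asserted triples but n(n - 1)/2 closure pairs; a cycle of k mutually related classes, such as two-way equivalences produce, yields k(k - 1) pairs. That is quadratic rather than exponential growth, though it can still be large over millions of triples.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClosureSize {

    // Counts (sub, super) pairs in the transitive closure of a directed graph
    // given as an adjacency map, using one breadth-first search per node.
    static long closurePairs(Map<Integer, List<Integer>> edges) {
        long total = 0;
        for (Integer start : edges.keySet()) {
            Set<Integer> seen = new HashSet<>();
            Deque<Integer> queue = new ArrayDeque<>(edges.get(start));
            while (!queue.isEmpty()) {
                Integer next = queue.poll();
                if (seen.add(next)) {
                    queue.addAll(edges.getOrDefault(next, List.of()));
                }
            }
            seen.remove(start); // ignore reflexive pairs introduced by cycles
            total += seen.size();
        }
        return total;
    }

    // Builds the chain 0 -> 1 -> ... -> n-1, read as "i rdfs:subClassOf i+1".
    static Map<Integer, List<Integer>> chain(int n) {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        for (int i = 0; i < n; i++) {
            List<Integer> successors = i + 1 < n ? List.of(i + 1) : List.of();
            edges.put(i, successors);
        }
        return edges;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("chain of %d classes: %d asserted, %d in closure%n",
                n, n - 1, closurePairs(chain(n)));
        }
    }
}
```

For n = 1000 the chain has 999 asserted edges and 499,500 closure pairs.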

Another strange thing, which I’ve mentioned before, is that I don’t have any of 
these issues when I initially load the data. I can load everything with just 4 
GB of RAM, it loads in a reasonable amount of time, and after the upload is complete 
I can submit queries of pretty much any complexity with no issues; they are very fast, too. 
The problem only occurs after the server has been restarted, when 
the first query that actually pulls something from the dataset (i.e. not an 
empty query) is submitted, no matter how simple or complex that query is.

Is this a bug, or should `owl:equivalentClass` work without my own manual 
specification of it?
