I’ve finally tracked down the problem (at least at a high level). When using 
the Transitive Reasoner, there is a block of code which caches all sub class 
triples 
(https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326).
Part of this code searches for all sub-properties of `subClassOf` and begins 
caching triples for those sub-properties. In my situation, I’ve added 
`owl:equivalentClass` manually (since only `TransitiveReasoner` is being used) 
and manually made it a sub-property of `subClassOf`. The data that I’m 
uploading right now has a lot of equivalent class triples (more than 300k). It seems, 
if I’m understanding the code correctly from debugging it, that caching even a 
single triple not only caches that triple but also triggers a traversal of many 
other triples. Is that correct? This would explain 
why (1) it never seems to finish what it is doing and (2) the memory grows 
very, very large while doing it. I ran a single query last night and after more 
than 6 hours, 8 CPUs, and 20GB of RAM, it still never finished loading the 
cache. It seems as though the runtime of this could be exponential. My dataset 
is certainly well over 10 million records, and I believe it to be well over 20 
million (I still haven’t gotten a full count yet). Like I’ve mentioned 
before, there are basically no individuals in the dataset, it’s all ontology 
because it is health care industry coding systems and classifications.
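
As a rough illustration of why caching a transitive closure can get expensive on long subclass chains, here is a toy sketch in plain Java. It is not Jena's code and does not reproduce what `TransitiveEngine` does; the class and method names (`ClosureGrowth`, `closureSize`, `chain`) are hypothetical, and the chain-shaped input is an assumption chosen only to show the growth. For a chain of n classes there are n - 1 asserted edges but n(n - 1)/2 reachable (sub, super) pairs, so this particular shape grows quadratically rather than exponentially.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClosureGrowth {

    /** Counts the (sub, super) pairs in the transitive closure of the given directed edges. */
    static long closureSize(Map<Integer, List<Integer>> edges) {
        long pairs = 0;
        for (Integer start : edges.keySet()) {
            // Depth-first walk over everything reachable from this node.
            Set<Integer> seen = new HashSet<>();
            Deque<Integer> stack = new ArrayDeque<>(edges.get(start));
            while (!stack.isEmpty()) {
                Integer node = stack.pop();
                if (seen.add(node)) {
                    stack.addAll(edges.getOrDefault(node, List.of()));
                }
            }
            seen.remove(start); // do not count a node as its own superclass
            pairs += seen.size();
        }
        return pairs;
    }

    /** Builds a hypothetical chain of n classes: 0 -> 1 -> ... -> n-1. */
    static Map<Integer, List<Integer>> chain(int n) {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        for (int i = 0; i + 1 < n; i++) {
            edges.computeIfAbsent(i, k -> new ArrayList<>()).add(i + 1);
        }
        return edges;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " nodes, " + (n - 1) + " asserted edges, "
                    + closureSize(chain(n)) + " closure pairs");
        }
    }
}
```

Whether the real cache behaves like this depends on how the equivalent-class triples are connected in the actual data, which this sketch does not model.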

Another strange thing, which I’ve mentioned before, is that I don’t have any of 
these issues when I initially load the data. I can load everything with just 4 
GB of RAM, it loads in a reasonable amount of time, and after the upload is 
complete I can submit queries of pretty much any complexity with no issues, and 
they are very fast too. The problem only occurs after the server has been 
restarted, on the first query that actually pulls something from the dataset 
(i.e. not an empty query), no matter how simple or complex that query may be.

Is this a bug, or should `owl:equivalentClass` work without my own manual 
specification of it?
