I’ve finally tracked down the problem (at least at a high level). When using the TransitiveReasoner, there is a block of code that caches all subclass triples (https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326). Part of this code searches for all sub-properties of `subClassOf` and begins caching triples for those sub-properties as well. In my setup, I’ve added `owl:equivalentClass` manually (since only the `TransitiveReasoner` is being used) and declared it a sub-property of `subClassOf`. The data I’m uploading has a lot of equivalent-class triples (more than ~300k). If I’m understanding the code correctly from debugging it, not only is each triple cached, but caching even a single triple triggers a traversal of many other triples. Is that correct? That would explain why (1) it never seems to finish and (2) memory grows very, very large while it runs. I ran a single query last night, and after more than 6 hours on 8 CPUs with 20 GB of RAM, it still had not finished loading the cache. It seems as though the runtime could be exponential. My dataset is likely well over 20 million records: I still haven’t gotten a full count, but I know for a fact that it is well over 10 million. Like I’ve mentioned before, there are basically no individuals in the dataset; it’s almost all ontology, because it consists of healthcare-industry coding systems and classifications.
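To make the hypothesis above concrete, here is a toy Java sketch of a naive cache that keeps the full transitive closure of a `subClassOf`-like relation and updates it on every insert. It is NOT Jena’s `TransitiveGraphCache`; the class and method names are hypothetical, and Jena’s real data structures may behave differently. It shows two effects: a single chain of n classes closes into n(n-1)/2 cached pairs, and if `owl:equivalentClass` is treated as a sub-property of `subClassOf` with equivalences present in both directions (an assumption here, since the TransitiveReasoner does not apply symmetry itself), a cluster of k equivalent classes closes into k² pairs, and each new link traverses the whole cluster. In this toy the growth is polynomial (quadratic) rather than exponential; whether Jena’s cache matches that would need checking against its source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical, not Jena's code): keeps the full transitive
// closure of a subClassOf-like relation and propagates every insert eagerly.
class NaiveClosureCache {
    // node -> every node it reaches (its cached "superclasses")
    private final Map<String, Set<String>> up = new HashMap<>();
    // node -> every node that reaches it (its cached "subclasses")
    private final Map<String, Set<String>> down = new HashMap<>();
    // how many (x, y) pairs were examined across all inserts
    private long pairsVisited = 0;

    private Set<String> upOf(String n) { return up.computeIfAbsent(n, key -> new HashSet<>()); }
    private Set<String> downOf(String n) { return down.computeIfAbsent(n, key -> new HashSet<>()); }

    // Record "sub subClassOf sup": everything at or below sub now reaches
    // everything at or above sup, so one asserted triple touches |lower| * |upper| pairs.
    public void add(String sub, String sup) {
        Set<String> lower = new HashSet<>(downOf(sub)); // copy: downOf(sub) may change below
        lower.add(sub);
        Set<String> upper = new HashSet<>(upOf(sup));   // copy: upOf(sup) may change below
        upper.add(sup);
        for (String x : lower) {
            for (String y : upper) {
                pairsVisited++;
                if (upOf(x).add(y)) {
                    downOf(y).add(x); // keep the reverse index consistent
                }
            }
        }
    }

    // Total number of cached (sub, super) pairs, i.e. the memory footprint.
    public long closureSize() {
        long total = 0;
        for (Set<String> s : up.values()) total += s.size();
        return total;
    }

    public long pairsVisited() { return pairsVisited; }

    public static void main(String[] args) {
        // A single chain C0 subClassOf C1 subClassOf ... C999.
        int n = 1000;
        NaiveClosureCache chain = new NaiveClosureCache();
        for (int i = 0; i + 1 < n; i++) chain.add("C" + i, "C" + (i + 1));
        System.out.println("chain: " + (n - 1) + " asserted edges -> "
                + chain.closureSize() + " cached pairs, " + chain.pairsVisited() + " pairs visited");

        // equivalentClass as a sub-property of subClassOf, each equivalence
        // present in both directions, so every link becomes a two-way edge.
        int k = 200;
        NaiveClosureCache cluster = new NaiveClosureCache();
        for (int i = 0; i + 1 < k; i++) {
            cluster.add("E" + i, "E" + (i + 1));
            cluster.add("E" + (i + 1), "E" + i);
        }
        System.out.println("cluster: " + 2 * (k - 1) + " asserted edges -> "
                + cluster.closureSize() + " cached pairs, " + cluster.pairsVisited() + " pairs visited");
    }
}
```

Running it reports 499500 cached pairs for the 999-edge chain and 40000 cached pairs for the 398-edge equivalence cluster, which illustrates how a modest number of asserted triples can turn into a much larger cache.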
Another strange thing, which I’ve mentioned before, is that I don’t have any of these issues when I initially load the data. I can load everything with just 4 GB of RAM, it loads in a reasonable amount of time, and after the upload completes I can submit queries of pretty much any complexity with no issues, and they are very fast too. The problem only occurs after the server has been restarted, when the first query that actually pulls something from the dataset (i.e., not an empty query) is submitted, no matter how simple or complex that query is. Is this a bug, or should `owl:equivalentClass` work without my own manual specification of it?
