On 27.08.21 22:09, Brandon Sara wrote:
I’ve finally tracked down the problem (at least at a high level). When using
the Transitive Reasoner, there is a block of code which caches all sub class
triples
(https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326).
Part of this code searches for all sub properties of `subClassOf` and begins
caching triples for those sub-properties. In my situation, I’ve added
`owl:equivalentClass` manually (since only `TransitiveReasoner` is being used)
and manually made it a sub property of `subClassOf`.
In that case you're losing inferences from one direction, aren't you?
Wouldn't it be more "clean" if you resolve the owl:equivalentClass
axioms beforehand by creating subClassOf axioms for both directions?
Either by means of a SPARQL query or by wrapping a GenericRuleReasoner
with two rules?
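
To make the "both directions" idea concrete, here is a minimal, library-free sketch of the preprocessing step. It does not use the Jena API: triples are modelled as plain 3-element string lists, and the class and method names (`EquivalentClassExpander`, `expand`) are made up for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jena: rewrites each owl:equivalentClass
// triple into two rdfs:subClassOf triples, one per direction.
public class EquivalentClassExpander {
    static final String EQUIVALENT_CLASS = "owl:equivalentClass";
    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    // Each triple is a 3-element list: subject, predicate, object.
    static Set<List<String>> expand(Set<List<String>> triples) {
        Set<List<String>> out = new LinkedHashSet<>();
        for (List<String> t : triples) {
            if (EQUIVALENT_CLASS.equals(t.get(1))) {
                // A equivalentClass B  =>  A subClassOf B and B subClassOf A.
                out.add(List.of(t.get(0), SUB_CLASS_OF, t.get(2)));
                out.add(List.of(t.get(2), SUB_CLASS_OF, t.get(0)));
            } else {
                // Every other triple is passed through unchanged.
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<List<String>> input = new LinkedHashSet<>();
        input.add(List.of("ex:A", EQUIVALENT_CLASS, "ex:B"));
        input.add(List.of("ex:B", SUB_CLASS_OF, "ex:C"));
        for (List<String> t : expand(input)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

With a GenericRuleReasoner the same thing would presumably be two rules along the lines of `(?a owl:equivalentClass ?b) -> (?a rdfs:subClassOf ?b)` and `(?a owl:equivalentClass ?b) -> (?b rdfs:subClassOf ?a)`. I haven't run those against your data, so treat them as a starting point. The sketch drops the original equivalentClass triples; whether you keep them alongside the new subClassOf triples is up to you.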
The data that I’m uploading right now has a lot of equivalent class triples
(~>300k). It seems, if I’m understanding the code correctly as I’ve been
debugging it, that not only is the triple cached…but a traversal of many other
triples occurs when the caching occurs for even a single triple, is that correct?
This would explain why (1) it never seems to finish what it is doing and (2) the
memory grows very, very large while doing it. I ran a single query last night and
after more than 6 hours, 8 CPUs, and 20GB of RAM, it still never finished loading
the cache. It seems as though the runtime of this could be exponential in
nature. My dataset is well over 20 million records (maybe even more, I still
haven’t gotten a full count yet, but I know for a fact that it is well over 10
million and believe it to be well more than 20 million). Like I’ve mentioned
before, there are basically no individuals in the dataset, it’s all ontology
because it is health care industry coding systems and classifications.
Another strange thing, which I’ve mentioned before, is that I don’t have any of
these issues when I initially load the data, I can load everything with just 4
GB of RAM, it loads in a reasonable amount of time, and I can submit queries of
pretty much any complexity after the upload is complete with no issues, and
they are very fast too. This only occurs when the server has been restarted and
the first query that actually pulls something from the dataset (i.e. not an
empty query) is submitted (no matter how simple or complex that query may be).
Is this a bug, or should `owl:equivalentClass` work without my own manual
specification of it?