What about https://issues.apache.org/jira/browse/JENA-650?
On Tue, Jun 21, 2016 at 10:59 AM, Andy Seaborne <a...@apache.org> wrote:
> We have outstanding:
>
> https://github.com/apache/jena/pull/47
>
> which changes the cache to LRU from fixed.
> That does not fix any memory leaks but might mitigate them.
>
> There are two FIXMEs in the PR which could do with looking at.
>
>         Andy
>
>
> On 21/06/16 09:28, Dave Reynolds wrote:
>>
>> Hi Martynas,
>>
>> On 20/06/16 22:18, Martynas Jusevičius wrote:
>>>
>>> Hey,
>>>
>>> after using GenericRuleReasoner and InfModel more extensively, we
>>> started experiencing memory leaks that eventually kill our webapp
>>> because it runs out of heap space. Jena version is 2.11.0.
>>>
>>> After some profiling, it seems that RETEEngine.clauseIndex and/or
>>> RETEEngine.infGraph are retaining a lot of references. It might be
>>> related to this report, but I'm not sure:
>>>
>>> https://mail-archives.apache.org/mod_mbox/jena-users/201403.mbox/%3c5319b4e0.4060...@gmail.com%3E
>>>
>>
>> If it is related to that then it is not a leak, it is "just" memory use.
>>
>> A leak implies that when you turn over data, unused internal state
>> objects are not reclaimed. Are you continuously adding and deleting
>> data? If so, the delete should release the whole of the RETEEngine
>> state and start over. If that isn't happening then that's a bug, but
>> you could work around it with an explicit reset(), or even delete and
>> recreate your InfGraph at that stage. A delete loses all the state anyway.
>>
>>> The suggestion was to use backward rules instead of forward rules.
>>> I have read the following:
>>> https://jena.apache.org/documentation/inference/#rules
>>>
>>> But I still fail to understand in which situations backward rules
>>> can/should be used instead of forward rules?
>>
>> Forward rules are generally faster because they keep all that partially
>> matched state. So if you have stable data, or just add triples
>> monotonically, and have a lot of queries, then generally use forward
>> rules for performance.
>>
>> Backward rules (without tabling) keep no state, so there's less memory
>> overhead and no cost for delete, but they are slow and have to redo the
>> work for every query.
>>
>> Strictly, the performance trade-off is a bit more subtle than that.
>> Forward rules will try to work out all the entailments, whereas backward
>> rules just respond to specific queries. So if your queries only touch a
>> small part of the possible space then backward rules could be more
>> efficient. However, in practice RDF rules seem to involve a lot of
>> unground terms, and lots of rules match nearly every query.
>>
>> Tabling allows you to selectively cache certain predicates, which can
>> give you more reasonable performance while keeping memory use under
>> control. You can also tune how the rules execute by testing whether
>> variables are bound or not and using different clause orderings for
>> different query patterns.
>>
>>> I guess simply replacing -> with <- will not be enough?
>>
>> Unless you use non-monotonic predicates (which, sadly, you do), that
>> would be enough to get something working. In fact you don't even need
>> to do that: if you create a pure backward reasoner instance (as opposed
>> to the hybrid reasoner) it will read forward-syntax rules but treat
>> them as backward.
>>
>>> The actual rules in question look like this:
>>>
>>> [gp: (?class rdf:type <http://www.w3.org/2000/01/rdf-schema#Class>),
>>>      (?class ?p ?o),
>>>      (?p rdf:type owl:AnnotationProperty),
>>>      (?p rdfs:isDefinedBy <http://graphity.org/gp#>),
>>>      (?subClass rdfs:subClassOf ?class),
>>>      (?subClass rdf:type <http://www.w3.org/2000/01/rdf-schema#Class>),
>>>      noValue(?subClass ?p)
>>>      -> (?subClass ?p ?o) ]
>>
>> That's a horrible rule from the engine's point of view. The head is
>> completely ungrounded, so when run backwards it will need to run for
>> *every* triple pattern. [It also makes no sense to me as a use of
>> owl:AnnotationProperty, but whatever.] You could try it backwards but
>> put the clauses in a more efficient order:
>>
>>   (?subClass ?p ?o) <-
>>       (?p rdf:type owl:AnnotationProperty),
>>       (?p rdfs:isDefinedBy <http://graphity.org/gp#>),
>>       (?subClass rdfs:subClassOf ?class),
>>       (?class ?p ?o) .
>>
>> The rdf:type rdfs:Class constraints are pointless since they are
>> implied by rdfs:subClassOf anyway. The noValue check is probably best
>> avoided in both cases.
>>
>> Alternatively, depending on the nature of your space leak, you could
>> use hybrid rules:
>>
>>   (?p rdf:type owl:AnnotationProperty),
>>   (?p rdfs:isDefinedBy <http://graphity.org/gp#>)
>>   ->
>>   [ (?subClass ?p ?o) <-
>>       (?subClass rdfs:subClassOf ?class), (?class ?p ?o) ]
>>
>> That way the forward engine only looks at your annotations, and the
>> backward engine then has rules with grounded predicates. You could
>> also table those predicates:
>>
>>   (?p rdf:type owl:AnnotationProperty),
>>   (?p rdfs:isDefinedBy <http://graphity.org/gp#>)
>>   ->
>>   table(?p),
>>   [ (?subClass ?p ?o) <-
>>       (?subClass rdfs:subClassOf ?class), (?class ?p ?o) ]
>>
>>> [gcdm: (?template rdf:type <http://graphity.org/gp#Template>),
>>>        (?template <http://graphity.org/gc#defaultMode> ?o),
>>>        (?subClass rdfs:subClassOf ?template),
>>>        (?subClass rdf:type <http://graphity.org/gp#Template>),
>>>        noValue(?subClass <http://graphity.org/gc#defaultMode>)
>>>        -> (?subClass <http://graphity.org/gc#defaultMode> ?o) ]
>>>
>>> [gcsm: (?template rdf:type <http://graphity.org/gp#Template>),
>>>        (?template <http://graphity.org/gc#supportedMode> ?supportedMode),
>>>        (?subClass rdfs:subClassOf ?template),
>>>        (?subClass rdf:type <http://graphity.org/gp#Template>)
>>>        -> (?subClass <http://graphity.org/gc#supportedMode> ?supportedMode) ]
>>
>> These two are more reasonable and could be used backwards or hybrid.
>>
>>> [rdfs9: (?x rdfs:subClassOf ?y), (?a rdf:type ?x) -> (?a rdf:type ?y)]
>>
>> That would work backwards. Depending on the scale of your data you
>> might want to table rdf:type for the performance/space trade-off.
>>
>>> Can these be rewritten as backward rules instead?
>>
>> Sure, the challenge is performance tuning as noted above.
>>
>>> Does it involve code changes, such as calling reset() etc?
>>
>> Shouldn't do.
>>
>> Dave
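
For anyone wiring this up, a minimal, untested sketch of the switch Dave
describes: the same forward-syntax rule file run through the pure backward
(LP) engine, plus the reset()/recreate workaround for data that is turned
over. The rule file name is made up, and the package names are the Jena 2.x
ones (com.hp.hpl.jena.*) to match the 2.11.0 mentioned above; on Jena 3.x
they move to org.apache.jena.*.

    import java.util.List;

    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
    import com.hp.hpl.jena.reasoner.rulesys.Rule;

    public class BackwardModeSketch {
        public static void main(String[] args) {
            // The pure backward (LP) engine reads the same forward-syntax rules
            // but treats them as backward, so no RETE state is built up.
            List<Rule> rules = Rule.rulesFromURL("file:rules.txt");  // hypothetical file
            GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
            reasoner.setMode(GenericRuleReasoner.BACKWARD);  // or HYBRID / FORWARD_RETE

            Model data = ModelFactory.createDefaultModel();  // load the ontology/data here
            InfModel inf = ModelFactory.createInfModel(reasoner, data);

            // If data is added and deleted over time and internal state grows,
            // either drop the caches or rebuild the inference model from the
            // raw data ("delete and recreate your InfGraph").
            inf.reset();
            inf = ModelFactory.createInfModel(reasoner, data);
        }
    }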
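
And a similar sketch of the hybrid-plus-tabling variant outlined above: the
rule string is Dave's rewritten gp rule pasted inline, plus rdfs9 flipped to
backward syntax (my conversion), with rdf:type tabled through the API. Same
Jena 2.x package-name assumption; whether the tabling actually helps memory
here would need measuring.

    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
    import com.hp.hpl.jena.reasoner.rulesys.Rule;
    import com.hp.hpl.jena.vocabulary.RDF;

    public class HybridTablingSketch {
        public static void main(String[] args) {
            // Forward part grounds the annotation properties; the embedded
            // backward rule propagates their values down the subclass
            // hierarchy; table(?p) memoizes results per property.
            String src =
                "[gp: (?p rdf:type owl:AnnotationProperty), " +
                "     (?p rdfs:isDefinedBy <http://graphity.org/gp#>) " +
                "     -> table(?p), " +
                "        [ (?subClass ?p ?o) <- " +
                "            (?subClass rdfs:subClassOf ?class), (?class ?p ?o) ] ] " +
                "[rdfs9: (?a rdf:type ?y) <- (?x rdfs:subClassOf ?y), (?a rdf:type ?x) ]";

            GenericRuleReasoner reasoner =
                new GenericRuleReasoner(Rule.parseRules(src));
            reasoner.setMode(GenericRuleReasoner.HYBRID);
            reasoner.setTabled(RDF.type.asNode());  // table rdf:type, as suggested

            Model data = ModelFactory.createDefaultModel();  // load the ontology/data here
            InfModel inf = ModelFactory.createInfModel(reasoner, data);
            inf.prepare();  // run the forward phase up front
        }
    }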