What about https://issues.apache.org/jira/browse/JENA-650?
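
On the LRU point in Andy's message below, here is a minimal Java sketch of a bounded least-recently-used cache, for anyone unsure what that change means in practice. It assumes nothing about the actual code in the pull request: the class name LruCacheSketch and the maxEntries parameter are hypothetical, and it relies only on java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not the cache class from the Jena pull request. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

A cache like this caps how many entries are retained, which is why it can mitigate memory growth without fixing a leak elsewhere.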

On Tue, Jun 21, 2016 at 10:59 AM, Andy Seaborne <a...@apache.org> wrote:
> We have outstanding:
>
> https://github.com/apache/jena/pull/47
>
> which changes the cache to LRU from fixed.
> That does not fix any memory leaks but might mitigate them.
>
> There are two FIXMEs in the PR which could do with looking at.
>
>     Andy
>
>
> On 21/06/16 09:28, Dave Reynolds wrote:
>>
>> Hi Martynas,
>>
>> On 20/06/16 22:18, Martynas Jusevičius wrote:
>>>
>>> Hey,
>>>
>>> after using GenericRuleReasoner and InfModel more extensively, we
>>> started experiencing memory leaks that eventually kill our webapp
>>> because it runs out of heap space. Jena version is 2.11.0.
>>>
>>> After some profiling, it seems that RETEEngine.clauseIndex and/or
>>> RETEEngine.infGraph are retaining a lot of references. It might be
>>> related to this report, but I'm not sure:
>>>
>>> https://mail-archives.apache.org/mod_mbox/jena-users/201403.mbox/%3c5319b4e0.4060...@gmail.com%3E
>>>
>>
>> If it is related to that, then it is not a leak; it is "just" memory use.
>>
>> A leak implies that when you turn over data then unused internal state
>> objects are not reclaimed. Are you continuously adding and deleting
>> data? If so then the delete should release the whole of the RETEEngine
>> state and start over. If that isn't happening then that's a bug but you
>> could work around it with an explicit reset(), or even delete and recreate
>> your InfGraph at that stage. A delete loses all the state anyway.
>>
>>> The suggestion was to use backward rules instead of forward rules.
>>> I have read the following:
>>> https://jena.apache.org/documentation/inference/#rules
>>>
>>> But still I fail to understand in which situations backward rules
>>> can/should be used instead of forward rules?
>>
>>
>> Forward rules are generally faster because they keep all that partially
>> matched state. So if you have stable data or just add triples
>> monotonically, and have a lot of queries, then generally use forward
>> rules for performance.
>>
>> Backward rules (without tabling) keep no state so there's less memory
>> overhead and no cost for delete but they are slow and have to redo the
>> work for every query.
>>
>> Strictly the performance trade-off is a bit more subtle than that.
>> Forward rules will try to work out all the entailments whereas backward
>> rules are just responding to specific queries. So if your queries only
>> touch a small part of the possible space then backward rules could be
>> more efficient. However, in practice RDF rules seem to involve a lot of
>> unground terms and lots of rules match nearly every query.
>>
>> Tabling allows you to selectively cache certain predicates which can
>> enable you to get more reasonable performance while keeping memory use
>> under control. You can also do some tuning of how the rules execute by
>> testing if variables are bound or not and using different clause
>> orderings for different query patterns.
>>
>>> I guess simply replacing
>>> -> with <- will not be enough?
>>
>>
>> Unless you use non-monotonic predicates (which, sadly, you do) then that
>> would be enough to get something working. In fact you don't even need to
>> do that. If you create a pure backward reasoner instance (as opposed to
>> the hybrid reasoner), it'll read forward-syntax rules but treat them as
>> backward.
>>
>>> The actual rules in question look like
>>> this:
>>>
>>> [gp:    (?class rdf:type
>>> <http://www.w3.org/2000/01/rdf-schema#Class>), (?class ?p ?o), (?p
>>> rdf:type owl:AnnotationProperty), (?p rdfs:isDefinedBy
>>> <http://graphity.org/gp#>), (?subClass rdfs:subClassOf ?class),
>>> (?subClass rdf:type <http://www.w3.org/2000/01/rdf-schema#Class>),
>>> noValue(?subClass ?p) -> (?subClass ?p ?o) ]
>>
>>
>> That's a horrible rule from the engine's point of view. The head is
>> completely ungrounded, so when running backwards it will need to run
>> for *every* triple pattern. [It also makes no sense to me as a use of
>> owl:AnnotationProperty but whatever.] You could try it backwards but put
>> the clauses in a more efficient order:
>>
>> (?subClass ?p ?o) <-
>>       (?p rdf:type owl:AnnotationProperty),
>>       (?p rdfs:isDefinedBy <http://graphity.org/gp#>),
>>       (?subClass rdfs:subClassOf ?class), (?class ?p ?o) .
>>
>> The rdf:type rdfs:Class constraints are pointless since those are
>> implied by rdfs:subClassOf anyway. The noValue check is probably best
>> avoided for both cases.
>>
>> Alternatively, depending on the nature of your space leak you could use
>> hybrid rules:
>>
>>    (?p rdf:type owl:AnnotationProperty),
>>    (?p rdfs:isDefinedBy <http://graphity.org/gp#>)
>>      ->
>>        [ (?subClass ?p ?o) <- (?subClass rdfs:subClassOf ?class),
>>                               (?class ?p ?o) ]
>>
>> That way the forward engine is only looking at your annotations and the
>> backward engine then has rules that have grounded predicates. You could
>> also table those predicates:
>>
>>    (?p rdf:type owl:AnnotationProperty),
>>    (?p rdfs:isDefinedBy <http://graphity.org/gp#>)
>>      ->
>>        table(?p),
>>        [ (?subClass ?p ?o) <- (?subClass rdfs:subClassOf ?class),
>>                                (?class ?p ?o) ]
>>
>>> [gcdm:  (?template rdf:type <http://graphity.org/gp#Template>),
>>> (?template <http://graphity.org/gc#defaultMode> ?o), (?subClass
>>> rdfs:subClassOf ?template), (?subClass rdf:type
>>> <http://graphity.org/gp#Template>), noValue(?subClass
>>> <http://graphity.org/gc#defaultMode>) -> (?subClass
>>> <http://graphity.org/gc#defaultMode> ?o) ]
>>> [gcsm:  (?template rdf:type <http://graphity.org/gp#Template>),
>>> (?template <http://graphity.org/gc#supportedMode> ?supportedMode),
>>> (?subClass rdfs:subClassOf ?template), (?subClass rdf:type
>>> <http://graphity.org/gp#Template>) -> (?subClass
>>> <http://graphity.org/gc#supportedMode> ?supportedMode) ]
>>
>>
>> These two are more reasonable and could be used backwards or hybrid.
>>
>>> [rdfs9: (?x rdfs:subClassOf ?y), (?a rdf:type ?x) -> (?a rdf:type ?y)]
>>
>>
>> That would work backwards. Depending on the scale of your data you might
>> want to table rdf:type for performance/space tradeoff.
>>
>>> Can these be rewritten as backward rules instead?
>>
>>
>> Sure, the challenge is performance tuning as noted above.
>>
>>> Does it involve code changes, such as calling reset() etc?
>>
>> Shouldn't do.
>>
>> Dave
>
>
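
To make the forward/backward trade-off that Dave describes above concrete, here is a toy Java sketch of the rdfs9 rule evaluated both ways. It is not how Jena's RETE or backward engines are implemented, and every name in it (Rdfs9Sketch, materialize, hasType) is hypothetical. The forward method computes and keeps every entailed type up front; the backward method stores nothing and redoes the search for each query.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy illustration of rdfs9 only; hypothetical names, not Jena's engines. */
final class Rdfs9Sketch {
    private final Map<String, Set<String>> superClasses = new HashMap<>(); // class -> direct superclasses
    private final Map<String, Set<String>> subClasses = new HashMap<>();   // class -> direct subclasses
    private final Map<String, Set<String>> assertedTypes = new HashMap<>(); // instance -> asserted classes

    void addSubClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
        subClasses.computeIfAbsent(sup, k -> new HashSet<>()).add(sub);
    }

    void addType(String instance, String cls) {
        assertedTypes.computeIfAbsent(instance, k -> new HashSet<>()).add(cls);
    }

    /** Forward style: derive every (instance, class) entailment up front and retain it. */
    Map<String, Set<String>> materialize() {
        Map<String, Set<String>> inferred = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : assertedTypes.entrySet()) {
            Set<String> types = new HashSet<>(e.getValue());
            Deque<String> agenda = new ArrayDeque<>(e.getValue());
            while (!agenda.isEmpty()) {
                String x = agenda.pop();
                for (String y : superClasses.getOrDefault(x, Collections.<String>emptySet())) {
                    if (types.add(y)) {
                        agenda.push(y); // newly derived type, may trigger further derivations
                    }
                }
            }
            inferred.put(e.getKey(), types);
        }
        return inferred;
    }

    /** Backward style: prove one goal (instance rdf:type cls) on demand, keeping no state. */
    boolean hasType(String instance, String cls) {
        return prove(instance, cls, new HashSet<>());
    }

    private boolean prove(String instance, String cls, Set<String> visited) {
        if (!visited.add(cls)) {
            return false; // already tried this subgoal; guards against subClassOf cycles
        }
        if (assertedTypes.getOrDefault(instance, Collections.<String>emptySet()).contains(cls)) {
            return true;
        }
        // Goal (a type y) reduces to: some x with (x subClassOf y) and (a type x)
        for (String sub : subClasses.getOrDefault(cls, Collections.<String>emptySet())) {
            if (prove(instance, sub, visited)) {
                return true;
            }
        }
        return false;
    }
}
```

The map returned by materialize() grows with the data and has to be rebuilt when triples are deleted, whereas hasType() uses no memory between calls but repeats its search every time. Tabling, as described above, sits between the two.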
