Thilo Goetz (JIRA) wrote:

Some applications may break if they require == between instances of the same JCas object. Others, of course, won't care. So it's good for this to be configurable.
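
To make the == concern concrete, here is a minimal sketch of a lookup with a switchable cache. It is not the UIMA implementation: WrapperSource, the cacheEnabled flag, and the integer "address" key are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical illustration only; not the actual JCas code.
class WrapperSource<V> {
    private final boolean cacheEnabled;      // assumed configuration switch
    private final IntFunction<V> factory;    // creates a wrapper object for an address
    private final Map<Integer, V> cache = new HashMap<>();

    WrapperSource(boolean cacheEnabled, IntFunction<V> factory) {
        this.cacheEnabled = cacheEnabled;
        this.factory = factory;
    }

    V get(int address) {
        if (!cacheEnabled) {
            // No cache: a fresh wrapper on every call, so == between
            // two lookups of the same address is false.
            return factory.apply(address);
        }
        // Cache on: the same wrapper instance is returned every time,
        // but the map keeps it reachable for as long as the map lives.
        return cache.computeIfAbsent(address, factory::apply);
    }
}
```

With the cache on, get(5) == get(5) holds. With it off, the two calls return distinct objects, so code that compares wrappers with == (or uses them as identity keys) would need equals() based on the underlying address instead.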

It might be good, also, to put in "soft references" for this - which will be reclaimed if memory gets low. But this might end up doubling the size of the storage used for this (to hold the soft reference)...
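
A rough sketch of the soft-reference idea, using java.lang.ref.SoftReference from the JDK. SoftCache and its methods are hypothetical names, not part of UIMA. Note the extra SoftReference object allocated per entry, which is the storage cost mentioned above.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual JCas cache.
class SoftCache<V> {
    // One SoftReference per entry: this is the additional storage
    // that the cache pays in exchange for being reclaimable.
    private final Map<Integer, SoftReference<V>> entries = new HashMap<>();

    void put(int address, V value) {
        entries.put(address, new SoftReference<>(value));
    }

    // Returns the cached object, or null if it was never cached
    // or the garbage collector has already cleared the reference.
    V get(int address) {
        SoftReference<V> ref = entries.get(address);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(address); // drop the stale entry
        }
        return value;
    }

    int size() {
        return entries.size();
    }
}
```

A caller would have to treat a null result as a cache miss and recreate the object, which means == is only preserved for as long as the entry has not been reclaimed.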

-Marshall

Use of the JCas cache should be configurable
--------------------------------------------

                 Key: UIMA-1068
                 URL: https://issues.apache.org/jira/browse/UIMA-1068
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Thilo Goetz
            Assignee: Thilo Goetz
             Fix For: 2.3


The JCas caches all CAS objects that are accessed through it.  This means that 
JCas objects that are no longer used can't be garbage collected.  If only part 
of the processing chain uses the JCas, or the caching is redundant for some 
other reason, this produces a severe memory overhead.

I ran the same experiment I ran for UIMA-1067: doubled the size of Moby Dick 
and ran the POS tagger from the sandbox.  I used the improved version from 
UIMA-1067 as base case and simply commented out the line that adds JCas objects 
to the cache.  This reduced the required heap size from 115MB to 105MB.  It 
also reduced the run time from around 10s for the base case to consistently 
under 9s for the version without any caching.  I looked at the tagger source 
code, and saw that it keeps its own list of tokens around.  So the savings 
come from the caching data structure alone.

There may be cases where the JCas cache is a performance win, though I'd be 
curious to see the benchmarks.  So we should not just turn it off, but make it 
configurable.


