Thilo Goetz wrote:
Marshall Schor wrote:
Thilo Goetz (JIRA) wrote:

Some applications may break if they require == between instances of the same JCas object. Others, of course, won't care. So - it's good for this to be configurable.

Any annotator that works with this assumption is broken IMO.
Why would anybody make such an assumption?
One use case: with JCas it is possible to add fields to the cover class (thus, you could add a hashmap object, for instance); this is described in the JCas documentation. Those field values are preserved across different iterations only if the same JCas cover instance is kept.
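A minimal, self-contained sketch of the pattern Marshall describes (the names MyToken, addr, and scratch are illustrative stand-ins, not actual UIMA API): extra state added to a cover object survives repeated lookups only when the cache hands back the same instance.

```java
import java.util.HashMap;
import java.util.Map;

public class CoverCacheDemo {
    // Stand-in for a user-extended JCas cover class.
    static class MyToken {
        final int addr;                                      // underlying feature structure address
        final Map<String, Object> scratch = new HashMap<>(); // user-added field on the cover class
        MyToken(int addr) { this.addr = addr; }
    }

    static final Map<Integer, MyToken> cache = new HashMap<>();

    // With caching: the same addr yields the same cover instance, so scratch survives.
    static MyToken getCached(int addr) {
        return cache.computeIfAbsent(addr, MyToken::new);
    }

    // Without caching: a fresh cover instance each time, so scratch is lost.
    static MyToken getUncached(int addr) {
        return new MyToken(addr);
    }

    public static void main(String[] args) {
        getCached(42).scratch.put("pos", "NN");
        System.out.println(getCached(42).scratch.get("pos"));    // NN
        getUncached(42).scratch.put("pos", "NN");
        System.out.println(getUncached(42).scratch.get("pos"));  // null
    }
}
```

An annotator relying on this behavior is implicitly relying on ==, which is exactly the guarantee under discussion.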
-Marshall
I don't see anything
in our documentation that encourages this.  To the contrary,
we say that we don't guarantee object identity for feature
structures, and that equals() should be used to compare them.


It might also be good to use "soft references" for this - entries would be reclaimed if memory gets low. But this might end up doubling the size of the storage used for the cache (to hold the soft reference objects)...
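A rough sketch of the soft-reference idea (assumptions: the cache maps feature structure addresses to cover objects, and SoftJCasCache is an illustrative name, not an actual UIMA class). Entries become eligible for collection under memory pressure, at the cost of one SoftReference wrapper object per entry:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftJCasCache {
    private final Map<Integer, SoftReference<Object>> cache = new HashMap<>();

    public void put(int addr, Object cover) {
        cache.put(addr, new SoftReference<>(cover));
    }

    // Returns null both on a cache miss and after the referent has been
    // reclaimed, so callers must be prepared to re-create the cover object.
    public Object get(int addr) {
        SoftReference<Object> ref = cache.get(addr);
        return (ref == null) ? null : ref.get();
    }
}
```

Note that with soft references, == identity is only preserved as long as the JVM has not reclaimed the entry, so this weakens rather than keeps the identity guarantee.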

-Marshall
Use of the JCas cache should be configurable
--------------------------------------------

                 Key: UIMA-1068
                 URL: https://issues.apache.org/jira/browse/UIMA-1068
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Thilo Goetz
            Assignee: Thilo Goetz
             Fix For: 2.3


The JCas caches cover objects for all CAS feature structures that are accessed through it. This means that JCas objects that are no longer used can't be garbage collected. If only part of the processing chain uses the JCas, or the caching is redundant for some other reason, this produces a severe memory overhead.

I ran the same experiment I ran for UIMA-1067: doubled the size of Moby Dick and ran the POS tagger from the sandbox. I used the improved version from UIMA-1067 as the base case and simply commented out the line that adds JCas objects to the cache. This reduced the required heap size from 115MB to 105MB. It also improved the performance from around 10s for the base case to consistently under 9s for the version without any caching. I looked at the tagger source code and saw that it keeps its own list of tokens around. So the savings come purely from eliminating the caching data structure.

There may be cases where the JCas cache is a performance win, though I'd be curious to see the benchmarks. So we should not just turn it off, but make it configurable.
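One way the configurable behavior could look (a hedged sketch - the constructor flag and the helper names getCover/createCover are assumptions, not the framework's actual configuration mechanism): when caching is disabled, cover objects are created on demand and remain collectible.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableJCas {
    private final Map<Integer, Object> cache;   // null when caching is disabled

    public ConfigurableJCas(boolean useCache) {
        this.cache = useCache ? new HashMap<>() : null;
    }

    public Object getCover(int addr) {
        if (cache != null) {
            // Cached mode: preserves == identity for repeated accesses.
            return cache.computeIfAbsent(addr, a -> createCover(a));
        }
        // Uncached mode: a fresh, GC-friendly instance every time.
        return createCover(addr);
    }

    // Stand-in for materializing the real JCas cover object for this address.
    private Object createCover(int addr) {
        return new Object();
    }
}
```

Applications that depend on == identity would enable the cache; pipelines that only stream through annotations once could turn it off and reclaim the memory.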




