Thilo Goetz wrote:
Marshall Schor (JIRA) wrote:
Space/Time tradeoffs in the CAS
-------------------------------

                 Key: UIMA-1089
                 URL: https://issues.apache.org/jira/browse/UIMA-1089
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Marshall Schor
            Priority: Minor


Investigate and implement optimizations that trade time for space, where the user controls when the optimization runs. One such optimization is string sharing. Detecting sharing opportunities requires additional computation and temporary storage, but yields space savings. For instance, a common annotation might assign short strings like "noun" to a "part-of-speech" feature. A large document may contain many such string-valued features, drawn from a small pool of allowed values. In this case the CAS's string storage could share the string references, at the cost of temporarily building a hash table of the unique strings and using it to identify sharing possibilities. A new API call for this optimization would confine its time and space overhead to the users and occasions where it makes sense.
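
To make the string-sharing idea concrete, here is a minimal sketch. It is not UIMA code: the class name StringSharingSketch and the method shareStrings are hypothetical, and the CAS's string storage is modeled as a plain String[] purely for illustration. The temporary hash table exists only for the duration of the call, which matches the "pay the cost only when asked" intent of the proposed API call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the UIMA API.
public class StringSharingSketch {

    /**
     * Makes equal strings in the array point at one shared instance.
     * The array stands in for the CAS's string storage (an assumption).
     * Returns the number of slots redirected to an earlier instance.
     */
    public static int shareStrings(String[] values) {
        // Temporary table of the unique strings seen so far.
        Map<String, String> canonical = new HashMap<>();
        int shared = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset features are left alone
            }
            // Returns the earlier equal string, or null if s is the first.
            String first = canonical.putIfAbsent(s, s);
            if (first != null && first != s) {
                values[i] = first; // drop the duplicate, keep one instance
                shared++;
            }
        }
        return shared; // the table becomes garbage once we return
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects with equal contents.
        String[] pos = {
            new String("noun"), new String("verb"), new String("noun")
        };
        int shared = shareStrings(pos);
        System.out.println("slots redirected: " + shared);
        System.out.println("same instance: " + (pos[0] == pos[2]));
    }
}
```

The time cost is one hash lookup per string, and the extra space is the table of unique values, so the payoff depends on how many duplicates the document actually contains.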

An alternative would be to automatically figure this out for some selected kinds of optimizations, but I'm not sure that could be done without impacting finely-tuned systems negatively.


Marshall,

I'm not sure what you're doing here.  Why don't you just
start discussion threads on the mailing list?  Why do these
things need to be in Jira?

I thought the reason to put these in Jira was to "track" them so they don't get lost. It seemed like a good idea to me. The discussion can take place as Jira comments, and later can be easily located. I don't have a strong preference, though.

-Marshall
