[ https://issues.apache.org/jira/browse/UIMA-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-1089:
---------------------------------

    Affects Version/s: 2.3

defer beyond 2.3.0

> Space/Time tradeoffs in the CAS
> -------------------------------
>
>                 Key: UIMA-1089
>                 URL: https://issues.apache.org/jira/browse/UIMA-1089
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.2.2, 2.3
>            Reporter: Marshall Schor
>            Priority: Minor
>
> Investigate / implement optimizations that trade user-controllable time
> (running the optimizations) for space.
>
> One such optimization could be: sharing strings. To do the sharing requires
> additional computation and (temporary) storage to detect the sharing
> opportunities, but results in space savings. For instance, a common
> annotation might assign short strings like "noun" to a "part-of-speech"
> feature. If you are processing a large document, there may be a large number
> of these kinds of string valued features, picked from a small pool of
> allowable values. The CAS's string storage might be able to be optimized to
> share the string references in this case, at a cost of temporarily creating
> a hash table of the unique strings and using it to identify sharing
> possibilities.
>
> A new API call to do this optimization would isolate the performance/space
> overhead of doing this optimization to just those users and times where it
> makes sense to do this. An alternative would be to automatically figure this
> out for some selected kinds of optimizations, but I'm not sure that could be
> done without impacting finely-tuned systems negatively.
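
The string-sharing idea in the description can be illustrated with a small sketch. It assumes the string-valued feature slots can be viewed as a plain String[]; the real CAS string storage is internal and may look quite different. StringSharer and shareStrings are hypothetical names chosen for this example and are not part of the UIMA API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
public class StringSharer {

    /**
     * Redirects equal strings in the array to a single shared instance.
     * Assumption: the array stands in for the CAS's string feature slots.
     *
     * @return the number of slots redirected to an already-seen instance
     */
    public static int shareStrings(String[] values) {
        // Temporary hash table of unique strings; it becomes garbage as soon
        // as this method returns, so the extra storage is short-lived.
        Map<String, String> unique = new HashMap<>();
        int shared = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature value, nothing to share
            }
            // putIfAbsent returns the previously stored equal string, or null
            // if this is the first time the value has been seen.
            String canonical = unique.putIfAbsent(s, s);
            if (canonical != null && canonical != s) {
                values[i] = canonical; // drop the duplicate, keep one copy
                shared++;
            }
        }
        return shared;
    }
}
```

Because the call is explicit, the cost of building the table is paid only by callers who ask for it, which matches the "new API call" option in the description rather than the automatic alternative.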