Another way to reduce the footprint of UIMA: one user reported the basic UIMA framework as taking approx. 5 MB (I'm not sure exactly what was measured). I investigated whether UIMA might be loading more classes than it needs. I found that at startup time, UIMA reads a factory configuration file and assigns implementation classes to interfaces, storing the mappings in a hashmap.
The factory configuration (located in uimaj-core/src/main/resources/org.apache.uima.impl/factoryConfig.xml) has specs for things like the collection processing manager. The startup code does a Class.forName on each of these to load it (and to confirm it is present). This defeats much of Java's "lazy loading", since many of these classes won't be used.
I took a heap dump of a tiny UIMA application, uimaj-examples/src/main/java/org.apache.uima.examples/ExampleApplication.java, which reads a simple descriptor and runs it. The dump showed many classes pertaining to the CPE (Collection Processing), which my test application doesn't use.
I see two possible approaches to improving this:
One is to have memory-sensitive users learn more about the factory configuration file and remove the parts of it that cover things they won't be using. I don't much like this approach; it's error prone, especially over time.
The other is to make the factory configuration's resolution lazy, for instance by changing it so that the corresponding class is loaded only on first reference to an interface. This has a potential issue: a failure to find a needed implementation on the class path might surface later in a run rather than at the start. I don't think that's a serious drawback compared to the potential footprint reduction.
What do others think?
-Marshall
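P.S. To make the lazy idea concrete, here is a minimal sketch of what I have in mind. It is not the actual UIMA factory code: the class name LazyFactoryRegistry and its methods (register, resolve, isLoaded) are made up for illustration, and the real code would be driven by the factory configuration file rather than by explicit register calls. The point is only that the configuration is recorded as class names, and Class.forName is deferred until an interface is first asked for.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of lazy factory resolution; not UIMA's real implementation.
 * Mappings are stored as names only, and a class is loaded on first reference.
 */
final class LazyFactoryRegistry {
    // interface name -> implementation class name, as read from the configuration
    private final Map<String, String> implNames = new ConcurrentHashMap<>();
    // interface name -> implementation class, filled in on first use
    private final Map<String, Class<?>> loaded = new ConcurrentHashMap<>();

    /** Records a mapping without loading the implementation class. */
    void register(String interfaceName, String implClassName) {
        implNames.put(interfaceName, implClassName);
    }

    /**
     * Returns the implementation class, loading it the first time it is requested.
     * A missing class is reported here, not at startup.
     */
    Class<?> resolve(String interfaceName) throws ClassNotFoundException {
        Class<?> cached = loaded.get(interfaceName);
        if (cached != null) {
            return cached;
        }
        String implName = implNames.get(interfaceName);
        if (implName == null) {
            throw new IllegalArgumentException(
                "No implementation registered for " + interfaceName);
        }
        Class<?> cls = Class.forName(implName); // the only place a class gets loaded
        loaded.put(interfaceName, cls);
        return cls;
    }

    /** True once the implementation for this interface has been loaded. */
    boolean isLoaded(String interfaceName) {
        return loaded.containsKey(interfaceName);
    }
}
```

With something like this, an application that never touches the CPE would never trigger the load of the CPE implementation classes. The trade-off is the one mentioned above: a bad entry only shows up as a ClassNotFoundException when resolve is first called for it.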