Another way to reduce the footprint of UIMA:

One user reported the basic UIMA framework as taking approximately 5 MB
(I'm not sure exactly what was measured). I investigated whether UIMA
might be loading more classes than it needs, and found that at startup
UIMA reads a factory configuration file, assigns implementation classes
to interfaces, and stores these mappings in a hashmap.
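To make the startup behavior concrete, here is a minimal sketch of an eager interface-to-class registry. It assumes, as this post describes, that each configured implementation name is resolved with Class.forName when the registry is built. EagerFactoryRegistry and the JDK collection classes used as stand-in "interface" and "implementation" names are hypothetical illustrations, not UIMA's actual factory code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not UIMA code): every configured implementation
// class is loaded at construction time, whether or not it is ever used.
final class EagerFactoryRegistry {
    private final Map<String, Class<?>> implByInterface = new HashMap<>();

    EagerFactoryRegistry(Map<String, String> config) throws ClassNotFoundException {
        for (Map.Entry<String, String> e : config.entrySet()) {
            // Class.forName loads (and initializes) the class right now,
            // which also confirms it is present on the class path.
            implByInterface.put(e.getKey(), Class.forName(e.getValue()));
        }
    }

    Class<?> implementationFor(String interfaceName) {
        Class<?> impl = implByInterface.get(interfaceName);
        if (impl == null) {
            throw new IllegalArgumentException("No implementation configured for " + interfaceName);
        }
        return impl;
    }

    int loadedCount() {
        return implByInterface.size();
    }
}

class EagerDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in configuration using JDK classes so the sketch runs anywhere.
        Map<String, String> config = Map.of(
            "java.util.List", "java.util.ArrayList",
            "java.util.Deque", "java.util.ArrayDeque");
        EagerFactoryRegistry registry = new EagerFactoryRegistry(config);
        System.out.println(registry.loadedCount()); // 2: both loaded up front
    }
}
```

The cost is paid even if the application only ever asks for one of the entries; the benefit is that a missing class is reported immediately.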

The factory configuration (located in
uimaj-core/src/main/resources/org.apache.uima.impl/factoryConfig.xml)
has specs for things like the collection processing manager. 

The startup code calls Class.forName on each of these to load them (and to
confirm they are present). This defeats Java's lazy class loading, since
many of these classes will never be used. I took a heap dump of a tiny
UIMA application,
uimaj-examples/src/main/java/org.apache.uima.examples/ExampleApplication.java,
which just reads a simple descriptor and runs it, and found many classes
pertaining to the CPE (Collection Processing Engine), which my test
application doesn't use.
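A heap dump is one way to see this; a lighter-weight check is to ask the JVM how many classes are loaded, via the standard java.lang.management API, or to run with the JVM flag -verbose:class, which logs each class as it is loaded. The sketch below is a hypothetical probe; the Class.forName call stands in for "run the application" and is not part of UIMA.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: reports how many classes are currently loaded in this JVM.
class ClassCountProbe {
    static int loadedNow() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) throws Exception {
        int before = loadedNow();
        // Stand-in for running the application under test.
        Class.forName("java.util.concurrent.ConcurrentSkipListMap");
        int after = loadedNow();
        System.out.println("classes loaded by this step: " + (after - before));
    }
}
```

Comparing the count before and after a representative run (or diffing the -verbose:class output) shows which classes were pulled in without ever being used.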

I see two possible approaches to improving this. One is to have
memory-sensitive users learn more about the factory configuration file
and remove the parts of it for things they won't be using. I don't much
like this approach: it's error-prone, especially over time...

The other approach is to make the factory configuration's resolution
lazy: for instance, the implementation class for an interface would be
loaded only on the first reference to that interface. The potential
issue is that a failure to find a needed implementation on the class
path might happen later in a run rather than at the start, but I don't
think that's a serious drawback compared to the potential footprint
reduction.
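A minimal sketch of the lazy variant, under the same assumptions as an eager interface-to-class map: the registry keeps only class names and calls Class.forName the first time an interface is requested, caching the result. LazyFactoryRegistry and the JDK stand-in names are hypothetical; this is one way the idea could look, not a proposed UIMA patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not UIMA code): only class *names* are stored up front;
// each implementation class is loaded on first request for its interface.
final class LazyFactoryRegistry {
    private final Map<String, String> implNameByInterface;
    private final ConcurrentHashMap<String, Class<?>> resolved = new ConcurrentHashMap<>();

    LazyFactoryRegistry(Map<String, String> config) {
        this.implNameByInterface = new HashMap<>(config); // no class loading here
    }

    Class<?> implementationFor(String interfaceName) {
        Class<?> impl = resolved.get(interfaceName);
        if (impl == null) {
            impl = load(interfaceName);
            // Racing threads may both load; Class.forName returns the same Class,
            // so keep whichever entry was stored first.
            Class<?> previous = resolved.putIfAbsent(interfaceName, impl);
            if (previous != null) {
                impl = previous;
            }
        }
        return impl;
    }

    private Class<?> load(String interfaceName) {
        String implName = implNameByInterface.get(interfaceName);
        if (implName == null) {
            throw new IllegalArgumentException("No implementation configured for " + interfaceName);
        }
        try {
            return Class.forName(implName);
        } catch (ClassNotFoundException e) {
            // The trade-off: a missing class is reported on first use, not at startup.
            throw new IllegalStateException(
                "Implementation " + implName + " for " + interfaceName + " not found", e);
        }
    }

    int resolvedCount() {
        return resolved.size();
    }
}

class LazyDemo {
    public static void main(String[] args) {
        LazyFactoryRegistry registry = new LazyFactoryRegistry(Map.of(
            "java.util.List", "java.util.ArrayList",
            "java.util.Deque", "java.util.ArrayDeque"));
        System.out.println(registry.resolvedCount()); // 0: nothing loaded yet
        registry.implementationFor("java.util.List");
        System.out.println(registry.resolvedCount()); // 1: only List was resolved
    }
}
```

Resolving outside the map's compute methods keeps a class whose static initializer calls back into the registry from tripping ConcurrentHashMap's recursive-update check.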

What do others think?

-Marshall
