Hello,

Is there any best-practice documentation out there for running UIMA/UIMA-AS on a cluster? I have only run single-machine instances of UIMA (mostly through Eclipse) and have not investigated its ability to run multiple simultaneous analyses in order to process large document collections.
It's not clear to me how UIMA would operate in a cluster environment. Do people really do message passing using JMS? I'm guessing this is the case, as I am not seeing references to MPICH, SGE, or the other things I am more used to. I've looked through some of the documentation (including all of the Overview & SDK Setup) but am not finding anything helpful. I've also tried googling, but I am not getting much except this: http://comments.gmane.org/gmane.comp.apache.uima.general/2131 which makes me think it is possible.

Given my current level of confusion, I think it may be best to run multiple independent instances of UIMA and simply submit jobs, each processing a discrete document set, to our SGE cluster, ignoring whatever scaling features are actually present in UIMA, since the document processing I plan to do is data parallel.

-John
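
P.S. To make the data-parallel fallback concrete, here is a rough sketch of how I imagine splitting the collection into discrete document sets, one per SGE array task. This is only an assumption about how it could be wired up, not something taken from the UIMA documentation: the function names `shard` and `current_task_id` are made up for illustration, and the step that actually runs a UIMA pipeline over each shard is left out entirely. The only SGE-specific piece is the `SGE_TASK_ID` environment variable that array jobs set.

```python
import os


def shard(paths, n_tasks, task_id):
    """Return the documents owned by one task (task_id is 1-based).

    Paths are sorted first so every task sees the same ordering, then
    dealt out round-robin so the shards are disjoint and cover everything.
    """
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    return sorted(paths)[task_id - 1::n_tasks]


def current_task_id(env=os.environ):
    """Read the array-task index from SGE_TASK_ID.

    Falls back to 1 when the variable is missing or not numeric
    (for example when the script is run outside an array job).
    """
    value = env.get("SGE_TASK_ID", "")
    return int(value) if value.isdigit() else 1


# Example: 10 documents split across 3 tasks.
docs = ["doc%d.txt" % i for i in range(10)]
my_docs = shard(docs, n_tasks=3, task_id=current_task_id())
# Each task would then run its own single-machine UIMA instance over my_docs.
```

The idea would be to submit this as an array job (something like `qsub -t 1-3` with a small wrapper script, which is hypothetical here), so each task picks up its own slice and no coordination between UIMA instances is needed.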