Hi!

  I'm looking into ways to run a part of my pipeline multi-threaded:

                .-> Multip0 -> A1 -> Multip1 -> A2 ->.
  reader -> A0 <                                      > CASmerger
                `-> Multip2 -> A3 ------------> A2 ->'
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                ParallelStep is generated for each branch
                in a custom flow controller

Basically, I need a way to tell UIMA to run each ParallelStep (which
normally just denotes the CAS flow) truly in parallel.  I have two
constraints:

  (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
mergers (even within the parallel branches).  So I can't use CPE.

  (ii) I need multi-threading, not separate processes.  (I have just
a meager 24GB of RAM (sigh), and a single Java process with all the
linguistic models and other resources loaded takes 3GB.  So I really
need to load these resources into memory only once.)


  I looked into UIMA-AS, including Richard's helpful DKpro-lab code
sample, but I can't figure out how to make it work reasonably with
a *complex* UIMAfit pipeline that spans many branches and many
analysis engines - it seems to me that I would need some centralized
place to specify it, and would basically have to rewrite my
pipeline-building code completely (for the worse, in my impression).

  ...and I'm not even sure, from reading UIMA-AS code, if I could make
it run in multiple threads within a single process!  From comments in

        org/apache/uima/aae/controller/AggregateAnalysisEngineController_impl.java:parallelStep()

I get the impression that non-remote AEs will be executed serially
after all, not in parallel.  Is that correct?


  So going back to the original UIMA code, it seems to me that the thing
to do would be to replace ASB_impl with my own copy (inheritance would
not cut it the way it's coded) and AggregateAnalysisEngine_impl with my
own specialization or copy (as ASB_impl usage is hardcoded there), and
to rewrite the while() loop in the ParallelStep case of ASB's
processUntilNextOutputCas() to run in parallel.  And hope I didn't miss
any catch...
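
  To make concrete what I mean by running the branches "in parallel"
within one process, here is a minimal sketch using only
java.util.concurrent.  It is not UIMA code: the class name
ParallelBranchSketch, the method runBranches() and the string-based
"branches" are all hypothetical stand-ins, and it assumes the branches
can safely work on the same input at the same time (which I have not
verified for a real CAS).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in UIMA.
public class ParallelBranchSketch {

    /**
     * Submits every branch to the pool with the same input and blocks
     * until all of them have finished. Results come back in the same
     * order as the branches were given.
     */
    public static <I, O> List<O> runBranches(I input,
                                             List<Function<I, O>> branches,
                                             ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Callable<O>> tasks = new ArrayList<>();
        for (Function<I, O> branch : branches) {
            tasks.add(() -> branch.apply(input));
        }
        List<O> results = new ArrayList<>();
        // invokeAll() waits for completion and preserves task order.
        for (Future<O> future : pool.invokeAll(tasks)) {
            results.add(future.get()); // rethrows a branch failure, if any
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // One pool per process, so all threads share the loaded resources.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Function<String, String>> branches = new ArrayList<>();
            branches.add(cas -> cas + " -> A1 -> A2"); // stand-in for branch 1
            branches.add(cas -> cas + " -> A3 -> A2"); // stand-in for branch 2
            for (String result : runBranches("cas", branches, pool)) {
                System.out.println(result);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

  In the real thing, each "branch" would be whatever the serial while()
loop currently does for one engine of the ParallelStep, and the blocking
wait stands in for the point where the loop finishes today.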


  Is there an option I'm missing?  Any hints would be really
appreciated!

  Thanks,

                                Petr Baudis
