Re: Scale out using multiple Collection Readers and Cas Consumers

Burn Lewis Wed, 01 Dec 2010 16:07:01 -0800

The choice between single or multiple collection readers depends a lot on
the application.  If populating the initial input CASes is not expensive, the
reader could be implemented as a UIMA-AS client similar to runRemoteAsyncAE in
figure 3, with load balancing provided by the multiple service instances
consuming CASes from the input queue.  If the application has multiple
services, each handling a stream of job requests, then each could be a UIMA-AS
client and send CASes to the same input queue.
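
As a rough illustration of the shared input queue, here is a plain-Java
analogy.  It does not use the UIMA-AS API: the class SharedQueueSketch, the
STOP marker and the string "CASes" are all made up for the sketch, and a
BlockingQueue stands in for the broker queue.  Several client threads put
work on one queue and several service threads compete to take it, which is
where the load balancing comes from.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedQueueSketch {
    // Hypothetical end-of-work marker, one per service instance.
    private static final String STOP = "__STOP__";

    /** Runs the simulation and returns how many items the services processed. */
    static int run(int clients, int casesPerClient, int serviceInstances) {
        BlockingQueue<String> inputQueue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();

        // Service instances: each one takes whatever item is next on the queue.
        ExecutorService services = Executors.newFixedThreadPool(serviceInstances);
        for (int s = 0; s < serviceInstances; s++) {
            services.execute(() -> {
                try {
                    while (true) {
                        String cas = inputQueue.take();
                        if (cas.equals(STOP)) {
                            break;
                        }
                        processed.incrementAndGet(); // stand-in for real analysis
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Clients: each one sends its own stream of items to the same queue.
        Thread[] clientThreads = new Thread[clients];
        for (int c = 0; c < clients; c++) {
            final int id = c;
            clientThreads[c] = new Thread(() -> {
                for (int i = 0; i < casesPerClient; i++) {
                    inputQueue.add("client" + id + "-cas" + i);
                }
            });
            clientThreads[c].start();
        }

        try {
            for (Thread t : clientThreads) {
                t.join();
            }
            // All work is queued; now tell every service instance to stop.
            for (int s = 0; s < serviceInstances; s++) {
                inputQueue.add(STOP);
            }
            services.shutdown();
            services.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4, 2));
    }
}
```

None of the clients needs to know how many service instances exist, so
instances can be added without touching the client side.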

Note that CasMultipliers are a flexible replacement for Collection Readers,
since the collection definition can be provided dynamically in the input
CAS rather than in a configuration file or via some side channel.  So,
similar to figure 5, an application could scale out multiple aggregates on a
cluster of machines, each aggregate starting with a CasMultiplier that gets
its collection definition (a directory or list of documents) from a CAS
placed on the shared input queue and creates the document CASes to be
processed by the AEs in the rest of the aggregate.  Some of these AEs could
be scaled locally, or could be remote AS services shared by all of the
scaled-out aggregates.
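
The multiplier idea can be sketched in plain Java as follows.  This is only
an analogy, not UIMA code: MultiplierSketch and DocumentMultiplier are
hypothetical names, a comma-separated string stands in for the collection
definition carried by the input CAS, and the strings returned by next() stand
in for the document CASes.  The process/hasNext/next shape is meant to echo
the way a CasMultiplier is driven.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MultiplierSketch {

    /** Stand-in for a CasMultiplier: one collection definition in, many documents out. */
    static class DocumentMultiplier {
        private final Deque<String> pending = new ArrayDeque<>();

        /** Reads the collection definition that arrived in the input item. */
        void process(String collectionDefinition) {
            pending.clear();
            for (String name : collectionDefinition.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    pending.add(trimmed);
                }
            }
        }

        /** True while there are documents left to emit for the current input. */
        boolean hasNext() {
            return !pending.isEmpty();
        }

        /** Emits the next document item for the rest of the aggregate to process. */
        String next() {
            return "doc:" + pending.poll();
        }
    }

    /** Drives the multiplier the way a framework would and collects its output. */
    static List<String> expand(String collectionDefinition) {
        DocumentMultiplier multiplier = new DocumentMultiplier();
        multiplier.process(collectionDefinition);
        List<String> documents = new ArrayList<>();
        while (multiplier.hasNext()) {
            documents.add(multiplier.next());
        }
        return documents;
    }

    public static void main(String[] args) {
        System.out.println(expand("a.txt, b.txt, c.txt"));
    }
}
```

Because the definition arrives with each input item, the same deployed
multiplier can handle a different collection on every request.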

In practice it may be sufficient to scale just the delegates inside an AS
aggregate, deploying multiple instances of any slow components and providing
a CAS pool large enough to keep all of the local and remote delegate
instances busy.
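
To see why the pool size matters, here is a small plain-Java sketch, again
an analogy rather than UIMA-AS code.  CasPoolSketch and peakInFlight are
made-up names, a Semaphore stands in for the CAS pool, and a thread pool
stands in for the delegate instances.  The point it tries to show is that
the number of busy instances can never exceed the number of CASes available.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CasPoolSketch {

    /** Processes the documents and returns the peak number of busy instances. */
    static int peakInFlight(int poolSize, int instances, int documents) {
        Semaphore casPool = new Semaphore(poolSize);            // stand-in for the CAS pool
        ExecutorService delegates = Executors.newFixedThreadPool(instances);
        AtomicInteger busy = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int d = 0; d < documents; d++) {
            casPool.acquireUninterruptibly();                   // wait for a free "CAS"
            delegates.execute(() -> {
                int now = busy.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);          // remember the high-water mark
                try {
                    Thread.sleep(20);                           // stand-in for analysis work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    busy.decrementAndGet();
                    casPool.release();                          // return the "CAS" to the pool
                }
            });
        }

        delegates.shutdown();
        try {
            delegates.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With 2 CASes and 8 instances, at most 2 instances are ever busy.
        System.out.println(peakInFlight(2, 8, 20));
        // With 8 CASes and 8 instances, all 8 can be busy at once.
        System.out.println(peakInFlight(8, 8, 20));
    }
}
```

In this sketch the peak is bounded by the smaller of the pool size and the
instance count, so adding instances without enlarging the pool gains nothing.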

One advantage of designing an application as a deployed UIMA-AS aggregate
with some of its AEs deployed as remote services is that it is relatively
easy to start with a simple synchronous single-threaded UIMA aggregate and
later add the UIMA-AS deployments and scale-out.

~Burn
