[ 
https://issues.apache.org/jira/browse/UIMA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624031#action_12624031
 ] 

Marshall Schor commented on UIMA-1146:
--------------------------------------

The need for this became apparent when Adam found the next bottleneck in 
scaleout using UIMA-AS.  He found that the work of the UIMA Aggregate 
controller, namely running the flow controller code and serializing the CAS 
(back to the sender, if remote, or out to remote delegates), could be a 
bottleneck.  The framework already supports multiple threads for this work, 
but only one is being used.

Here is a summary of discussions about this with Eddie, Burn, and Tong.

This issue is about threads for doing the work of an aggregate.
  Threads for doing the work of a primitive are specified using 
<scaleout numberOfInstances="nnn"/>
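
As a rough illustration of that existing primitive-level setting, a co-located 
primitive delegate might be scaled out as in the fragment below.  The key 
"Tokenizer" and the instance count are made-up values, and the rest of the 
deployment descriptor is omitted.

```xml
<analysisEngine async="true"> <!-- aggregate -->
  <delegates>
    <!-- hypothetical primitive delegate, deployed as 4 instances -->
    <analysisEngine key="Tokenizer" async="false">
      <scaleout numberOfInstances="4"/>
    </analysisEngine>
  </delegates>
</analysisEngine>
```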

The work of an aggregate, done on a thread, is:
  1) deserializing (not needed for communication within co-located components)
  2) running the Flow Controller
  3) serializing a CAS (not always needed)
        either back to a caller (if not co-located) or out to a remote delegate

This work applies both to top-level aggregates and to contained, 
co-located aggregates.

The threads for doing the work are associated with the queues involved.  Each 
queue could have a different number of threads.

There are 3 queues for an aggregate (top level, or inner, co-located), some of 
which may not be present in any given deployment.  The three are:

  1) the input queue for this aggregate.  Note: inner aggregates have their own 
input queue
  2) the local reply queue for co-located delegates
  3) the remote reply queue for remote delegates
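
To make the three queues concrete, here is a sketch of a descriptor fragment 
annotated with where each queue would come into play.  It uses only the 
existing syntax from UIMA-1130; the delegate keys and the endpoint name are 
placeholders, and the comments are an interpretation of the list above, not 
something the descriptor itself states.

```xml
<!-- queue 1: the input queue feeding this aggregate -->
<analysisEngine async="true">
  <delegates>
    <!-- co-located delegate (hypothetical key):
         its replies come back on queue 2, the local reply queue -->
    <analysisEngine key="LocalDelegate" async="false"/>

    <!-- remote delegate (hypothetical key and endpoint):
         its replies come back on queue 3, the remote reply queue -->
    <remoteAnalysisEngine key="RemoteDelegate">
      <inputQueue brokerURL="tcp://localhost:61616" endpoint="RemoteDelegateQueue"/>
      <replyQueue concurrentConsumers="2" location="remote"/>
    </remoteAnalysisEngine>
  </delegates>
</analysisEngine>
```

Note that the <inputQueue> inside <remoteAnalysisEngine> is the remote 
delegate's own input queue, not queue 1 of the enclosing aggregate.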
     
UIMA-1130 allowed specifying the scaleout for queue #3.

This Jira is to allow specifying the scaleouts for queues #1 and #2.

This specification is needed at multiple levels of aggregation (for those 
analysisEngines having async="true"; analysisEngines with async="false" are 
treated by UIMA-AS as "primitives").

There are two specifications needed in general:  
  1) one for the internal reply queue (in addition to the existing remote reply 
queues), and
  2) one for the input queue.

I think it would be less confusing if we avoided overloading the same element 
name (i.e., replyQueue) with different meanings, depending on context.

I would propose the following:

Each <analysisEngine async="true"> element would have a spec for these two 
scaleout numbers.  This could be done as attributes on the <analysisEngine... > 
spec itself, or as one or two nested elements.  Here are three proposals:

attributes on <analysisEngine> itself:
     <analysisEngine async="true" internalReplyQueueScaleout="nn1" inputQueueScaleout="nn2">

one nested element:
     <analysisEngine async="true">
          <aggregateWorkScaleout internalReplyQueue="nn1" inputQueue="nn2"/>

two nested elements:
     <analysisEngine async="true">
          <internalReplyQueue scaleout="nn1"/>
          <inputQueue scaleout="nn2"/>

I prefer the first alternative, but not strongly.  If we did this, I would also 
propose changing what we did for UIMA-1130 to follow this same syntax, adding a 
remoteReplyQueueScaleout="nn1" to the <remoteAnalysisEngine> element.  We still 
have time to change that, I think, if we want to.
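
For illustration only, here is how the first alternative might look when 
applied at two levels of aggregation, together with the suggested 
remoteReplyQueueScaleout attribute.  The delegate keys, broker URL, and 
endpoint are borrowed from the example in the issue description; the numbers 
are arbitrary, and none of these attributes exist yet.

```xml
<!-- top-level aggregate -->
<analysisEngine async="true"
                internalReplyQueueScaleout="2"
                inputQueueScaleout="3">
  <delegates>
    <!-- co-located aggregate with its own two scaleout numbers -->
    <analysisEngine key="NamesAndPersonTitlesTAE" async="true"
                    internalReplyQueueScaleout="3"
                    inputQueueScaleout="1">
      <!-- inner delegates omitted -->
    </analysisEngine>

    <!-- remote delegate: the proposed attribute would take over the role of
         concurrentConsumers on <replyQueue> from UIMA-1130 -->
    <remoteAnalysisEngine key="RoomNumber" remoteReplyQueueScaleout="2">
      <inputQueue brokerURL="tcp://localhost:61616" endpoint="RoomNumberAnnotatorQueue"/>
    </remoteAnalysisEngine>
  </delegates>
</analysisEngine>
```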

Other opinions?

> Setting the number of concurrent listeners of a reply queue for Co-located 
> Delegates
> ------------------------------------------------------------------------------------
>
>                 Key: UIMA-1146
>                 URL: https://issues.apache.org/jira/browse/UIMA-1146
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Async Scaleout
>    Affects Versions: 2.2.2AS
>            Reporter: Tong Fin
>            Assignee: Tong Fin
>
> UIMA-1130 improved UIMA-AS to allow users to set the number of concurrent 
> listeners of a reply queue for each "remote" delegate. The following is the 
> syntax in the XML deployment descriptor (as an example):
>       <analysisEngine async="true">
>         <delegates>
>           <remoteAnalysisEngine key="RoomNumber">
>             <inputQueue brokerURL="tcp://localhost:61616" endpoint="RoomNumberAnnotatorQueue"/>
>             <replyQueue concurrentConsumers="2" location="remote"/>
>             ...
>           </remoteAnalysisEngine>
>         </delegates>
>         ...
>       </analysisEngine>
> This JIRA will do a similar thing by allowing users to set the number of 
> concurrent listeners of a reply queue for "co-located" delegates inside the 
> UIMA-AS aggregate. 
> The following is the "proposed" syntax:
>       <analysisEngine async="true"> <!-- Top aggregate -->
>         <replyQueue concurrentConsumers="2"/>
>         ...
>         <delegates>
>           <analysisEngine key="NamesAndPersonTitlesTAE" async="true"> <!-- co-located aggregate -->
>             <replyQueue concurrentConsumers="3"/>
>             ...
>           </analysisEngine>
>           ...
>         </delegates>
>         ...
>       </analysisEngine>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
