[ 
https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654586#action_12654586
 ] 

Marshall Schor commented on UIMA-1245:
--------------------------------------

I wonder if it is a good goal to try and have the multi-threaded UIMA-AS 
aggregate work the same as a single-threaded UIMA aggregate.  Consider the 
following example: 

An aggregate, AGGR ~inner~ , containing a delegate Cas Multiplier (CM) whose 
children go thru some subsequent Analysis Engines (AE ~children~ ) and whose 
parent CAS then goes thru some further AEs (AE ~parent~) before 
exiting the aggregate.

Now imagine this aggregate, AGGR ~inner~ , contained in another aggregate, AGGR 
~outer~ , where AGGR ~inner~ is considered to be a Cas Multiplier (that is, the 
"children" CASes produced by CM exit that aggregate and are input back into 
AGGR ~outer~).

Now imagine that AGGR ~inner~ is "remote", working off of a JMS queue.

How would the proposed solution determine when "all its children CASes have 
finished processing"?  This would require some new signaling from  AGGR ~outer~ 
back to AGGR ~inner~ , which we don't currently have.  This is because the 
processing of the child CASes could continue in AGGR ~outer~, and AGGR ~outer~ 
would need to signal (in the general case through many levels of nesting) when 
a child CAS was finished processing. 

If the idea was to suspend the flow of the parent CAS until all of its children 
CASes had left AGGR ~inner~ , and then return the parent CAS to AGGR ~outer~, 
this wouldn't require new signaling, but it would potentially change the order 
of processing from the comparable single-threaded plain UIMA case - where the 
parent would be held until the child CASes had finished their processing in all 
containing aggregate levels.   

Perhaps the thinking in this case is to 
# *block* the parent from going thru any AE ~parent~ in the AGGR ~inner~ until 
all of the child CASes have exited AGGR ~inner~ (or gone to "final state"),
# *release* the parent, allowing it to go through all of its AEs ~parent~ in 
the AGGR ~inner~, and then be returned to AGGR ~outer~
# *block* the parent upon return to AGGR ~outer~ until all of its children have 
finished processing in AGGR ~outer~ .

with suitable extensions for multi-levels of nesting :-)
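
To make the three steps concrete, here is a rough sketch of the bookkeeping they seem to imply: one count of in-flight child CASes per aggregate level, with the parent allowed to move only when the count for its current level is zero. This is only an illustration in Python, not UIMA or UIMA-AS code; the names ParentGate, child_entered, child_left, parent_may_proceed and simulate are all made up for this example, and the two levels are simply labelled "inner" and "outer".

```python
from dataclasses import dataclass, field


@dataclass
class ParentGate:
    """Hypothetical bookkeeping: outstanding child CASes per aggregate level."""
    pending: dict = field(default_factory=dict)  # level name -> in-flight children

    def child_entered(self, level):
        self.pending[level] = self.pending.get(level, 0) + 1

    def child_left(self, level):
        self.pending[level] -= 1

    def parent_may_proceed(self, level):
        # The parent is blocked while any child is still in flight at this level.
        return self.pending.get(level, 0) == 0


def simulate(num_children=2):
    """Walk one parent CAS through the block / release / block steps."""
    gate = ParentGate()
    log = []

    # The CM inside AGGR inner produces the children.
    for _ in range(num_children):
        gate.child_entered("inner")
    # Step 1: parent is held before its AEs in AGGR inner.
    log.append(("inner", gate.parent_may_proceed("inner")))

    # Each child exits AGGR inner and is fed back into AGGR outer.
    for _ in range(num_children):
        gate.child_left("inner")
        gate.child_entered("outer")
    # Step 2: parent is released inside AGGR inner ...
    log.append(("inner", gate.parent_may_proceed("inner")))
    # Step 3: ... but is held again once it is back in AGGR outer.
    log.append(("outer", gate.parent_may_proceed("outer")))

    # Children finish their processing in AGGR outer.
    for _ in range(num_children):
        gate.child_left("outer")
    log.append(("outer", gate.parent_may_proceed("outer")))
    return log


if __name__ == "__main__":
    for level, may_proceed in simulate():
        print(level, "parent may proceed:", may_proceed)
```

Note that in this sketch each level only needs counts for children it can see itself, so the remote AGGR inner would not need any signal back from AGGR outer; the second block happens entirely in AGGR outer. Deeper nesting would presumably just add more level names.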

This would not be _identical_ to the single-threaded case, but it might be 
close enough.  But my feeling is this is getting very complex, and some simpler 
(to explain) approach that gives up on the goal of having the UIMA-AS and 
single-threaded UIMA cases operate the same might be better.  

One thing complicating the current design approaches is the overloading of 
these kinds of flow decisions with the way errors are "passed back" - the 
current design for errors is using the Parent CAS to signal the failure of a 
Child Cas in some cases, so the parent CAS needs to be kept around until all 
the child CASes have exited a level.  

> Processing order of parent CAS different on UIMA and UIMA AS
> ------------------------------------------------------------
>
>                 Key: UIMA-1245
>                 URL: https://issues.apache.org/jira/browse/UIMA-1245
>             Project: UIMA
>          Issue Type: Bug
>          Components: Async Scaleout
>            Reporter: Eddie Epstein
>
> Arron Kaplan raised the question of when parent CASes are processed relative 
> to their children. See http://markmail.org/message/5cop7iv2nshouhgs  As of 
> now, the processing order for a multi-threaded UIMA AS aggregate is different 
> than that for a single-threaded UIMA aggregate.
> A discussion with Burn, Adam, Jerry, Marshall and myself concluded that the 
> default processing order for UIMA AS should be changed to be the same as in 
> UIMA, in order to have the same application behavior for both. This will be 
> done by suspending flow of a parent CAS after it is returned from a 
> CasMultiplier delegate until all its children CASes have finished processing.
> However, there also needs to be a UIMA AS deployment option for CasMultiplier 
> delegates that allows the parent CAS to resume processing immediately after 
> being returned from the CM. This option is needed to enable parallel 
> processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
