[ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654586#action_12654586 ]
Marshall Schor commented on UIMA-1245:
--------------------------------------

I wonder if it is a good goal to try and have the multi-threaded UIMA-AS aggregate work the same as a single-threaded UIMA aggregate.

Consider the following example: An aggregate, AGGR ~inner~, containing a delegate Cas Multiplier (CM) whose children go thru some subsequent Analysis Engines (AE ~children~) and whose main parent CAS subsequently goes thru some subsequent AEs (AE ~parent~) before exiting the aggregate.

Now imagine this aggregate, AGGR ~inner~, contained in another aggregate, AGGR ~outer~, where AGGR ~inner~ is considered to be a Cas Multiplier (that is, the "children" CASes produced by CM exit that aggregate and are input back into AGGR ~outer~).

Now imagine that AGGR ~inner~ is "remote", working off of a JMS queue. How would the proposed solution determine when "all its children CASes have finished processing"? This would require some new signaling from AGGR ~outer~ back to AGGR ~inner~, which we don't currently have. This is because the processing of the child CASes could continue in AGGR ~outer~, and AGGR ~outer~ would need to signal (in the general case through many levels of nesting) when a child CAS was finished processing.

If the idea was to suspend the flow of the parent CAS until all of its children CASes had left AGGR ~inner~, and then return the parent CAS to AGGR ~outer~, this wouldn't require new signaling, but it would potentially change the order of processing from the comparable single-threaded plain UIMA case, where the parent would be held until the child CASes had finished their processing in all containing aggregate levels.
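To make the "suspend the parent until its children have left AGGR ~inner~" variant concrete, here is a rough Java sketch of the bookkeeping one aggregate level might need. Everything in it is hypothetical (the class name ParentCasGate, its methods, and the use of string ids for CASes); it is not the UIMA-AS API or its actual implementation. Note that it only sees children leaving its own level, so it illustrates exactly the limitation described above: it knows nothing about children that keep processing in AGGR ~outer~.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-aggregate bookkeeping; not part of the UIMA-AS API. */
class ParentCasGate {
    // Per parent id: number of child CASes still inside this aggregate level.
    private final Map<String, Integer> childrenInFlight = new HashMap<>();
    // Parents that came back from the CM and are currently being held.
    private final Set<String> heldParents = new HashSet<>();

    /** Record that the CM produced one more child CAS for this parent. */
    synchronized void childCreated(String parentId) {
        childrenInFlight.merge(parentId, 1, Integer::sum);
    }

    /**
     * The parent CAS has been returned from the CM.
     * Returns true if it may continue at once, false if it is now held.
     */
    synchronized boolean parentReturned(String parentId) {
        if (childrenInFlight.getOrDefault(parentId, 0) == 0) {
            return true;
        }
        heldParents.add(parentId);
        return false;
    }

    /**
     * A child CAS left this aggregate level (or reached its final state).
     * Returns true if this exit releases a held parent.
     */
    synchronized boolean childExited(String parentId) {
        int remaining = childrenInFlight.merge(parentId, -1, Integer::sum);
        if (remaining > 0) {
            return false; // other children of this parent are still here
        }
        childrenInFlight.remove(parentId);
        return heldParents.remove(parentId);
    }
}
```

In this sketch the caller would resume the parent's flow whenever childExited returns true. Matching the single-threaded order across nested aggregates would need one such gate per level plus the new outer-to-inner signaling mentioned above, which this sketch does not attempt.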
Perhaps the thinking in this case is to
# *block* the parent from going thru any AE ~parent~ in the AGGR ~inner~ until all of the child CASes have exited AGGR ~inner~ (or gone to "final state"),
# *release* the parent, allowing it to go through all of its AEs ~parent~ in the AGGR ~inner~, and then be returned to AGGR ~outer~,
# *block* the parent upon return to AGGR ~outer~ until all of its children have finished processing in AGGR ~outer~,

with suitable extensions for multi-levels of nesting :-)

This would not be _identical_ to the single-threaded case, but it might be close enough. But my feeling is this is getting very complex, and some simpler (to explain) approach that gives up on the goal of having the UIMA-AS and single-threaded UIMA cases operate the same might be better.

One thing complicating the current design approaches is the overloading of these kinds of flow decisions with the way errors are "passed back": the current design for errors is using the Parent CAS to signal the failure of a Child CAS in some cases, so the parent CAS needs to be kept around until all the child CASes have exited a level.

> Processing order of parent CAS different on UIMA and UIMA AS
> ------------------------------------------------------------
>
>                 Key: UIMA-1245
>                 URL: https://issues.apache.org/jira/browse/UIMA-1245
>             Project: UIMA
>          Issue Type: Bug
>          Components: Async Scaleout
>            Reporter: Eddie Epstein
>
> Arron Kaplan raised the question of when parent CASes are processed relative
> to their children. See http://markmail.org/message/5cop7iv2nshouhgs
> As of now, the processing order for a multi-threaded UIMA AS aggregate is
> different than that for a single-threaded UIMA aggregate.
> A discussion with Burn, Adam, Jerry, Marshall and myself concluded that the
> default processing order for UIMA AS should be changed to be the same as in
> UIMA, in order to have the same application behavior for both. This will be
> done by suspending flow of a parent CAS after it is returned from a
> CasMultiplier delegate until all its children CASes have finished processing.
> However, there also needs to be a UIMA AS deployment option for CasMultiplier
> delegates that allows the parent CAS to resume processing immediately after
> being returned from the CM. This option is needed to enable parallel
> processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.