We have noticed a number of issues in Camel 2.18.3 when using parallel
multicast/recipientList with a timeout under heavy load or in pathological
situations (reduced throughput in one or more tasks).

1. If any of the tasks cannot be submitted, typically because of a
RejectedExecutionException, the AggregateOnTheFlyTask never terminates and
instead calls the timeout method continually. Excessive calls to timeout could
also happen if the thread pool uses a CallerRuns policy, but I haven't
attempted to reproduce that. I am pretty sure the issue was introduced in
2.15.x by the changes in CAMEL-8081.

2. The timeout does not start until the first submitted task begins running.
This can be a substantial delay if tasks are queued in a thread pool. The
timeout StopWatch really needs to start when the multicast begins. It would
also help drain the queue if each submitted Callable failed fast when the
timeout has already expired.

3. Related to (2), the main thread waits forever for the aggregate task to
complete. If a timeout is given, it should be honored, or at least used to
provide a reasonable escape.

4. This is more of a comment, but I would be very wary of the
parallelAggregate option. There are a lot of potential races there, especially
after a timeout. It seems like it could spin its wheels while the parallel
aggregation completes, call timeout too many times, and/or exit before the
aggregation strategy has actually finished.
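For (1), here is a minimal, hypothetical sketch using only java.util.concurrent; it is not Camel's MulticastProcessor code, and the class and method names are made up for illustration. A pool with one thread and a SynchronousQueue rejects every submission while its worker is busy. An aggregating loop that expects one result per *intended* task can then never finish, whereas one that counts only the tasks the executor accepted can.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Camel's MulticastProcessor: submit tasks to a
// deliberately tiny pool and wait only for the tasks that were accepted.
class RejectionDemo {
    static final class Outcome {
        final int accepted, rejected, completed;
        Outcome(int accepted, int rejected, int completed) {
            this.accepted = accepted;
            this.rejected = rejected;
            this.completed = completed;
        }
    }

    static Outcome run(int taskCount) {
        CountDownLatch release = new CountDownLatch(1);
        // One worker, no queue capacity, AbortPolicy: while the worker is busy,
        // every further submission throws RejectedExecutionException.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.AbortPolicy());
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        int accepted = 0, rejected = 0;
        for (int i = 0; i < taskCount; i++) {
            final int index = i;
            try {
                completion.submit(() -> { release.await(); return index; });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++; // this task never runs, so it can never produce a result
            }
        }
        release.countDown();
        int completed = 0;
        try {
            // Waiting for taskCount results here would never finish (or keep
            // timing out), because rejected tasks never produce one.
            for (int i = 0; i < accepted; i++) {
                Future<Integer> done = completion.poll(5, TimeUnit.SECONDS);
                if (done == null) break;
                completed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return new Outcome(accepted, rejected, completed);
    }
}
```

With `run(3)`, the first task occupies the only worker and the other two are rejected, so the loop stops after a single result instead of waiting for three.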
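For (2), here is a sketch of both suggestions, assuming a plain ExecutorService rather than Camel's thread pool profiles. The deadline is computed once, before any task is submitted, and a hypothetical `failFast` wrapper checks it when the task actually starts running. A task that sat in the queue past the deadline throws TimeoutException without doing its work, which drains the queue quickly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the deadline is fixed when the multicast starts, and
// each task checks it before doing any work.
class FailFastTasks {
    // Skips the work if the task only starts after the shared deadline
    // (e.g. because it was waiting in the pool's queue).
    static <T> Callable<T> failFast(long deadlineNanos, Callable<T> work) {
        return () -> {
            // Subtraction form is safe against System.nanoTime() overflow.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new TimeoutException("deadline expired before the task started");
            }
            return work.call();
        };
    }

    // Returns true if the queued task's work ran (with fail-fast it should not).
    static boolean queuedWorkRan(long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Start the clock before submitting anything, not when the first task runs.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        AtomicBoolean ran = new AtomicBoolean(false);
        Callable<String> slow = () -> { Thread.sleep(timeoutMillis * 3); return "slow"; };
        Callable<String> queued = () -> { ran.set(true); return "queued"; };
        pool.submit(failFast(deadline, slow));
        Future<String> second = pool.submit(failFast(deadline, queued));
        try {
            second.get();
        } catch (ExecutionException e) {
            // Expected: the cause is the TimeoutException thrown by failFast.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return ran.get();
    }
}
```

Because the single worker is busy with the slow task for three times the timeout, the second task only starts after the deadline and its work is skipped.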
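For (3), here is a sketch of a bounded wait, assuming the aggregation runs as a Future on some executor; the names are hypothetical and not Camel's API. The caller waits only until the same deadline plus a small grace period, then cancels the aggregator and returns null. The caller can then treat the exchange as timed out instead of blocking forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for the aggregation task only until the multicast
// deadline (plus a grace period), instead of forever.
class BoundedAggregateWait {
    static <T> T await(Future<T> aggregate, long deadlineNanos, long graceMillis)
            throws InterruptedException, ExecutionException {
        long remaining = deadlineNanos - System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(graceMillis);
        try {
            return aggregate.get(Math.max(0L, remaining), TimeUnit.NANOSECONDS);
        } catch (TimeoutException e) {
            aggregate.cancel(true); // give up on the aggregator; null means "timed out"
            return null;
        }
    }

    // Demo: how long (ms) the caller blocks on an aggregator that never finishes.
    static long blockedMillis(long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Future<String> stuck = pool.submit(() -> { Thread.sleep(60_000); return "never"; });
        try {
            String result = await(stuck, deadline, 50);
            if (result != null) throw new IllegalStateException("unexpected result");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

With a 100 ms timeout and 50 ms grace, the caller returns after roughly 150 ms rather than waiting the full minute.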
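On (4), one way to reason about the races is to put aggregation and the timeout under a single lock with a finished flag. This is an illustrative sketch, not how Camel's parallelAggregate is implemented. The timeout takes effect only once, and it cannot slip in while an `aggregate()` call is in progress, so late results are dropped rather than racing with the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: results and the timeout may arrive on different threads
// at once. One lock plus a "finished" flag makes the timeout fire once and
// keeps late results out of the aggregate.
class GuardedAggregator {
    private final Object lock = new Object();
    private boolean finished; // guarded by lock
    private int aggregated;   // guarded by lock
    private final AtomicInteger timeoutCalls = new AtomicInteger();

    // Returns false if the result arrived after the timeout and was dropped.
    boolean aggregate(int value) {
        synchronized (lock) {
            if (finished) return false;
            aggregated += value;
            return true;
        }
    }

    // Uses the same lock, so it waits for any in-flight aggregate() call,
    // and only the first caller gets past the check.
    void timeout() {
        synchronized (lock) {
            if (finished) return;
            finished = true;
        }
        timeoutCalls.incrementAndGet(); // stands in for the strategy's timeout callback
    }

    int timeoutCalls() { return timeoutCalls.get(); }

    int aggregated() { synchronized (lock) { return aggregated; } }

    // Demo: many threads race to time out the same aggregator.
    static GuardedAggregator race(int threads) {
        GuardedAggregator agg = new GuardedAggregator();
        agg.aggregate(1);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                agg.timeout();
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        agg.aggregate(41); // late result after the timeout: dropped
        return agg;
    }
}
```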

I can reproduce (1) in a test case, so I can open a JIRA for it. I might be
able to come up with a similar test for (2)/(3). I think a JIRA for those,
though related, should probably be separate.


