Hi Michael,

On 8/22/07, Michael Baessler <[EMAIL PROTECTED]> wrote:
> 1) Do you have any experiences with the memory footprint when using UIMA
> AS? It seems to me that when deploying a larger system with multiple AEs
> a lot of queues are used. Each AS aggregate uses a queue for the
> delegates. How is the performance with all these queues? Do you have any
> measurements?

This is a great question. When an aggregate is deployed with
asynchronous delegates, each process call goes through a queue. For
colocated delegates the message is just a reference to the in-memory
CAS, so there is no serialization overhead. Moreover, a colocated
broker in the same JVM is used for communication between colocated
components, and ActiveMQ has optimized the producer/consumer paths to
a colocated broker. But even with these optimizations the overhead is
undesirable for some configurations.

I will get some specific overhead times for calling colocated
delegates in the next couple of days. For remote delegates, the
overhead is dominated by the XmiCas serialization steps and so
depends on the CAS content.

Note that by default an aggregate is deployed as an AS primitive, that
is, as a single threaded component, so there is no performance
degradation for processing within the aggregate. An aggregate is only
deployed asynchronously if required, i.e. one of the delegates is a
remote service, a colocated delegate is to be replicated, special
error handling is desired for a delegate, or it is simply desired to
run delegates in separate threads for concurrency.

> 2) Collection Process complete - The documentation says: "If a component
> is replicated, only one of the instances will receive the
> collectionProcessComplete call". I think replicated means there is more
> than one instance of the same component. So why does only one of the
> components receive that call? Is it by design that only one does, and
> we don't know which of the components receives the information? I think
> this is the same as with a CAS, right?

That's right. If it is required to have all CASes go through the same
instance of an analytic then it should not be replicated.
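
As a rough illustration only (plain Java, not UIMA AS code), here is a
sketch of why a message reaches just one replica. It assumes the
replicated instances all pull from one shared input queue; the class
and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: replicated instances competing for one shared queue. */
class CompetingConsumersSketch {

    /**
     * Starts `instances` worker threads against a single queue preloaded with
     * `messages` items. Returns one "instance-N handled M" entry per delivery.
     */
    static List<String> run(int instances, int messages) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }
        ConcurrentLinkedQueue<String> deliveries = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < instances; w++) {
            String name = "instance-" + w;
            Thread t = new Thread(() -> {
                Integer msg;
                // poll() removes the head atomically, so a message is taken
                // by exactly one worker and never seen by the others.
                while ((msg = queue.poll()) != null) {
                    deliveries.add(name + " handled " + msg);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new ArrayList<>(deliveries);
    }

    public static void main(String[] args) {
        // A single message (think of one collectionProcessComplete request)
        // sent to three replicas is handled by whichever replica takes it.
        run(3, 1).forEach(System.out::println);
    }
}
```

Which instance takes the message depends on thread scheduling, which is
why the caller cannot know in advance which replica will see it.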

> 3) When the system processes a document without any CasMultiplier the
> process call for this document blocks until the result is created and
> returned?  So in the system only one CAS is created and used.

Not sure what you mean by "the system" here. An AS primitive will
process only one CAS at a time; an AS aggregate can process more than
one CAS at a time, based on the number of delegates and the size of
the caspool specified at the top level of the service.

> If the system has also CasMultiplier components the CasPool size for a
> CasMultiplier component can limit the CASes that can be used/created at
> the same time. But how does this work if the system collects the
> documents itself? Is the call blocked until all the documents are
> processed?

If an AS aggregate has a CasMultiplier, additional CASes can be put
into play concurrently, limited by the size of the CasMultiplier's
caspool. The design relies on the proper choice of caspool sizes to
enable the desired level of concurrent processing. The caspools also
limit the number of requests that can build up in any input queue,
avoiding queue overflows that are otherwise possible in asynchronous
messaging systems.
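
To make the throttling idea concrete, here is a minimal sketch in plain
Java. It is not the UIMA AS implementation; BoundedPoolSketch, Slot,
acquire and release are hypothetical names standing in for a caspool
and its CASes. The point is only that a fixed-size pool caps how many
items can be in flight, so producers wait instead of flooding a queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a fixed-size CAS pool; not the UIMA AS API. */
class BoundedPoolSketch {

    /** Placeholder for a pooled CAS instance. */
    static final class Slot {
        final int id;
        Slot(int id) { this.id = id; }
    }

    private final BlockingQueue<Slot> free;

    BoundedPoolSketch(int size) {
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new Slot(i));
        }
    }

    /** Waits up to timeoutMs for a free slot; returns null if none frees up. */
    Slot acquire(long timeoutMs) {
        try {
            return free.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns a slot to the pool once processing of it has finished. */
    void release(Slot s) {
        free.add(s);
    }

    /** Number of slots currently free. */
    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(2);
        Slot a = pool.acquire(10);
        Slot b = pool.acquire(10);
        // The pool is now exhausted, so a third request has to wait.
        Slot c = pool.acquire(10);
        System.out.println("third acquire got a slot: " + (c != null));
        pool.release(a);
        Slot d = pool.acquire(10);
        System.out.println("acquire after release got a slot: " + (d != null));
    }
}
```

With a pool of size N, at most N items can be outstanding at once, so
no downstream queue ever holds more than N requests from that source.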

> 4) The error handling seems to be similar to that in the CPM, with some
> additional new features (real retry). Is there some reuse of the old code?

No code reuse, only reuse of error handling concepts.

> 5) UIMA AS does not have a StatusListener that can/must be implemented
> to get some information about the system. How are the results reported
> in the good case? I understand that in an error case, the error with
> some additional background information is returned.

The custom flow controller is the key to application customization.
User code there can register the state of all CASes processed, or
route CASes to specific annotators designed to register such things.

Eddie
