On 08/16/2011 01:45 PM, Eddie Epstein wrote:
Thanks again for your reply. I thought that I was deploying the pipeline in
one AS process with the first option for running it:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

It looks like one process in the output of ps. I'm just surprised that the
performance is so much slower (16x slower).

Right, all in one process, but the connection between client and
service is the same used between multiple processes. As a quick test,
create a new aggregate with these two delegates:
SentencesFromDBReader.xml and
SbmiUmlsSmallAggregatePlaintextProcessor.xml. Then create a deployment
descriptor for this aggregate, say
Deploy_OneProcessDictionaryTest.xml, and test it with:

runRemoteAsyncAE.sh tcp://localhost:61616 OneProcessQueue \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_OneProcessDictionaryTest.xml

Without a collection reader runRemoteAsyncAE will send a single empty
CAS to the service. This will kick off the embedded collection reader
in the aggregate, and hopefully you'll see times similar to the CPE.

I get it now. When I put the CR into the aggregate AE and ran 'runRemoteAsyncAE.sh' without the '-c' flag, the run time was within a few seconds of the CPE's.

Thanks for the pointers. My takeaway is that UIMA AS might not be the way to scale for performance if the service instances have to run remotely. What I've been doing instead is running a bunch of CPEs in parallel, using the modulo operator in the CR's SQL so that each CR pulls data from its own partition of the collection, e.g.

  SELECT TEXT
  FROM DOCUMENTS
  WHERE ID % 25 = x

where each of the 25 instances gets a different value from 0 through 24 for its 'x'.
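
As a quick sanity check on the modulo scheme above, here is a minimal Python sketch. It assumes integer document IDs; the function names partition_ids and partition_query are made up for illustration and are not part of UIMA or of the actual collection reader. It only shows that ID % 25 assigns every document to exactly one of the 25 readers.

```python
def partition_ids(doc_ids, num_partitions=25):
    """Group IDs the way each reader's WHERE ID % n = x clause would select them."""
    partitions = {x: [] for x in range(num_partitions)}
    for doc_id in doc_ids:
        partitions[doc_id % num_partitions].append(doc_id)
    return partitions


def partition_query(x, num_partitions=25):
    """Build the SQL for reader number x (hypothetical helper, same table and column as above)."""
    if not 0 <= x < num_partitions:
        raise ValueError("x must be between 0 and num_partitions - 1")
    return f"SELECT TEXT FROM DOCUMENTS WHERE ID % {num_partitions} = {x}"


if __name__ == "__main__":
    ids = list(range(1, 101))
    parts = partition_ids(ids)
    # Every ID lands in exactly one partition, so no document is read twice or skipped.
    assert sorted(i for p in parts.values() for i in p) == ids
    print(partition_query(3))
```

Note that the partitions are only as even as the ID distribution; gaps in the ID sequence can leave some readers with less work than others.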

Thanks again to all who responded. I've learned a lot.

Chuck

To create a pipeline with an architecture like Figure 5, I would use the
example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of the
uima_async_scaleout.pdf for 2.3.1?

That would be one way. The important points are 1) to send a CAS that
points at some subset of the collection, and 2) to change the embedded
collection reader inside the service to a CasMultiplier that can
access that CAS and generate the sub-collection of CASes for the
pipeline. Given these two, a static set of CASes to be sent to the
service could be created and runRemoteAsyncAE used to send them.

Eddie
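
One possible reading of "a CAS which points at some subset of the collection" is a CAS that carries an ID range, which the CasMultiplier then expands into one CAS per document. The Python sketch below illustrates only the splitting step, under that assumption. The names make_work_items and subset_query are hypothetical, the range-based encoding is a guess rather than anything the thread specifies, and none of this is UIMA API.

```python
def make_work_items(min_id, max_id, chunk_size):
    """Split the inclusive ID range [min_id, max_id] into (first, last) work items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, max_id))
        for start in range(min_id, max_id + 1, chunk_size)
    ]


def subset_query(item):
    """SQL a (hypothetical) CasMultiplier might run to expand one work item."""
    first, last = item
    return f"SELECT TEXT FROM DOCUMENTS WHERE ID BETWEEN {first} AND {last}"


if __name__ == "__main__":
    # Each tuple stands for the payload of one CAS sent to the service.
    for item in make_work_items(1, 10, 4):
        print(item, subset_query(item))
```

Smaller work items spread the load more evenly across service instances, at the cost of sending more CASes from the client.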
