On 08/16/2011 01:45 PM, Eddie Epstein wrote:
Thanks again for your reply. I thought that I was deploying the pipeline in one AS process with the first option for running it:

    runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
        -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
        -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

It looks like one process in the output of ps. I'm just surprised that the performance is so much slower (16x).

Right, all in one process, but the connection between client and service is the same one used between multiple processes. As a quick test, create a new aggregate with these two delegates: SentencesFromDBReader.xml and SbmiUmlsSmallAggregatePlaintextProcessor.xml. Then create a deployment descriptor for this aggregate, say Deploy_OneProcessDictionaryTest.xml, and test it with:

    runRemoteAsyncAE.sh tcp://localhost:61616 OneProcessQueue \
        -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_OneProcessDictionaryTest.xml

Without a collection reader, runRemoteAsyncAE will send a single empty CAS to the service. This will kick off the embedded collection reader in the aggregate, and hopefully you'll see times similar to the CPE.
I get it now. When I put the CR into the aggregate AE and ran 'runRemoteAsyncAE.sh' without the '-c' flag, the run time was within a few seconds of the CPE's.
Thanks for the pointers. One takeaway for me seems to be that UIMA AS might not be a way to scale for performance if you have to run service instances remotely. What I've been doing is running a bunch of CPEs in parallel, using the modulo operator in the SQL of the CR to ensure that each CR pulls data from its own partition of the collection, e.g.

    SELECT TEXT FROM DOCUMENTS WHERE ID % 25 = x

where each of the 25 instances has a different number from [0, 1, 2, 3 …] for its 'x'.
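The partitioning scheme above can be sketched in shell, the language of the commands earlier in the thread. This is a minimal illustration, not Chuck's actual setup: the helper name `partition_queries` is hypothetical, and it only prints the query each of N instances would run, with x taking every value from 0 to N-1 so that each (non-negative) ID lands in exactly one partition. How each query reaches its CR (for example through a descriptor parameter) is an assumption and is not shown. Note that `%` works as modulo in MySQL, PostgreSQL and SQL Server; Oracle needs MOD(ID, 25) instead.

```shell
#!/bin/sh
# partition_queries N: print one query per partition, with x = 0 .. N-1.
# Hypothetical helper; each of the N parallel CPE instances would be given
# exactly one of these lines.
partition_queries() {
  n=$1
  x=0
  while [ "$x" -lt "$n" ]; do
    echo "SELECT TEXT FROM DOCUMENTS WHERE ID % $n = $x"
    x=$((x + 1))
  done
}

# Print the 25 queries used in the example above.
partition_queries 25
```

Because the remainders 0..N-1 are disjoint and exhaustive, no document is read twice and none is skipped, regardless of gaps in the ID sequence.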
Thanks again to all who responded. I've learned a lot.

Chuck
To create a pipeline with an architecture like Figure 5, would I use the example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of the uima_async_scaleout.pdf for 2.3.1?

That would be one way. The important points are 1) to send a CAS that points at some subset of the collection, and 2) to change the embedded collection reader inside the service into a CasMultiplier that can access that CAS and generate the sub-collection of CASes for the pipeline. Given these two, a static set of CASes to be sent to the service could be created and runRemoteAsyncAE used to send them.

Eddie
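Eddie's "static set of CASes" can be sketched under some loudly stated assumptions. The function `make_partition_cases` and the "N x" text format are hypothetical: each file becomes the document text of one CAS naming a subset of the collection, and a CasMultiplier inside the service (not shown) would parse that text and emit one CAS per matching row. One possible way to send the files is to pass a file-based collection reader to runRemoteAsyncAE with -c, for instance the FileSystemCollectionReader from the UIMA examples with its input directory pointed at DIR; check the descriptor parameters against your UIMA version.

```shell
#!/bin/sh
# make_partition_cases DIR N: write N small text files into DIR, one per
# partition. Each file holds "N x"; a hypothetical CasMultiplier in the
# service would read this and generate the CASes for partition x of N.
make_partition_cases() {
  dir=$1
  n=$2
  mkdir -p "$dir"
  x=0
  while [ "$x" -lt "$n" ]; do
    printf '%s %s\n' "$n" "$x" > "$dir/partition_$x.txt"
    x=$((x + 1))
  done
}
```

Usage: `make_partition_cases ./partitions 25` produces 25 seed documents. Scale-out then comes from deploying several instances of the service on the same queue, each instance expanding whichever seed CASes it receives.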